AHEAD: Automated Framework for Hardware Accelerated Iterative Data Analysis

Cited by: 0
Authors
Songhori, Ebrahim M. [1]
Mirhoseini, Azalia [1]
Lu, Xuyang [1]
Koushanfar, Farinaz [1]
Affiliations
[1] Rice Univ, Houston, TX 77005 USA
Source
2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE) | 2015
Keywords
Iterative Solver; Gram Matrix; Least Squares; FPGAs; Sparse Approximation; FISTA; HLS; Dense Matrix; API
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces AHEAD, a novel domain-specific framework for automated hardware-based acceleration of massive data analysis applications with a dense (non-sparse) correlation matrix. Because matrix inversion does not scale, such applications often rely on iterative computation to converge to a solution. AHEAD addresses two sets of domain-specific matrix computation challenges. The first is the I/O and memory bandwidth constraints that limit the performance of hardware accelerators. The second is the difficulty of handling large data, which stems from the complexity of the known matrix transformations and the inseparability of non-sparse correlations; this inseparability translates into an increased communication cost with the accelerators. To optimize performance within these limits, AHEAD learns the dependency structure of the domain data and suggests a scalable matrix transformation. The transformation minimizes the memory access required for matrix computation within an error threshold and thus optimizes the mapping of domain data to the available (bandwidth-constrained) accelerator resources. To facilitate automation, AHEAD also provides an Application Programming Interface (API) so that users can customize the framework to an arbitrary iterative analysis algorithm and hardware mapping. A proof-of-concept implementation of AHEAD is performed on the widely used compressive sensing and general ℓ1-regularized least squares solvers. On a massive light field imaging data set with 4.6B non-zeros, AHEAD attains up to a 320x improvement in iteration speed using reconfigurable hardware accelerators compared with the conventional solver, and about a 4x improvement compared with our transformed matrix solver on a general-purpose processor (without hardware acceleration).
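For readers unfamiliar with the solver class named in the abstract and keywords, the following is a minimal sketch of FISTA applied to the ℓ1-regularized least squares problem, minimizing 0.5*||Ax - y||^2 + lam*||x||_1, with the Gram matrix G = A^T A computed once and reused in every iteration. It is an illustration only and is not AHEAD's implementation: it contains none of the matrix transformation, API, or hardware mapping the abstract describes. The function names (`soft_threshold`, `gram`, `fista`), the pure-Python list representation, and the use of trace(G) as a conservative step-size bound are all assumptions made for this example.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, applied elementwise."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def matvec(M, v):
    """Dense matrix-vector product; M is a sequence of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gram(A):
    """Gram matrix G = A^T A for a dense matrix A given as a list of rows."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def fista(A, y, lam, iters=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 with FISTA (illustrative only)."""
    G = gram(A)                       # dense Gram matrix, computed once
    Aty = matvec(list(zip(*A)), y)    # A^T y, also computed once
    n = len(G)
    # trace(G) is an upper bound on the largest eigenvalue of the PSD matrix G,
    # so it is a safe (conservative) Lipschitz constant for the gradient.
    L = sum(G[i][i] for i in range(n)) or 1.0
    x = [0.0] * n
    z = x[:]                          # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth term at z: G z - A^T y
        grad = [g - b for g, b in zip(matvec(G, z), Aty)]
        # Gradient step followed by the l1 proximal step
        x_new = soft_threshold([zi - gi / L for zi, gi in zip(z, grad)], lam / L)
        # Standard FISTA momentum update
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        z = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


if __name__ == "__main__":
    # With A equal to the identity, the minimizer is the soft-thresholding of y.
    print(fista([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0))
```

Each iteration is dominated by the dense product G z, which is the kind of memory-bound matrix computation the abstract identifies as the bottleneck. When A is the identity, the minimizer has a closed form (the soft-thresholding of y by lam), which gives a simple sanity check for the sketch.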
Pages: 942-947
Page count: 6