A Many-core Architecture for In-Memory Data Processing

Cited by: 20
Authors
Agrawal, Sandeep R. [1 ]
Idicula, Sam [1 ]
Raghavan, Arun [1 ]
Vlachos, Evangelos [1 ]
Govindaraju, Venkatraman [1 ]
Varadarajan, Venkatanathan [1 ]
Balkesen, Cagri [1 ]
Giannikis, Georgios [1 ]
Roth, Charlie [1 ]
Agarwal, Nipun [1 ]
Sedlar, Eric [1 ]
Affiliations
[1] Oracle Labs, Burlington, MA 01803 USA
Source
50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) | 2017
Keywords
Accelerator; Big data; Microarchitecture; Databases; DPU; Low power; Analytics Processor; In-Memory Data Processing; Data Movement System; SUPPORT; VISION;
DOI
10.1145/3123939.3123985
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
For many years, the highest energy cost in processing has been data movement rather than computation, and energy is the limiting factor in processor design [21]. As the data needed for a single application grows to exabytes [56], there is clearly an opportunity to design a bandwidth-optimized architecture for big data computation by specializing hardware for data movement. We present the Data Processing Unit or DPU, a shared memory many-core that is specifically designed for high bandwidth analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and partitioning operations at the memory controller that is sufficient to keep up with DDR bandwidth. The DPU also provides acceleration for core to core communication via a unique hardware RPC mechanism called the Atomic Transaction Engine. Comparison of a DPU chip fabricated in 40nm with a Xeon processor on a variety of data processing applications shows a 3x-15x performance per watt advantage.
Pages: 245-258
Page count: 14
References
62 entries in total
[1]   Column-oriented Database Systems [J].
Abadi, Daniel J. ;
Boncz, Peter A. ;
Harizopoulos, Stavros .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (02) :1664-1665
[2]  
ABADI M, 2015, TENSORFLOW LARGE SCA, DOI 10.48550/ARXIV.1605.08695
[3]   Exploiting Accelerators for Efficient High Dimensional Similarity Search [J].
Agrawal, Sandeep R. ;
Dee, Christopher M. ;
Lebeck, Alvin R. .
ACM SIGPLAN NOTICES, 2016, 51 (08) :25-36
[4]   Rhythm: Harnessing Data Parallel Hardware for Server Workloads [J].
Agrawal, Sandeep R. ;
Pistol, Valentin ;
Pang, Jun ;
Tran, John ;
Tarjan, David ;
Lebeck, Alvin R. .
ACM SIGPLAN NOTICES, 2014, 49 (04) :19-34
[5]  
Andersen DG, 2009, SOSP'09: PROCEEDINGS OF THE TWENTY-SECOND ACM SIGOPS SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, P1
[6]  
[Anonymous], 2007, Discrete Mathematics and Theoretical Computer Science
[7]  
[Anonymous], 2016, The 49th Annual IEEE/ACM International Symposium on Microarchitecture, DOI 10.1109/MICRO.2016.7783710
[8]  
[Anonymous], Nvidia tesla p100 gpu accelerator pcie datasheet
[9]  
Austin Chad, 2013, SAJSON SINGLE ALLOCA
[10]   On-line handwriting recognition with support vector machines - A kernel approach [J].
Bahlmann, C ;
Haasdonk, B ;
Burkhardt, H .
EIGHTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION: PROCEEDINGS, 2002, :49-54