Biscuit: A Framework for Near-Data Processing of Big Data Workloads

Cited by: 179
Authors
Gu, Boncheol [1]
Yoon, Andre S. [1]
Bae, Duck-Ho [1]
Jo, Insoon [1]
Lee, Jinyoung [1]
Yoon, Jonghyun [1]
Kang, Jeong-Uk [1]
Kwon, Moonsang [1]
Yoon, Chanho [1]
Cho, Sangyeun [1]
Jeong, Jaeheon [1]
Chang, Duckhyun [1]
Affiliations
[1] Samsung Electronics Co., Ltd., Memory Business, Suwon, South Korea
Source
2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) | 2016
Keywords
near-data processing; in-storage computing; SSD
DOI
10.1109/ISCA.2016.23
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data-intensive queries are common in business intelligence, data warehousing, and analytics applications. Typically, processing a query involves full inspection of large in-storage data sets by CPUs. An intuitive way to speed up such queries is to reduce the volume of data transferred over the storage network to a host system. This can be achieved by filtering out extraneous data within the storage, motivating a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner. In order to offer a high-level programming model, Biscuit builds on the concept of data flow. Data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish tasks that run on the host system and the storage system. As a result, Biscuit has desirable traits like generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system that runs the Linux OS and a high-performance solid-state drive. We demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is done by hardware in the solid-state drive, the average speed-up obtained for the top five queries of TPC-H is over 15x.
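The data-flow idea in the abstract (tasks connected by typed, order-preserving ports, with filtering placed next to the data) can be illustrated with a minimal sketch. This is not Biscuit's API: the names `Port`, `Row`, `storage_filter_task`, `host_sum_task`, and `run_query` are hypothetical, and both tasks run in one Python process purely to show how filtering at the source reduces the number of items that cross the port.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Generic, Iterable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


class Port(Generic[T]):
    """Hypothetical typed FIFO port: rejects wrong-typed items, keeps order."""

    def __init__(self, item_type: Type[T]) -> None:
        self.item_type = item_type
        self._queue: Deque[T] = deque()

    def put(self, item: T) -> None:
        # Enforce the port's declared type at the boundary between tasks.
        if not isinstance(item, self.item_type):
            raise TypeError(f"port expects {self.item_type.__name__}")
        self._queue.append(item)

    def drain(self) -> Iterator[T]:
        # Items leave in the same order they were put (FIFO).
        while self._queue:
            yield self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


@dataclass(frozen=True)
class Row:
    order_id: int
    quantity: int


def storage_filter_task(rows: Iterable[Row], out: Port[Row], min_quantity: int) -> None:
    """Stand-in for a storage-side task: forward only the matching rows."""
    for row in rows:
        if row.quantity >= min_quantity:
            out.put(row)


def host_sum_task(inp: Port[Row]) -> int:
    """Stand-in for a host-side task: aggregate whatever arrives on the port."""
    return sum(row.quantity for row in inp.drain())


def run_query(rows: Iterable[Row], min_quantity: int) -> Tuple[int, int]:
    """Wire the two tasks together; return (rows transferred, total quantity)."""
    port: Port[Row] = Port(Row)
    storage_filter_task(rows, port, min_quantity)
    transferred = len(port)  # only filtered rows cross the port
    return transferred, host_sum_task(port)


if __name__ == "__main__":
    data = [Row(1, 5), Row(2, 50), Row(3, 7), Row(4, 60)]
    print(run_query(data, min_quantity=10))
```

In this toy setup only two of the four rows reach the host-side task, which is the effect the abstract attributes to filtering within the storage device. Note that the task functions themselves contain nothing that ties them to the host or the storage side, which loosely mirrors the statement that Biscuit does not distinguish between the two.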
Pages: 153-165
Number of pages: 13