Compact Filters for Fast Online Data Partitioning

Cited by: 12
Authors
Zheng, Qing [1]
Cranor, Charles D. [1]
Jain, Ankush [1]
Ganger, Gregory R. [1]
Gibson, Garth A. [1]
Amvrosiadis, George [1]
Settlemyer, Bradley W. [2]
Grider, Gary [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2019
Keywords
PERFORMANCE;
DOI
10.1109/cluster.2019.8890992
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ data processing (i.e., operating on data as it is streamed to storage). One important example of this approach is in-situ data indexing. Prior work has shown the feasibility of indexing at scale as a two-step process. First, one partitions data by key across the CPU cores of a parallel job. Then each core indexes its subset as data is persisted. Online partitioning requires transferring data over the network so that it can be indexed and stored by the core responsible for the data. This approach is becoming increasingly costly as new computing platforms emphasize parallelism over individual core performance, which is crucial for communication libraries and systems software in general. In addition to indexing, scalable online data partitioning is also useful in other contexts such as load balancing and efficient compression. We present FilterKV, an efficient data management scheme for fast online data partitioning of key-value (KV) pairs. FilterKV reduces the total amount of data sent over the network and to storage. We achieve this by: (a) partitioning pointers to KV pairs instead of the KV pairs themselves and (b) using a compact format to represent and store KV pointers. Results from LANL show that FilterKV can reduce total write slowdown (including partitioning overhead) by up to 3x across 4096 CPU cores.
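The idea in point (a) of the abstract, partitioning pointers rather than whole KV pairs, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the FilterKV implementation: the hash-based `owner_of` function, the two-byte rank identifier, and all function names are hypothetical, and the compact pointer format of point (b) is not modeled.

```python
import zlib

RANK_ID_BYTES = 2  # assumed pointer size: enough to name one of 65,536 ranks


def owner_of(key, num_ranks):
    """Map a key to the rank that indexes it (hypothetical hash partitioning)."""
    return zlib.crc32(key) % num_ranks


def partition_pairs_bytes(pairs_by_rank):
    """Baseline cost: every whole KV pair is shipped to its owner.

    For simplicity this counts every record as sent, ignoring the case
    where a record's owner happens to be the rank that produced it.
    """
    return sum(len(k) + len(v) for pairs in pairs_by_rank for k, v in pairs)


def partition_pointers(pairs_by_rank):
    """Pointer-based partitioning: values stay on the producing rank.

    Only (key, source rank) is routed to the owner. Returns the per-rank
    local logs, the per-rank pointer indexes, and the bytes shipped
    (counted with the same simplification as the baseline).
    """
    num_ranks = len(pairs_by_rank)
    logs = [dict(pairs) for pairs in pairs_by_rank]  # data written locally
    index = [dict() for _ in range(num_ranks)]       # key -> source rank
    sent = 0
    for src, pairs in enumerate(pairs_by_rank):
        for key, _value in pairs:
            index[owner_of(key, num_ranks)][key] = src
            sent += len(key) + RANK_ID_BYTES
    return logs, index, sent


def lookup(key, logs, index):
    """Read path: ask the key's owner for the pointer, then read the source log."""
    src = index[owner_of(key, len(index))][key]
    return logs[src][key]


if __name__ == "__main__":
    # Four simulated ranks, ten records each, 56-byte values.
    data = [[(b"r%d-k%d" % (r, i), bytes(56)) for i in range(10)] for r in range(4)]
    logs, index, pointer_bytes = partition_pointers(data)
    print("whole pairs:", partition_pairs_bytes(data), "bytes")
    print("pointers:   ", pointer_bytes, "bytes")
```

The trade-off in this sketch is that a read takes two hops (owner, then source rank), in exchange for moving far fewer bytes at write time when values are large relative to keys.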
Pages: 313-324
Page count: 12