Simplifying Index File Structure to Improve I/O Performance of Parallel Indexing

Times cited: 0
Authors
Chiu, Hsuan-Te [1]
Chou, Jerry [1]
Vishwanath, Venkat [2]
Byna, Surendra [3]
Wu, Kesheng [3]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu 30013, Taiwan
[2] Argonne Natl Lab, Argonne, IL 60439 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Source
2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2014
Keywords
Parallel I/O; Storage system; Bitmap indexing
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Complex indexing techniques are needed to reduce the time required to analyze massive scientific datasets, but generating these index data structures can be very time-consuming. In this work, we propose a set of strategies to simplify the index file structure and improve I/O performance during index construction in FastQuery, a parallel indexing and querying system for scientific data. FastQuery has been used to analyze data from various scientific applications, including a trillion-particle plasma simulation. To accelerate query processing, FastQuery uses FastBit to build indexes and then stores them in the file system through parallel scientific data format libraries such as HDF5. Although these libraries are designed to support complex multi-dimensional arrays, we observed that mapping the index data structures onto arrays still takes considerable work, especially on parallel machines. To address this problem, we aim to minimize I/O time by storing indexes in our own self-defined binary data format. By fully controlling the data structure, we can minimize I/O synchronization overhead and explore more efficient I/O strategies for storing indexes. Our experiments indexing a trillion-particle dataset on 20,000 cores of a supercomputer show that the proposed binary I/O driver reaches 85% of the system's peak I/O bandwidth and achieves up to a 4X speedup in total execution time compared with the previous FastQuery implementation using the HDF5 I/O driver.
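The abstract's central idea, a self-defined binary layout whose structure the writer fully controls, can be illustrated with a small sketch. The layout below (a magic tag, a bitmap count, an offset table, then the concatenated bitmaps) and the names `encode_block`, `decode_block`, `file_offsets`, and `write_all` are hypothetical illustrations, not FastQuery's actual file format or API. The sketch shows one common way such a format reduces synchronization: once each process knows the byte size of its own index block, a single exclusive prefix sum over those sizes (in an MPI program, typically `MPI_Exscan`) gives every process a non-overlapping file offset, after which all writes can proceed independently.

```python
import struct
from itertools import accumulate

# Hypothetical block layout (an illustrative assumption, NOT FastQuery's format):
#   magic (4 bytes) | bitmap count n (uint32) | n + 1 uint64 payload offsets | payload
MAGIC = b"BIDX"


def encode_block(bitmaps):
    """Serialize one process's bitmaps (a list of bytes) into a self-describing block."""
    n = len(bitmaps)
    # Offset of each bitmap relative to the payload start; the last entry is the payload size.
    offsets = [0, *accumulate(len(b) for b in bitmaps)]
    header = MAGIC + struct.pack("<I", n) + struct.pack(f"<{n + 1}Q", *offsets)
    return header + b"".join(bitmaps)


def decode_block(buf, start=0):
    """Inverse of encode_block: return (bitmaps, position just past the block)."""
    if buf[start:start + 4] != MAGIC:
        raise ValueError("not an index block")
    (n,) = struct.unpack_from("<I", buf, start + 4)
    offsets = struct.unpack_from(f"<{n + 1}Q", buf, start + 8)
    payload = start + 8 + 8 * (n + 1)
    bitmaps = [bytes(buf[payload + offsets[i]:payload + offsets[i + 1]]) for i in range(n)]
    return bitmaps, payload + offsets[n]


def file_offsets(block_sizes):
    """Exclusive prefix sum of block sizes: the file offset where each rank's block starts."""
    return [0, *accumulate(block_sizes)][:-1]


def write_all(blocks):
    """Simulate ranks writing their blocks at precomputed, non-overlapping offsets."""
    offs = file_offsets([len(b) for b in blocks])
    out = bytearray(sum(len(b) for b in blocks))
    for off, blk in zip(offs, blocks):  # with MPI, each rank would issue its own write
        out[off:off + len(blk)] = blk
    return bytes(out), offs


# Usage: two simulated ranks, each holding a few bitmaps of different sizes.
rank_bitmaps = [
    [b"\x0f", b"\xf0"],           # rank 0: two 1-byte bitmaps
    [b"\x01\x02", b"", b"\xff"],  # rank 1: three bitmaps, one of them empty
]
blocks = [encode_block(bms) for bms in rank_bitmaps]
data, offs = write_all(blocks)
```

Because every block carries its own offset table, a reader can locate any bitmap by seeking within the file rather than by consulting shared metadata; the trade-off is that such a format is private to the application, unlike a portable HDF5 file.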
Pages: 576-583
Number of pages: 8