A GPU-accelerated highly compact and encoding based database system

Authors
Luo, Xinyuan [1]
Chen, Gang [1]
Wu, Sai [1]
Affiliation
[1] College of Computer Science, Zhejiang University, Hangzhou
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2015 / Vol. 52 / No. 02
Keywords
CUDA; Database system; Encoding; GPU; Hybrid row-column storage; Rule mining
DOI
10.7544/issn1000-1239.2015.20140254
Abstract
In the big data era, business applications generate huge volumes of data, which makes storing and managing that data extremely challenging. One solution adopted in previous database systems is to employ encoding techniques, which can effectively reduce the size of the data and consequently improve query performance. However, existing encoding approaches still cannot strike a good tradeoff among compression ratio, import time, and query performance. To address this problem, we propose a new encoding-based database system, HEGA-STORE, which adopts a hybrid row-oriented and column-oriented storage model. In HEGA-STORE, we design a GPU-assisted encoding scheme that combines rule-based encoding with conventional compression algorithms. By exploiting the computational power of the GPU, we improve the performance of the encoding and decoding algorithms. To evaluate HEGA-STORE, we deployed it at Netease to support log analysis. We compare HEGA-STORE with other database systems, and the results show that it provides better performance for data import and query processing. It is a much more compact encoding database for big data applications. © 2015, Science Press. All rights reserved.
Pages: 362-376
Page count: 14