Good to the Last Bit: Data-Driven Encoding with CodecDB

Cited by: 23
Authors
Jiang, Hao [1]
Liu, Chunwei [1]
Paparrizos, John [1]
Chien, Andrew A. [1]
Ma, Jihong [2]
Elmore, Aaron J. [1]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Alibaba, Hangzhou, Peoples R China
Funding
U.S. National Science Foundation
Keywords
COMPRESSION; ALGORITHM
DOI
10.1145/3448016.3457283
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Columnar databases rely on specialized encoding schemes to reduce storage requirements. These encodings also enable efficient in-situ data processing. Nevertheless, many existing columnar databases are encoding-oblivious. When storing the data, these systems rely on a global understanding of the dataset or the data types to derive simple rules for encoding selection. Such rule-based selection leads to unsatisfactory performance. Specifically, when performing queries, the systems always decode data into memory, ignoring the possibility of optimizing access to encoded data. We develop CodecDB, an encoding-aware columnar database, to demonstrate the benefit of tightly coupling the database design with the data encoding schemes. CodecDB chooses, in a principled manner, the most efficient encoding for a given data column and relies on encoding-aware query operators to optimize access to encoded data. Storage-wise, CodecDB achieves on average 90% accuracy in selecting the best encoding and improves the compression ratio by up to 40% compared to the state-of-the-art encoding selection solution. Query-wise, CodecDB is on average one order of magnitude faster than the latest open-source and commercial columnar databases on the TPC-H benchmark, and on average 3x faster than a recent research project on the Star-Schema Benchmark (SSB).
Pages: 843-856
Page count: 14