Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging

被引:119
作者
Dayan, Niv [1 ]
Idreos, Stratos [1 ]
机构
[1] Harvard Univ, Cambridge, MA 02138 USA
来源
SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2018年
基金
美国国家科学基金会;
关键词
D O I
10.1145/3183713.3196927
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We slim that all mainstream ISM-tree based key-value stores in the literature and in industry suboptimally trade between the I/O cost of updates on one hand and the I/O cost of lookups and storage space on the other. The reason is that they perform equally expensive merge operations across all levels of ISM-tree to bound the number of runs that a lookup has to probe and to remove obsolete entries to reclaim storage space. With state-of-the-art designs, however, merge operations from all levels of LSM-tree but the largest (i.e., most merge operations) reduce point lookup cost, long range lookup cost, and storage space by a negligible amount while significantly adding to the amortized cost of updates. To address this problem, we introduce Lazy Leveling, a new design that removes merge operations from all levels of LSM-tree but the largest. Lazy Leveling improves the worst-case complexity of update cost while maintaining the same bounds on point lookup cost, long range lookup cost, and storage space. We further introduce, Fluid ISM-tree, a generalization of the entire LSM-tree design space that can be parameterized to assume any existing design. Relative to Lazy Leveling, Fluid LSM-tree can optimize more for updates by merging less at the largest level, or it can optimize more for short range lookups by merging more at all other levels. We put everything together to design Dostoevsky, a key-value store that adaptively removes superfluous merging by navigating the Fluid LSM-tree design space based on the application workload and hardware. We implemented Dostoevsky on top of RocksDB, and we show that it strictly dominates state-of-the-art designs in terms of performance and storage space.
引用
收藏
页码:505 / 520
页数:16
相关论文
共 46 条
[1]   THE INPUT OUTPUT COMPLEXITY OF SORTING AND RELATED PROBLEMS [J].
AGGARWAL, A ;
VITTER, JS .
COMMUNICATIONS OF THE ACM, 1988, 31 (09) :1116-1127
[2]   Lazy-Adaptive Thee: An Optimized Index Structure for Flash Devices [J].
Agrawal, Devesh ;
Ganesan, Deepak ;
Sitaraman, Ramesh ;
Diao, Yanlei ;
Singh, Shashi .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01) :361-372
[3]   Compaction management in distributed key-value datastores [J].
Ahmad, Muhammad Yousuf ;
Kemme, Bettina .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (08) :850-861
[4]  
Andersen DG, 2009, SOSP'09: PROCEEDINGS OF THE TWENTY-SECOND ACM SIGOPS SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, P1
[5]  
[Anonymous], 2013, ANN TECHN C ATC USEN
[6]  
[Anonymous], 2010, P 1 ACM S CLOUD COMP, DOI DOI 10.1145/1807128.1807152
[7]  
[Anonymous], 2013, P 2013 ACM SIGMOD IN, DOI DOI 10.1145/2463676.2465296
[8]  
[Anonymous], 2011, Airways (Pty) Ltd v Aviation Union of South Africa Others 2011 (3) SA 148 (SCA) paras 25-26, P25, DOI [10.1145/1989323.1989327, DOI 10.1145/1989323.1989327]
[9]   Online Updates on Data Warehouses via Judicious Use of Solid-State Storage [J].
Athanassoulis, Manos ;
Chen, Shimin ;
Ailamaki, Anastasia ;
Gibbons, Philip B. ;
Stoica, Radu .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2015, 40 (01)
[10]  
Athanassoulis Manos., 2011, SIGMOD C, P865