Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes

被引:36
|
作者
Kondylakis, Haridimos [1 ]
Dayan, Niv [2 ]
Zoumpatianos, Kostas [2 ]
Palpanas, Themis [3 ]
机构
[1] FORTH ICS, Iraklion, Greece
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Paris Descartes Univ, Paris, France
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018年 / 11卷 / 06期
关键词
SEARCH;
D O I
10.14778/3184470.3184472
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many modern applications produce massive amounts of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. This leads to two design problems. First, traditional bulk-loading algorithms based on sorting cannot be used. Instead, index construction takes place through slow top-down insertions, which create a non-contiguous index that results in many random I/Os. Second, data series cannot be sorted and split across nodes evenly based on their median value; thus, most leaf nodes are in practice nearly empty. This further slows down query speed and amplifies storage costs. To address these problems, we present Coconut. The first innovation in Coconut is an inverted, sortable data series summarization that organizes data series based on a z-order curve, keeping similar series close to each other in the sorted order. As a result, Coconut is able to use bulk-loading techniques that rely on sorting to quickly build a contiguous index using large sequential disk I/Os. We then explore prefix-based and median-based splitting policies for bottom-up bulk-loading, showing that median-based splitting outperforms the state of the art, ensuring that all nodes are densely populated. Overall, we show analytically and empirically that Coconut dominates the state-of-the-art data series indexes in terms of construction speed, query speed, and storage costs.
引用
收藏
页码:677 / 690
页数:14
相关论文
共 50 条
  • [41] Holographic bottom-up approach to Σ baryons
    Guo, Xi
    Contreras, Miguel Angel Martin
    Chen, Xun
    Xiang, Dong
    CHINESE PHYSICS C, 2025, 49 (01)
  • [42] A bottom-up approach to multimedia teachware
    Caumanns, J
    INTELLIGENT TUTORING SYSTEMS, 1998, 1452 : 116 - 125
  • [43] Facial fractures: The "bottom-up" approach
    Kochkine, Sergey
    Baxter, Alexander B.
    McMenamy, John M.
    Bernstein, Mark P.
    CLINICAL IMAGING, 2023, 101 : 167 - 179
  • [44] A bottom-up approach to gene regulation
    Nicholas J. Guido
    Xiao Wang
    David Adalsteinsson
    David McMillen
    Jeff Hasty
    Charles R. Cantor
    Timothy C. Elston
    J. J. Collins
    Nature, 2006, 439 : 856 - 860
  • [45] A bottom-up approach to cell mechanics
    A. R. Bausch
    K. Kroy
    Nature Physics, 2006, 2 : 231 - 238
  • [46] A Bottom-Up Approach to SUSY Analyses
    Horn, Claus
    SUSY09: THE 17TH INTERNATIONAL CONFERENCE ON SUPERSYMMETRY AND THE UNIFICATION OF FUNDAMENTAL INTERACTIONS, 2009, 1200 : 794 - 797
  • [47] A bottom-up approach to SUSY analyses
    Horn, Claus
    JOURNAL OF PHYSICS G-NUCLEAR AND PARTICLE PHYSICS, 2009, 36 (10)
  • [48] Bottom-up biclustering of expression data
    Bryan, Kenneth
    Cunningham, Padraig
    PROCEEDINGS OF THE 2006 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2006, : 232 - 239
  • [49] Design of a construction management data visualization environment: A bottom-up approach
    Chiu, Chao-Ying
    Russell, Alan D.
    AUTOMATION IN CONSTRUCTION, 2013, 35 : 353 - 373
  • [50] A bottom-up approach for forecasting GDP in a data-rich environment
    Dias, Francisco
    Pinheiro, Maximiano
    Rua, Antonio
    APPLIED ECONOMICS LETTERS, 2018, 25 (10) : 718 - 723