Fast data series indexing for in-memory data

被引:15
作者
Peng, Botao [1 ]
Fatourou, Panagiota [2 ]
Palpanas, Themis [3 ,4 ]
机构
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
关键词
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
D O I
10.1007/s00778-021-00677-2
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in similar to 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
引用
收藏
页码:1041 / 1067
页数:27
相关论文
共 50 条
[21]   Query Workloads for Data Series Indexes [J].
Zoumpatianos, Kostas ;
Lou, Yin ;
Palpanas, Themis ;
Gehrke, Johannes .
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, :1603-1612
[22]   Generating data series query workloads [J].
Kostas Zoumpatianos ;
Yin Lou ;
Ioana Ileana ;
Themis Palpanas ;
Johannes Gehrke .
The VLDB Journal, 2018, 27 :823-846
[23]   Slicing the metric space to provide quick indexing of complex data in the main memory [J].
Mori Carelo, Caio Cesar ;
Venturini Pola, Ives Rene ;
Ciferri, Ricardo Rodrigues ;
Machado Traina, Agma Juci ;
Traina, Caetano, Jr. ;
de Aguiar Ciferri, Cristina Dutra .
INFORMATION SYSTEMS, 2011, 36 (01) :79-98
[24]   Attributed Data for CHR Indexing [J].
Starosta, Beata Sarna ;
Schrijvers, Tom .
LOGIC PROGRAMMING, 2009, 5649 :357-+
[25]   Self-selective memristor-enabled in-memory search for highly efficient data mining [J].
Yang, Ling ;
Huang, Xiaodi ;
Li, Yi ;
Zhou, Houji ;
Yu, Yingjie ;
Bao, Han ;
Li, Jiancong ;
Ren, Shengguang ;
Wang, Feng ;
Ye, Lei ;
He, Yuhui ;
Chen, Jia ;
Pu, Guiyou ;
Li, Xiang ;
Miao, Xiangshui .
INFOMAT, 2023, 5 (05)
[26]   Automata Approach to XML Data Indexing [J].
Sestakova, Eliska ;
Janousek, Jan .
INFORMATION, 2018, 9 (01)
[27]   Deep Learning Embeddings for Data Series Similarity Search [J].
Wang, Qitong ;
Palpanas, Themis .
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, :1708-1716
[28]   A survey of image data indexing techniques [J].
Saurabh Sharma ;
Vishal Gupta ;
Mamta Juneja .
Artificial Intelligence Review, 2019, 52 :1189-1266
[29]   Analysis of Indexing Structures for Immutable Data [J].
Yue, Cong ;
Xie, Zhongle ;
Zhang, Meihui ;
Chen, Gang ;
Ooi, Beng Chin ;
Wang, Sheng ;
Xiao, Xiaokui .
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, :925-935
[30]   Linked Data Indexing Methods: A Survey [J].
Svoboda, Martin ;
Mlynkova, Irena .
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2011 WORKSHOPS, 2011, 7046 :474-483