Fast data series indexing for in-memory data

被引:15
作者
Peng, Botao [1 ]
Fatourou, Panagiota [2 ]
Palpanas, Themis [3 ,4 ]
机构
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
关键词
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
D O I
10.1007/s00778-021-00677-2
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in similar to 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
引用
收藏
页码:1041 / 1067
页数:27
相关论文
共 50 条
[31]   Uniform indexing for geospatial data and authorizations [J].
Atluri, V ;
Mazzoleni, P .
RESEARCH DIRECTIONS IN DATA AND APPLICATIONS SECURITY, 2003, 128 :207-218
[32]   A Hybrid B plus -tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms [J].
Shahvarani, Amirhesam ;
Jacobsen, Hans-Arno .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1523-1538
[33]   Parallel and Distributed Data Series Processing on Modern and Emerging Hardware [J].
Fatourou, Panagiota .
MANAGEMENT OF DIGITAL ECOSYSTEMS, MEDES 2023, 2024, 2022 :399-407
[34]   Performance analysis for similarity data fusion model for enabling time series indexing in internet of things applications [J].
Younan, Mina ;
Houssein, Essam H. ;
Elhoseny, Mohamed ;
Ali, Abd El-mageid .
PEERJ COMPUTER SCIENCE, 2021,
[35]   Performance analysis for similarity data fusion model for enabling time series indexing in internet of things applications [J].
Younan M. ;
Houssein E.H. ;
Elhoseny M. ;
Ali A.E.-M. .
PeerJ Computer Science, 2021, 7 :1-18
[36]   Big IoT Data Indexing: Architecture, Techniques and Open Research Challenges [J].
Ferrag, Mohamed Amine ;
Kouahla, Zineddine ;
Seridi, Hamid ;
Kurulay, Muhammet .
2019 4TH INTERNATIONAL CONFERENCE ON NETWORKING AND ADVANCED SYSTEMS (ICNAS 2019), 2019, :140-145
[37]   Indexing Uncertain Data in General Metric Spaces [J].
Angiulli, Fabrizio ;
Fassetti, Fabio .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (09) :1640-1657
[38]   On the analysis of big data indexing execution strategies [J].
Siddiqa, Aisha ;
Karim, Ahmad ;
Saba, Tanzila ;
Chang, Victor .
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (05) :3259-3271
[39]   Spatial Data Indexing and Query Processing in GeoCloud [J].
Shankar, Karthi ;
Sevugan, Prabu .
JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) :4039-4053
[40]   An indexing method for wireless broadcast XML data [J].
Chung, Yon Dohn ;
Lee, Ji Yeon .
INFORMATION SCIENCES, 2007, 177 (09) :1931-1953