Fast data series indexing for in-memory data

被引:15
作者
Peng, Botao [1 ]
Fatourou, Panagiota [2 ]
Palpanas, Themis [3 ,4 ]
机构
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
关键词
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
D O I
10.1007/s00778-021-00677-2
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in similar to 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
引用
收藏
页码:1041 / 1067
页数:27
相关论文
共 50 条
  • [21] FAST SIMILARITY SEARCH ON A LARGE SPEECH DATA SET WITH NEIGHBORHOOD GRAPH INDEXING
    Aoyama, Kazuo
    Watanabe, Shinji
    Sawada, Hiroshi
    Minami, Yasuhiro
    Ueda, Naonori
    Saito, Kazumi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5358 - 5361
  • [22] Query Workloads for Data Series Indexes
    Zoumpatianos, Kostas
    Lou, Yin
    Palpanas, Themis
    Gehrke, Johannes
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1603 - 1612
  • [23] Slicing the metric space to provide quick indexing of complex data in the main memory
    Mori Carelo, Caio Cesar
    Venturini Pola, Ives Rene
    Ciferri, Ricardo Rodrigues
    Machado Traina, Agma Juci
    Traina, Caetano, Jr.
    de Aguiar Ciferri, Cristina Dutra
    INFORMATION SYSTEMS, 2011, 36 (01) : 79 - 98
  • [24] Attributed Data for CHR Indexing
    Starosta, Beata Sarna
    Schrijvers, Tom
    LOGIC PROGRAMMING, 2009, 5649 : 357 - +
  • [25] Self-selective memristor-enabled in-memory search for highly efficient data mining
    Yang, Ling
    Huang, Xiaodi
    Li, Yi
    Zhou, Houji
    Yu, Yingjie
    Bao, Han
    Li, Jiancong
    Ren, Shengguang
    Wang, Feng
    Ye, Lei
    He, Yuhui
    Chen, Jia
    Pu, Guiyou
    Li, Xiang
    Miao, Xiangshui
    INFOMAT, 2023, 5 (05)
  • [26] Deep Learning Embeddings for Data Series Similarity Search
    Wang, Qitong
    Palpanas, Themis
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 1708 - 1716
  • [27] Linked Data Indexing Methods: A Survey
    Svoboda, Martin
    Mlynkova, Irena
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2011 WORKSHOPS, 2011, 7046 : 474 - 483
  • [28] Uniform indexing for geospatial data and authorizations
    Atluri, V
    Mazzoleni, P
    RESEARCH DIRECTIONS IN DATA AND APPLICATIONS SECURITY, 2003, 128 : 207 - 218
  • [29] A survey of image data indexing techniques
    Saurabh Sharma
    Vishal Gupta
    Mamta Juneja
    Artificial Intelligence Review, 2019, 52 : 1189 - 1266
  • [30] Analysis of Indexing Structures for Immutable Data
    Yue, Cong
    Xie, Zhongle
    Zhang, Meihui
    Chen, Gang
    Ooi, Beng Chin
    Wang, Sheng
    Xiao, Xiaokui
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 925 - 935