DIMS: Distributed Index for Similarity Search in Metric Spaces

被引:0
作者
Zhu, Yifan [1 ]
Luo, Chengyang [1 ]
Qian, Tang [1 ]
Chen, Lu [1 ]
Gao, Yunjun [2 ]
Zheng, Baihua [3 ]
机构
[1] Zhejiang Univ, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[3] Singapore Management Univ, Singapore 188065, Singapore
关键词
Extraterrestrial measurements; Distributed databases; Search problems; Indexing; Costs; Trajectory; Filtering; Data models; Vectors; Optimization; Similarity search; metric space; distributed index; homogeneous and heterogeneous partition; ALGORITHM;
D O I
10.1109/TKDE.2024.3487759
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Similarity search finds objects that are similar to a given query object based on a similarity metric. As the amount and variety of data continue to grow, similarity search in metric spaces has gained significant attention. Metric spaces can accommodate any type of data and support flexible distance metrics, making similarity search in metric spaces beneficial for many real-world applications, such as multimedia retrieval, personalized recommendation, trajectory analytics, data mining, decision planning, and distributed servers. However, existing studies mostly focus on indexing metric spaces on a single machine, which faces efficiency and scalability limitations with increasing data volume and query amount. Recent advancements in similarity search turn towards distributed methods, while they face challenges including inefficient local data management, unbalanced workload, and low concurrent search efficiency. To this end, we propose DIMS, an efficient Distributed Index for similarity search in Metric Spaces. First, we design a novel three-stage heterogeneous partition to achieve workload balance. Then, we present an effective three-stage indexing structure to efficiently manage objects. We also develop concurrent search methods with filtering and validation techniques that support efficient distributed similarity search. Additionally, we devise a cost-based optimization model to balance communication and computation cost. Extensive experiments demonstrate that DIMS significantly outperforms existing distributed similarity search approaches.
引用
收藏
页码:210 / 225
页数:16
相关论文
共 56 条
  • [1] [Anonymous], 2009, Content -based photo image retrieval test -collection
  • [2] [Anonymous], 2024, Source code of distributed index for similarity search in metric spaces
  • [3] [Anonymous], 2024, The size of the world wide web
  • [4] [Anonymous], 2024, Start with Google search
  • [5] A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks?
    Antonakaki, Despoina
    Fragopoulou, Paraskevi
    Ioannidis, Sotiris
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [6] Batko M, 2005, LECT NOTES COMPUT SC, V3664, P25
  • [7] Automatic political discourse analysis with multi-scale convolutional neural networks and contextual data
    Bilbao-Jayo, Aritz
    Almeida, Aitor
    [J]. INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2018, 14 (11):
  • [8] Indexing large metric spaces for similarity search queries
    Bozkaya, T
    Ozsoyoglu, M
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (03): : 361 - 404
  • [9] Bozkaya T., 1997, SIGMOD Record, V26, P357, DOI 10.1145/253262.253345
  • [10] Brin S., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P574