CM-tree: A dynamic clustered index for similarity search in metric databases

被引:18
作者
Aronovich, Lior [1 ]
Spiegler, Israel [1 ]
机构
[1] Tel Aviv Univ, Dept Informat Syst, IL-69978 Tel Aviv, Israel
关键词
metric access methods; similarity search; metric spaces; database indexing; clustering methods;
D O I
10.1016/j.datak.2007.06.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. (c) 2007 Elsevier B.V. All rights reserved.
引用
收藏
页码:919 / 946
页数:28
相关论文
共 38 条
  • [1] [Anonymous], 2004, ANN INT WORKSH DAT T
  • [2] [Anonymous], P 21 INT C VER LARG
  • [3] ARONOVICH L, 2006, THESIS TEL AVIV U IN
  • [4] ATHITSOS V, 2004, P IEEE C COMP VIS PA
  • [5] Bayer R., 1972, Acta Informatica, V1, P173, DOI 10.1007/BF00288683
  • [6] BECKMANN N, 1990, SIGMOD REC, V19, P322, DOI 10.1145/93605.98741
  • [7] Berchtold S, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P28
  • [8] Indexing large metric spaces for similarity search queries
    Bozkaya, T
    Ozsoyoglu, M
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (03): : 361 - 404
  • [9] BURKHARD WA, 1973, COMMUN ACM, V16, P230, DOI 10.1145/362003.362025
  • [10] Pivot selection techniques for proximity searching in metric spaces
    Bustos, B
    Navarro, G
    Chávez, E
    [J]. PATTERN RECOGNITION LETTERS, 2003, 24 (14) : 2357 - 2366