Efficient k-closest pair queries in general metric spaces

被引:0
|
作者
Gao, Yunjun [1 ,2 ]
Chen, Lu [1 ]
Li, Xinhan [1 ]
Yao, Bin [3 ]
Chen, Gang [1 ]
机构
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310003, Zhejiang, Peoples R China
[2] Zhejiang Univ, Innovat Joint Res Ctr Cyber Phys Soc Syst, Hangzhou 310003, Zhejiang, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
关键词
k-Closest pair query; Metric space; Query processing; Cost model; Algorithm; NEAREST-NEIGHBOR SEARCH; COST MODEL; DISTANCE; ALGORITHMS; JOINS;
D O I
10.1007/s00778-015-0383-4
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Given two object sets and , a k-closest pair query finds closest object pairs from . This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in the Euclidean space, there is little prior work on the metric space. In this paper, we study the problem of kCP query processing in general metric spaces, namely Metric kCP search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., M-tree), which can be applied to arbitrary type of data as long as a certain metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigm(s), employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, take advantage of aggressive pruning and compensation to further boost query efficiency, and derive a node-based cost model for retrieval. In addition, we extend our techniques to tackle two interesting variants of queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.
引用
收藏
页码:415 / 439
页数:25
相关论文
共 50 条
  • [31] MSQL: efficient similarity search in metric spaces using SQL
    Wei Lu
    Jiajia Hou
    Ying Yan
    Meihui Zhang
    Xiaoyong Du
    Thomas Moscibroda
    The VLDB Journal, 2017, 26 : 829 - 854
  • [32] Indexing Dense Nested Metric Spaces for Efficient Similarity Search
    Brisaboa, Nieves R.
    Luaces, Miguel R.
    Pedreira, Oscar
    Places, Angeles S.
    Seco, Diego
    PERSPECTIVES OF SYSTEMS INFORMATICS, 2010, 5947 : 98 - 109
  • [33] SEMI-NORMAL STRUCTURE AND BEST PROXIMITY PAIR RESULTS IN CONVEX METRIC SPACES
    Gabeleh, Moosa
    BANACH JOURNAL OF MATHEMATICAL ANALYSIS, 2014, 8 (02): : 214 - 228
  • [34] An efficient algorithm for approximated self-similarity joins in metric spaces
    Ferrada, Sebastian
    Bustos, Benjamin
    Reyes, Nora
    INFORMATION SYSTEMS, 2020, 91
  • [35] New distance lower bounds for efficient proximity searching in metric spaces
    Ban, Tao
    Kadobayashi, Youki
    IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008, : 437 - +
  • [36] PSLSH: An Index Structure for Efficient Execution of Set Queries in High-Dimensional Spaces
    Nagarkar, Parth
    Candan, K. Selcuk
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 477 - 486
  • [37] ON CORESETS FOR k-MEDIAN AND k-MEANS CLUSTERING IN METRIC AND EUCLIDEAN SPACES AND THEIR APPLICATIONS
    Chen, Ke
    SIAM JOURNAL ON COMPUTING, 2009, 39 (03) : 923 - 947
  • [38] Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index
    Hai-Da Zhang
    Zhi-Hao Xing
    Lu Chen
    Yun-Jun Gao
    Journal of Computer Science and Technology, 2016, 31 : 1194 - 1211
  • [39] Supporting Efficient Top-k Queries in Type-Ahead Search
    Li, Guoliang
    Wang, Jiannan
    Li, Chen
    Feng, Jianhua
    SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012, : 355 - 364
  • [40] Efficient Dual-Resolution Layer Indexing for Top-k Queries
    Lee, Jongwuk
    Cho, Hyunsouk
    Hwang, Seung-won
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 1084 - 1095