Efficient k-closest pair queries in general metric spaces

被引：0

作者：

Gao, Yunjun ^{[1
,2
]}

Chen, Lu ^{[1
]}

Li, Xinhan ^{[1
]}

Yao, Bin ^{[3
]}

Chen, Gang ^{[1
]}

机构：

[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310003, Zhejiang, Peoples R China

[2] Zhejiang Univ, Innovat Joint Res Ctr Cyber Phys Soc Syst, Hangzhou 310003, Zhejiang, Peoples R China

[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China

来源：

VLDB JOURNAL | 2015年 / 24卷 / 03期

关键词：

k-Closest pair query; Metric space; Query processing; Cost model; Algorithm; NEAREST-NEIGHBOR SEARCH; COST MODEL; DISTANCE; ALGORITHMS; JOINS;

D O I：

10.1007/s00778-015-0383-4

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Given two object sets and , a k-closest pair query finds closest object pairs from . This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in the Euclidean space, there is little prior work on the metric space. In this paper, we study the problem of kCP query processing in general metric spaces, namely Metric kCP search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., M-tree), which can be applied to arbitrary type of data as long as a certain metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigm(s), employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, take advantage of aggressive pruning and compensation to further boost query efficiency, and derive a node-based cost model for retrieval. In addition, we extend our techniques to tackle two interesting variants of queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.

引用

页码：415 / 439

页数：25

共 50 条

[31] MSQL: efficient similarity search in metric spaces using SQL
Wei Lu
Jiajia Hou
Ying Yan
Meihui Zhang
Xiaoyong Du
Thomas Moscibroda
The VLDB Journal, 2017, 26 : 829 - 854
[32] Indexing Dense Nested Metric Spaces for Efficient Similarity Search
Brisaboa, Nieves R.
Luaces, Miguel R.
Pedreira, Oscar
Places, Angeles S.
Seco, Diego
PERSPECTIVES OF SYSTEMS INFORMATICS, 2010, 5947 : 98 - 109
[33] SEMI-NORMAL STRUCTURE AND BEST PROXIMITY PAIR RESULTS IN CONVEX METRIC SPACES
Gabeleh, Moosa
BANACH JOURNAL OF MATHEMATICAL ANALYSIS, 2014, 8 (02): : 214 - 228
[34] An efficient algorithm for approximated self-similarity joins in metric spaces
Ferrada, Sebastian
Bustos, Benjamin
Reyes, Nora
INFORMATION SYSTEMS, 2020, 91
[35] New distance lower bounds for efficient proximity searching in metric spaces
Ban, Tao
Kadobayashi, Youki
IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008, : 437 - +
[36] PSLSH: An Index Structure for Efficient Execution of Set Queries in High-Dimensional Spaces
Nagarkar, Parth
Candan, K. Selcuk
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 477 - 486
[37] ON CORESETS FOR k-MEDIAN AND k-MEANS CLUSTERING IN METRIC AND EUCLIDEAN SPACES AND THEIR APPLICATIONS
Chen, Ke
SIAM JOURNAL ON COMPUTING, 2009, 39 (03) : 923 - 947
[38] Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index
Hai-Da Zhang
Zhi-Hao Xing
Lu Chen
Yun-Jun Gao
Journal of Computer Science and Technology, 2016, 31 : 1194 - 1211
[39] Supporting Efficient Top-k Queries in Type-Ahead Search
Li, Guoliang
Wang, Jiannan
Li, Chen
Feng, Jianhua
SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012, : 355 - 364
[40] Efficient Dual-Resolution Layer Indexing for Top-k Queries
Lee, Jongwuk
Cho, Hyunsouk
Hwang, Seung-won
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 1084 - 1095

← 1 2 3 4 5 →