Dimensional Testing for Multi-Step Similarity Search

被引:25
作者
Houle, Michael E. [1 ]
Ma, Xiguo [2 ]
Nett, Michael [1 ,3 ]
Oria, Vincent [2 ]
机构
[1] Natl Inst Informat, Tokyo 1018430, Japan
[2] New Jersey Inst Technol, Newark, NJ 07102 USA
[3] Univ Tokyo, Tokyo 1138656, Japan
来源
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) | 2012年
关键词
Similarity search; k-NN; nearest neighbor; intrinsic dimensionality; multi-step; adaptive similarity;
D O I
10.1109/ICDM.2012.91
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In data mining applications such as subspace clustering or feature selection, changes to the underlying feature set can require the reconstruction of search indices to support fundamental data mining tasks. For such situations, multi-step search approaches have been proposed that can accommodate changes in the underlying similarity measure without the need to rebuild the index. In this paper, we present a heuristic multi-step search algorithm that utilizes a measure of intrinsic dimension, the generalized expansion dimension (GED), as the basis of its search termination condition. Compared to the current state-of-the-art method, experimental results show that our heuristic approach is able to obtain significant improvements in both the number of candidates and the running time, while losing very little in the accuracy of the query results.
引用
收藏
页码:299 / 308
页数:10
相关论文
共 23 条
[1]  
Andoni A., 2008, P SIGMOD 03 P INT C
[2]  
[Anonymous], 2007, Uci machine learning repository
[3]  
[Anonymous], P IEEE INT C COMP VI
[4]  
Assent I., 2006, P INT C DAT ENG
[5]  
Boujemaa N., 2001, P INT WORKSH MULT CO
[6]   LOF: Identifying density-based local outliers [J].
Breunig, MM ;
Kriegel, HP ;
Ng, RT ;
Sander, J .
SIGMOD RECORD, 2000, 29 (02) :93-104
[7]   Anomaly Detection: A Survey [J].
Chandola, Varun ;
Banerjee, Arindam ;
Kumar, Vipin .
ACM COMPUTING SURVEYS, 2009, 41 (03)
[8]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[9]   The Amsterdam Library of Object Images [J].
Geusebroek, JM ;
Burghouts, GJ ;
Smeulders, AWM .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2005, 61 (01) :103-112
[10]   EFFICIENT COLOR HISTOGRAM INDEXING FOR QUADRATIC FORM DISTANCE FUNCTIONS [J].
HAFNER, J ;
SAWHNEY, HS ;
EQUITZ, W ;
FLICKNER, M ;
NIBLACK, W .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1995, 17 (07) :729-736