Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

被引：0

作者：

Radovanovic, Milos ^{[1
]}

Nanopoulos, Alexandros ^{[2
]}

Ivanovic, Mirjana ^{[1
]}

机构：

[1] Univ Novi Sad, Dept Math & Informat, Novi Sad 21000, Serbia

[2] Univ Hildesheim, Inst Comp Sci, D-31141 Hildesheim, Germany

来源：

JOURNAL OF MACHINE LEARNING RESEARCH | 2010年 / 11卷

关键词：

nearest neighbors; curse of dimensionality; classification; semi-supervised learning; clustering; POINT-PROCESSES; REDUCTION; ALGORITHMS;

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.

引用

页码：2487 / 2531

页数：45

共 50 条

[1] REPORTING NEIGHBORS IN HIGH-DIMENSIONAL EUCLIDEAN SPACE
Aiger, Dror
Kaplan, Haim
Sharir, Micha
SIAM JOURNAL ON COMPUTING, 2014, 43 (04) : 1363 - 1395
[2] On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness
Angiulli, Fabrizio
JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 18
[3] Hybrid (CPU/GPU) Exact Nearest Neighbors Search in High-Dimensional Spaces
Muhr, David
Affenzeller, Michael
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2022, PART II, 2022, 647 : 112 - 123
[4] Fuzzy nearest neighbor clustering of high-dimensional data
Wang, HB
Yu, YQ
Zhou, DR
Meng, B
2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572
[5] An efficient nearest neighbor search in high-dimensional data spaces
Lee, DH
Kim, HJ
INFORMATION PROCESSING LETTERS, 2002, 81 (05) : 239 - 246
[6] On the Design and Applicability of Distance Functions in High-Dimensional Data Space
Hsu, Chih-Ming
Chen, Ming-Syan
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (04) : 523 - 536
[7] A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach
Chen, Hao
Xia, Yin
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (541) : 719 - 731
[8] Efficient and Accurate Nearest Neighbor and Closest Pair Search in High-Dimensional Space
Tao, Yufei
Yi, Ke
Sheng, Cheng
Kalnis, Panos
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2010, 35 (03):
[9] The Role of Hubness in Clustering High-Dimensional Data
Tomasev, Nenad
Radovanovic, Milos
Mladenic, Dunja
Ivanovic, Mirjana
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (03) : 739 - 751
[10] Fast geometrical extraction of nearest neighbors from multi-dimensional data
Aziz, Yasir
Memon, Kashif Hussain
PATTERN RECOGNITION, 2023, 136

← 1 2 3 4 5 →