Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

被引:0
|
作者
Radovanovic, Milos [1 ]
Nanopoulos, Alexandros [2 ]
Ivanovic, Mirjana [1 ]
机构
[1] Univ Novi Sad, Dept Math & Informat, Novi Sad 21000, Serbia
[2] Univ Hildesheim, Inst Comp Sci, D-31141 Hildesheim, Germany
关键词
nearest neighbors; curse of dimensionality; classification; semi-supervised learning; clustering; POINT-PROCESSES; REDUCTION; ALGORITHMS;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.
引用
收藏
页码:2487 / 2531
页数:45
相关论文
共 50 条
  • [11] SpaceMAP: Visualizing High-dimensional Data by Space Expansion
    Zu, Xinrui
    Tao, Qian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [12] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar S.
    Aravindakshan Savithri A.
    Kaimal R.
    Turkish Journal of Electrical Engineering and Computer Sciences, 2019, 27 (06): : 4082 - 4101
  • [13] Accelerating massive queries of approximate nearest neighbor search on high-dimensional data
    Liu, Yingfan
    Song, Chaowei
    Cheng, Hong
    Xia, Xiaofang
    Cui, Jiangtao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (10) : 4185 - 4212
  • [14] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101
  • [15] A hybrid feature selection scheme for high-dimensional data
    Ganjei, Mohammad Ahmadi
    Boostani, Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 113
  • [16] Clustering Lines in High-Dimensional Space: Classification of Incomplete Data
    Gao, Jie
    Langberg, Michael
    Schulman, Leonard J.
    ACM TRANSACTIONS ON ALGORITHMS, 2010, 7 (01)
  • [17] Secure Cloud-Aided Approximate Nearest Neighbor Search on High-Dimensional Data
    Liu, Jia
    Wang, Yinchai
    Wei, Fengrui
    Han, Qing
    Tao, Yunting
    Zhao, Liping
    Li, Xinjin
    Sun, Hongbo
    IEEE ACCESS, 2023, 11 : 109027 - 109037
  • [18] A Sparse Reconstructive Evidential K-Nearest Neighbor Classifier for High-Dimensional Data
    Gong, Chaoyu
    Su, Zhi-Gang
    Wang, Pei-Hong
    Wang, Qian
    You, Yang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 5563 - 5576
  • [19] Sequential random k-nearest neighbor feature selection for high-dimensional data
    Park, Chan Hee
    Kim, Seoung Bum
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (05) : 2336 - 2342
  • [20] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17