A depth-based nearest neighbor algorithm for high-dimensional data classification

Cited by: 0
Authors
Harikumar S. [1]
Aravindakshan Savithri A. [1]
Kaimal R. [1]
Affiliation
[1] Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri
Source
Turkish Journal of Electrical Engineering and Computer Sciences | 2019, Vol. 27, No. 6
Keywords
Classification; Data-depth; Information gain; Nearest neighbor; Subspace-clustering
DOI
10.3906/ELK-1807-163
Abstract
Nearest neighbor algorithms such as k-nearest neighbors (kNN) are fundamental supervised learning techniques that classify a query instance based on the class labels of its neighbors. However, large datasets are often not fully labeled, and the unknown probability distribution of the instances may be uneven. Moreover, kNN faces challenges such as the curse of dimensionality, choosing the optimal number of neighbors, and scalability on high-dimensional data. To overcome these challenges, we propose an improved approach to classification via a depth representation of subspace clusters formed from high-dimensional data. We offer a consistent and principled approach to dynamically choosing the nearest neighbors for classifying a query point by i) identifying the structures and distributions of the data; ii) extracting relevant features; and iii) deriving an optimum value of k from the structure of the data, represented using a data depth function. The proposed classification algorithm uses a depth-based representation of clusters to improve performance in terms of both execution time and accuracy. Experiments on real-world datasets reveal that the proposed approach is at least two orders of magnitude faster on high-dimensional datasets and is at least as accurate as traditional kNN. © TÜBİTAK.
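The abstract names two building blocks, a data depth function and nearest neighbor voting, without specifying either. The Python sketch below illustrates them under stated assumptions: spatial (L1) depth stands in for the paper's depth function, which the abstract does not identify, and `spatial_depth` and `knn_predict` are hypothetical helper names. It does not reproduce the authors' subspace clustering, feature extraction, or rule for deriving k.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

Point = Sequence[float]


def spatial_depth(x: Point, data: Sequence[Point]) -> float:
    """Spatial (L1) depth of x relative to a non-empty sample `data`.

    Assumed depth function for illustration only:
    D(x) = 1 - || (1/n) * sum_i (x - x_i) / ||x - x_i|| ||,
    where sample points equal to x contribute a zero vector.
    Values near 1 mean x lies deep inside the sample; values near 0
    mean x lies on its outskirts.
    """
    if not data:
        raise ValueError("data must contain at least one point")
    total = [0.0] * len(x)
    for xi in data:
        diff = [a - b for a, b in zip(x, xi)]
        norm = math.hypot(*diff)  # Euclidean length of x - x_i
        if norm > 0.0:
            for j, d in enumerate(diff):
                total[j] += d / norm  # accumulate the unit vector
    n = len(data)
    return 1.0 - math.hypot(*(t / n for t in total))


def knn_predict(train_X: Sequence[Point], train_y: Sequence[Hashable],
                query: Point, k: int) -> Hashable:
    """Plain kNN baseline: majority label among the k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated square clusters labelled "a" and "b".
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
train_X = square + [(5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a"] * 4 + ["b"] * 4

print(spatial_depth((0.5, 0.5), square))  # 1.0: centre of the square
print(spatial_depth((10, 10), square))    # close to 0: far outside the square
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # a
print(knn_predict(train_X, train_y, (5.5, 5.2), k=3))  # b
```

Depth gives a centre-outward ordering of the points in a cluster, which is the kind of structural information the abstract says is used to set k; how the paper actually maps depth values to a choice of k is not stated in the abstract and is not guessed at here.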
Pages: 4082-4101
Number of pages: 19