Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification

Cited: 0
Authors
Deegalla, Sampath [1 ,2 ]
Bostrom, Henrik [2 ]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, Forum 100, SE-16440 Kista, Sweden
[2] Royal Inst Technol, SE-16440 Kista, Sweden
Source
ICMLA 2006: 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS | 2006
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The computational cost of nearest neighbor classification often prevents the method from being applied in practice to high-dimensional data, such as images and microarrays. One possible solution is to reduce the dimensionality of the data, ideally without losing predictive performance. Two dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are investigated for this purpose and compared with respect to the performance of the resulting nearest neighbor classifier on five image data sets and five microarray data sets. The experimental results demonstrate that PCA outperforms RP on all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions: after reaching a peak, accuracy degrades with the number of dimensions for PCA, while accuracy for RP increases with the number of dimensions. The experiments also show that using PCA or RP may even outperform using the non-reduced feature set (in 9 and 6 cases out of 10, respectively), yielding nearest neighbor classification that is not only more efficient but also more effective.
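The comparison described in the abstract can be sketched in plain NumPy: PCA via an SVD of the centered data, RP via a scaled Gaussian matrix, and leave-one-out 1-NN accuracy in the reduced space. This is a minimal illustration on synthetic data, not the paper's experimental setup; all function names and the data layout are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: two classes separated along a few features.
n, d, k = 200, 500, 10            # samples, original dims, reduced dims
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :10] += y[:, None] * 4.0     # class signal lives in the first 10 features

def pca_reduce(X, k):
    """Project onto the top-k principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def rp_reduce(X, k, rng):
    """Gaussian random projection, scaled so distances are preserved in expectation."""
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def one_nn_accuracy(Z, y):
    """Leave-one-out 1-NN accuracy with Euclidean distance."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)   # a point may not be its own neighbor
    return float(np.mean(y[D.argmin(axis=1)] == y))

acc_pca = one_nn_accuracy(pca_reduce(X, k), y)
acc_rp = one_nn_accuracy(rp_reduce(X, k, rng), y)
print(f"1-NN accuracy  PCA: {acc_pca:.3f}  RP: {acc_rp:.3f}")
```

On data like this, where the class signal is concentrated in a few high-variance directions, PCA tends to pack it into the first components, while RP merely preserves overall geometry; this mirrors the paper's finding that PCA peaks at a small number of dimensions while RP needs more dimensions to catch up.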
Pages: 245+
Number of pages: 3
Related papers
50 records in total
  • [31] Test for high-dimensional outliers with principal component analysis
    Nakayama, Yugo
    Yata, Kazuyoshi
    Aoshima, Makoto
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2024, 7 (02) : 739 - 766
  • [32] Accelerating massive queries of approximate nearest neighbor search on high-dimensional data
    Liu, Yingfan
    Song, Chaowei
    Cheng, Hong
    Xia, Xiaofang
    Cui, Jiangtao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (10) : 4185 - 4212
  • [33] Forecasting High-Dimensional Covariance Matrices Using High-Dimensional Principal Component Analysis
    Shigemoto, Hideto
    Morimoto, Takayuki
    AXIOMS, 2022, 11 (12)
  • [35] High-dimensional covariance forecasting based on principal component analysis of high-frequency data
    Jian, Zhihong
    Deng, Pingjun
    Zhu, Zhican
    ECONOMIC MODELLING, 2018, 75 : 422 - 431
  • [36] The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data
    Howley, Tom
    Madden, Michael G.
    O'Connell, Marie-Louise
    Ryder, Alan G.
    KNOWLEDGE-BASED SYSTEMS, 2006, 19 (05) : 363 - 370
  • [37] Evaluating the performance of sparse principal component analysis methods in high-dimensional data scenarios
    Bonner, Ashley J.
    Beyene, Joseph
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (05) : 3794 - 3811
  • [38] Secure Cloud-Aided Approximate Nearest Neighbor Search on High-Dimensional Data
    Liu, Jia
    Wang, Yinchai
    Wei, Fengrui
    Han, Qing
    Tao, Yunting
    Zhao, Liping
    Li, Xinjin
    Sun, Hongbo
    IEEE ACCESS, 2023, 11 : 109027 - 109037
  • [39] A Sparse Reconstructive Evidential K-Nearest Neighbor Classifier for High-Dimensional Data
    Gong, Chaoyu
    Su, Zhi-Gang
    Wang, Pei-Hong
    Wang, Qian
    You, Yang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 5563 - 5576
  • [40] A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
    Li, Wenfa
    Wang, Gongming
    Ma, Nan
    Liu, Hongzhe
    HIGH TECHNOLOGY LETTERS, 2016, 22 (03) : 241 - 247