Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification

Cited by: 0
Authors
Deegalla, Sampath [1 ,2 ]
Bostrom, Henrik [2 ]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, Forum 100, SE-16440 Kista, Sweden
[2] Royal Inst Technol, SE-16440 Kista, Sweden
Source
ICMLA 2006: 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS | 2006
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The computational cost of nearest neighbor classification often prevents the method from being applied in practice to high-dimensional data, such as images and microarrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without losing predictive performance. Two dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are investigated for this purpose and compared with respect to the performance of the resulting nearest neighbor classifier on five image data sets and five microarray data sets. The experimental results demonstrate that PCA outperforms RP on all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions: after reaching a peak, the accuracy of PCA degrades as the number of dimensions grows, while the accuracy of RP increases with the number of dimensions. The experiments also show that using PCA and RP may even outperform using the non-reduced feature set (in 9 and 6 cases out of 10, respectively), hence resulting in not only more efficient but also more effective nearest neighbor classification.
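As a concrete illustration of the comparison described in the abstract, the sketch below reduces a data set with PCA and with random projection and measures 1-nearest-neighbor accuracy at several target dimensionalities. This is a minimal sketch, not the authors' code: it assumes scikit-learn (PCA, GaussianRandomProjection, KNeighborsClassifier), and the bundled digits data stands in for the paper's image and microarray data sets.

# Minimal sketch: PCA vs. random projection as preprocessing for 1-NN.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.random_projection import GaussianRandomProjection

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for n_dims in (5, 10, 20, 40):
    scores = {}
    for name, reducer in (
        ("PCA", PCA(n_components=n_dims)),
        ("RP", GaussianRandomProjection(n_components=n_dims, random_state=0)),
    ):
        # Fit the reducer on training data only, then classify with
        # 1-nearest-neighbor in the reduced space.
        Z_train = reducer.fit_transform(X_train)
        Z_test = reducer.transform(X_test)
        knn = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)
        scores[name] = knn.score(Z_test, y_test)
    print(f"dims={n_dims:3d}  PCA acc={scores['PCA']:.3f}  "
          f"RP acc={scores['RP']:.3f}")

Sweeping the number of dimensions in this way is how the pattern reported in the abstract (PCA peaking at a small number of dimensions and then degrading, RP improving steadily) can be observed on one's own data.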
Pages: 245+
Page count: 3
Related Papers
50 in total
  • [21] An efficient secure k nearest neighbor classification protocol with high-dimensional features
    Sun, Maohua
    Yang, Ruidi
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, 35 (11) : 1791 - 1813
  • [22] Random projection ensemble classification with high-dimensional time series
    Zhang, Fuli
    Chan, Kung-Sik
    BIOMETRICS, 2023, 79 (02) : 964 - 974
  • [23] Random projection ensemble conformal prediction for high-dimensional classification
    Qian, Xiaoyu
    Wu, Jinru
    Wei, Ligong
    Lin, Youwu
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2024, 253
  • [24] Hubness-Aware Shared Neighbor Distances for High-Dimensional k-Nearest Neighbor Classification
    Tomasev, Nenad
    Mladenic, Dunja
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209 : 116 - 127
  • [25] Cauchy robust principal component analysis with applications to high-dimensional data sets
    Fayomi, Aisha
    Pantazis, Yannis
    Tsagris, Michail
    Wood, Andrew T. A.
    STATISTICS AND COMPUTING, 2024, 34 (01)
  • [26] Exploring high-dimensional biological data with sparse contrastive principal component analysis
    Boileau, Philippe
    Hejazi, Nima S.
    Dudoit, Sandrine
    BIOINFORMATICS, 2020, 36 (11) : 3422 - 3430
  • [27] Adaptive local Principal Component Analysis improves the clustering of high-dimensional data
    Migenda, Nico
    Moeller, Ralf
    Schenck, Wolfram
    PATTERN RECOGNITION, 2024, 146
  • [29] High-dimensional principal component analysis with heterogeneous missingness
    Zhu, Ziwei
    Wang, Tengyao
    Samworth, Richard J.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (05) : 2000 - 2031
  • [30] Test for high-dimensional outliers with principal component analysis
    Nakayama, Yugo
    Yata, Kazuyoshi
    Aoshima, Makoto
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2024, 7 (02) : 739 - 766