Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification

Cited: 0
Authors
Deegalla, Sampath [1,2]
Bostrom, Henrik [2 ]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, Forum 100, SE-16440 Kista, Sweden
[2] Royal Inst Technol, SE-16440 Kista, Sweden
Source
ICMLA 2006: 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS | 2006
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The computational cost of nearest neighbor classification often prevents the method from being applied in practice to high-dimensional data, such as images and microarrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without losing predictive performance. Two different dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are investigated for this purpose and compared with respect to the performance of the resulting nearest neighbor classifier on five image data sets and five microarray data sets. The experimental results demonstrate that PCA outperforms RP on all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions: for PCA, accuracy peaks and then degrades as the number of dimensions grows, whereas for RP, accuracy increases with the number of dimensions. The experiments also show that PCA and RP may even outperform the non-reduced feature set (in 9 and 6 cases out of 10, respectively), hence yielding nearest neighbor classification that is not only more efficient but also more effective.
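As a rough illustration of the pipeline the abstract describes, the sketch below reduces the feature space to a range of dimensionalities with PCA and with Gaussian random projection, then measures 1-nearest-neighbor accuracy on a held-out split. This is a minimal sketch assuming scikit-learn, with its bundled digits data set standing in for the paper's image and microarray data sets; it is not the authors' experimental setup.

    # Minimal sketch (not the authors' code): compare PCA and random
    # projection as preprocessing for 1-nearest-neighbor classification.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.random_projection import GaussianRandomProjection

    X, y = load_digits(return_X_y=True)  # 1797 8x8 digit images, 64 features
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for n_dims in (5, 10, 20, 40):
        for name, reducer in (
            ("PCA", PCA(n_components=n_dims)),
            ("RP", GaussianRandomProjection(n_components=n_dims, random_state=0)),
        ):
            Z_tr = reducer.fit_transform(X_tr)  # fit the reducer on training data only
            Z_te = reducer.transform(X_te)      # apply the same projection to test data
            knn = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
            print(f"{name:3s} d={n_dims:2d}  accuracy={knn.score(Z_te, y_te):.3f}")

With this setup, the pattern the abstract reports (PCA accuracy peaking at a fairly small number of dimensions, RP accuracy rising as dimensions are added) can be checked directly by sweeping n_dims.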
Pages: 245+
Number of pages: 3
Related papers
50 in total
  • [1] Random subspace and random projection nearest neighbor ensembles for high dimensional data
    Deegalla, Sampath
    Walgama, Keerthi
    Papapetrou, Panagiotis
    Bostrom, Henrik
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
  • [2] Redefining nearest neighbor classification in high-dimensional settings
    Lopez, Julio
    Maldonado, Sebastian
    PATTERN RECOGNITION LETTERS, 2018, 110 : 36 - 43
  • [3] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101
  • [4] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [5] Fuzzy nearest neighbor clustering of high-dimensional data
    Wang, HB
    Yu, YQ
    Zhou, DR
    Meng, B
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572
  • [6] Sequential random k-nearest neighbor feature selection for high-dimensional data
    Park, Chan Hee
    Kim, Seoung Bum
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (05) : 2336 - 2342
  • [7] Multilevel Functional Principal Component Analysis for High-Dimensional Data
    Zipunnikov, Vadim
    Caffo, Brian
    Yousem, David M.
    Davatzikos, Christos
    Schwartz, Brian S.
    Crainiceanu, Ciprian
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (04) : 852 - 873
  • [8] Nearest neighbor search on vertically partitioned high-dimensional data
    Dellis, E
    Seeger, B
    Vlachou, A
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2005, 3589 : 243 - 253
  • [9] An efficient nearest neighbor search in high-dimensional data spaces
    Lee, DH
    Kim, HJ
    INFORMATION PROCESSING LETTERS, 2002, 81 (05) : 239 - 246