Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification

Cited: 0
Authors
Deegalla, Sampath [1,2]
Bostrom, Henrik [2 ]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, Forum 100, SE-16440 Kista, Sweden
[2] Royal Inst Technol, SE-16440 Kista, Sweden
Source
ICMLA 2006: 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS | 2006
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The computational cost of nearest neighbor classification often prevents the method from being applied in practice to high-dimensional data, such as images and microarrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without losing predictive performance. Two different dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are investigated for this purpose and compared with respect to the performance of the resulting nearest neighbor classifier on five image data sets and five microarray data sets. The experimental results demonstrate that PCA outperforms RP on all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions: for PCA, accuracy peaks and then degrades as the number of dimensions grows, whereas for RP, accuracy increases with the number of dimensions. The experiments also show that PCA and RP may even outperform the non-reduced feature set (in 9 and 6 cases out of 10, respectively), hence yielding nearest neighbor classification that is not only more efficient but also more effective.
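As a rough illustration of the pipeline the abstract describes, the sketch below reduces the feature space to a range of dimensionalities with PCA and with Gaussian random projection, then measures 1-nearest-neighbor accuracy on a held-out split. This is a minimal sketch assuming scikit-learn, with its bundled digits data set standing in for the paper's image and microarray data sets; it is not the authors' experimental setup.

    # Minimal sketch (not the authors' code): compare PCA and random
    # projection as preprocessing for 1-nearest-neighbor classification.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.random_projection import GaussianRandomProjection

    X, y = load_digits(return_X_y=True)  # 1797 8x8 digit images, 64 features
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for n_dims in (5, 10, 20, 40):
        for name, reducer in (
            ("PCA", PCA(n_components=n_dims)),
            ("RP", GaussianRandomProjection(n_components=n_dims, random_state=0)),
        ):
            Z_tr = reducer.fit_transform(X_tr)  # fit the reducer on training data only
            Z_te = reducer.transform(X_te)      # apply the same projection to test data
            knn = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
            print(f"{name:3s} d={n_dims:2d}  accuracy={knn.score(Z_te, y_te):.3f}")

With this setup, the pattern the abstract reports (PCA accuracy peaking at a fairly small number of dimensions, RP accuracy rising as dimensions are added) can be checked directly by sweeping n_dims.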
Pages: 245+
Number of pages: 3
Related papers
50 in total
  • [1] Random subspace and random projection nearest neighbor ensembles for high dimensional data
    Deegalla, Sampath
    Walgama, Keerthi
    Papapetrou, Panagiotis
    Bostrom, Henrik
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
  • [2] Redefining nearest neighbor classification in high-dimensional settings
    Lopez, Julio
    Maldonado, Sebastian
    PATTERN RECOGNITION LETTERS, 2018, 110 : 36 - 43
  • [3] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101
  • [4] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [5] Fuzzy nearest neighbor clustering of high-dimensional data
    Wang, HB
    Yu, YQ
    Zhou, DR
    Meng, B
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572
  • [6] Sequential random k-nearest neighbor feature selection for high-dimensional data
    Park, Chan Hee
    Kim, Seoung Bum
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (05) : 2336 - 2342
  • [7] Multilevel Functional Principal Component Analysis for High-Dimensional Data
    Zipunnikov, Vadim
    Caffo, Brian
    Yousem, David M.
    Davatzikos, Christos
    Schwartz, Brian S.
    Crainiceanu, Ciprian
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (04) : 852 - 873
  • [8] Nearest neighbor search on vertically partitioned high-dimensional data
    Dellis, E
    Seeger, B
    Vlachou, A
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2005, 3589 : 243 - 253
  • [9] An efficient nearest neighbor search in high-dimensional data spaces
    Lee, DH
    Kim, HJ
    INFORMATION PROCESSING LETTERS, 2002, 81 (05) : 239 - 246