The effect of principal component analysis on machine learning accuracy with high dimensional spectral data

Times cited: 70
Authors
Howley, T [1]
Madden, MG [1]
O'Connell, ML [1]
Ryder, AG [1]
Affiliation
[1] Natl Univ Ireland Univ Coll Galway, Galway, Ireland
DOI
10.1007/1-84628-224-1_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents the results of an investigation into the use of machine learning methods for identifying narcotics from Raman spectra. Spectral data and other high-dimensional data, such as images and gene-expression data, pose an interesting challenge to machine learning, because large numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of Principal Component Analysis (PCA) to reduce the dimensionality of spectral data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high-dimensional spectral dataset. They employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that this PCA method can improve the performance of machine learning in the classification of high-dimensional data.
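The abstract contrasts NIPALS with full eigenvector decomposition. The following is a minimal NumPy sketch of the textbook NIPALS iteration, written for illustration and not taken from the authors' implementation. It extracts one component at a time by alternating loading and score updates (a power iteration on the residual matrix) and then deflating the residual. Only the first k components are computed, and the m x m covariance matrix is never formed. The function name `nipals_pca`, the tolerance, the iteration cap, the starting-column heuristic and the synthetic data are all assumptions.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """Textbook NIPALS PCA (hypothetical helper, illustrative sketch only).

    X            : (n_samples, n_features) array, e.g. one spectrum per row.
    n_components : number of principal components to extract.
    Returns (T, P): scores (n_samples, k) and unit-length loadings
    (n_features, k), so that the mean-centred X is approximated by T @ P.T.
    """
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)  # residual matrix, starts as the mean-centred data
    n, m = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    for k in range(n_components):
        # Assumed heuristic: start from the highest-variance column of the residual.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)   # regress residual columns on the score -> loading
            p /= np.linalg.norm(p)  # scale loading to unit length
            t_new = E @ p           # updated score vector
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p
        E = E - np.outer(t, p)      # deflate: remove this component from the residual
    return T, P


# Hypothetical synthetic "spectra": 50 samples x 200 channels built from
# 3 latent factors plus a little noise (not data from the paper).
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3)) * np.array([10.0, 5.0, 2.0])
X = latent @ rng.normal(size=(3, 200)) + 0.1 * rng.normal(size=(50, 200))
T, P = nipals_pca(X, n_components=3)
print(T.shape, P.shape)  # (50, 3) (200, 3)
```

In a pipeline like the one the abstract describes, the classifier would be trained on the scores T rather than on the raw spectra. A new spectrum x would be projected as (x - training mean) @ P using the training-set mean and loadings.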
Pages: 209 / +
Page count: 3
Related papers
50 in total
  • [1] The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data
    Howley, Tom
    Madden, Michael G.
    O'Connell, Marie-Louise
    Ryder, Alan G.
    KNOWLEDGE-BASED SYSTEMS, 2006, 19 (05) : 363 - 370
  • [2] High Dimensional Principal Component Analysis with Contaminated Data
    Xu, Huan
    Caramanis, Constantine
    Mannor, Shie
    ITW: 2009 IEEE INFORMATION THEORY WORKSHOP ON NETWORKING AND INFORMATION THEORY, 2009 : 246 - +
  • [3] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [4] Multilevel Functional Principal Component Analysis for High-Dimensional Data
    Zipunnikov, Vadim
    Caffo, Brian
    Yousem, David M.
    Davatzikos, Christos
    Schwartz, Brian S.
    Crainiceanu, Ciprian
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (04) : 852 - 873
  • [5] A Machine Learning Approach to Medical Data Identification Through Principal Component Analysis
    Jaques, Lorenzo E.
    Depoian, Arthur C., II
    Xie, Dong
    Bailey, Colleen P.
    Guturu, Parthasarathy
    BIG DATA III: LEARNING, ANALYTICS, AND APPLICATIONS, 2021, 11730
  • [6] Spectral principal component analysis of dynamic process data
    Thornhill, NF
    Shah, SL
    Huang, B
    Vishnubhotla, A
    CONTROL ENGINEERING PRACTICE, 2002, 10 (08) : 833 - 846
  • [7] Enhanced Application of Principal Component Analysis in Machine Learning for Imputation of Missing Traffic Data
    Choi, Yoon-Young
    Shon, Heeseung
    Byon, Young-Ji
    Kim, Dong-Kyu
    Kang, Seungmo
    APPLIED SCIENCES-BASEL, 2019, 9 (10)
  • [8] Principal component spectral analysis
    Guo, Hao
    Marfurt, Kurt J.
    Liu, Jianlei
    GEOPHYSICS, 2009, 74 (04) : P35 - P43
  • [9] Application of Principal Component Analysis to Lubricating Oil Spectral Data
    Tian, Hongxiang
    Liu, Tao
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT, INNOVATION MANAGEMENT AND INDUSTRIAL ENGINEERING, VOL 1, PROCEEDINGS, 2009 : 286 - 289
  • [10] Principal component analysis of spectral line data: analytic formulation
    Brunt, C. M.
    Heyer, M. H.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2013, 433 (01) : 117 - 126