Training Data Augmentation with Data Distilled by Principal Component Analysis

Cited by: 0
Authors
Sirakov, Nikolay Metodiev [1]
Shahnewaz, Tahsin [1]
Nakhmani, Arie [2]
Affiliations
[1] Texas A&M Univ Commerce, Dept Math, Commerce, TX 75429 USA
[2] Univ Alabama Birmingham, Dept Elect & Comp Engn, Birmingham, AL 35294 USA
Funding
US National Institutes of Health (NIH)
Keywords
data; distillation; augmentation; classification; machine learning
DOI
10.3390/electronics13020282
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This work develops a new method for vector data augmentation. The proposed method applies principal component analysis (PCA) to determine the eigenvectors of a set of training vectors for a machine learning (ML) method and uses these eigenvectors to generate distilled vectors. The training and PCA-distilled vectors have the same dimension. The user chooses the number of vectors to be distilled and added to the set of training vectors. A statistical approach determines the smallest number of distilled vectors such that, when they are added to the original vectors, the extended set trains an ML classifier to a required accuracy. The novelty of this study is therefore the distillation of vectors with PCA and their use to augment the original set of vectors. The resulting advantage is an improvement in the classification statistics of ML classifiers. To validate this advantage, we conducted experiments with four public databases and applied four classifiers: a neural network, logistic regression, and support vector machines with linear and polynomial kernels. For augmentation, we conducted several distillations, including nested (double) distillation, in which new vectors are distilled from already distilled vectors. We trained the classifiers with three sets of vectors: the original vectors; the original vectors augmented with PCA-distilled vectors; and the original vectors augmented with both PCA-distilled and double-distilled vectors. The experimental results are presented in the paper and confirm that PCA-distilled vectors improve the classification statistics of ML methods when they augment the original training vectors.
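The abstract does not spell out how the eigenvectors are turned into distilled vectors, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that a distilled vector is a training vector reconstructed from its projection onto the leading eigenvectors of the training covariance matrix, which keeps the dimension unchanged as the abstract requires. The names `pca_distill` and `augment`, the parameter `n_components`, the random choice of source vectors, and the rule that a distilled vector inherits the label of its source vector are all assumptions made for illustration.

```python
import numpy as np


def pca_distill(X, n_components, n_distilled, seed=0):
    """Return n_distilled vectors rebuilt from the top principal components.

    Assumption: "distilled" means projected onto the leading eigenvectors
    and mapped back to the original space (same dimension as X).
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Eigen-decomposition of the (symmetric) covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]  # shape (d, n_components)

    # Hypothetical choice: distill a random subset of the input vectors.
    idx = rng.choice(len(X), size=n_distilled, replace=False)
    distilled = Xc[idx] @ W @ W.T + mean
    return distilled, idx


def augment(X, y, n_components, n_distilled, seed=0):
    """Append distilled vectors (with their source labels) to (X, y)."""
    distilled, idx = pca_distill(X, n_components, n_distilled, seed)
    return np.vstack([X, distilled]), np.concatenate([y, y[idx]])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 5))
    y = rng.integers(0, 2, size=20)

    X_aug, y_aug = augment(X, y, n_components=2, n_distilled=8)
    print(X_aug.shape, y_aug.shape)

    # Double distillation under the same assumption: distill again,
    # this time starting from already distilled vectors.
    first, _ = pca_distill(X, n_components=3, n_distilled=10)
    second, _ = pca_distill(first, n_components=2, n_distilled=5)
    print(second.shape)
```

Under this reading, the augmented set can be passed to any classifier in place of the original training set, and the number of distilled vectors (`n_distilled` here) is the quantity the abstract says is chosen by the user or by a statistical criterion.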
Pages: 17