Training Data Augmentation with Data Distilled by Principal Component Analysis

Authors
Sirakov, Nikolay Metodiev [1]
Shahnewaz, Tahsin [1]
Nakhmani, Arie [2]
Affiliations
[1] Texas A&M Univ Commerce, Dept Math, Commerce, TX 75429 USA
[2] Univ Alabama Birmingham, Dept Elect & Comp Engn, Birmingham, AL 35294 USA
Funding
U.S. National Institutes of Health
Keywords
data; distillation; augmentation; classification; machine learning
DOI
10.3390/electronics13020282
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This work develops a new method for vector data augmentation. The proposed method applies principal component analysis (PCA), determines the eigenvectors of a set of training vectors for a machine learning (ML) method, and uses these eigenvectors to generate distilled vectors. The training vectors and the PCA-distilled vectors have the same dimension. The user chooses the number of vectors to be distilled and added to the set of training vectors. A statistical approach determines the lowest number of vectors to be distilled such that, when they are added to the original vectors, the extended set trains an ML classifier to a required accuracy. Hence, the novelty of this study is the distillation of vectors with the PCA method and their use to augment the original set of vectors. The advantage of this approach is that it improves the classification statistics of ML classifiers. To validate this advantage, we conducted experiments with four public databases and applied four classifiers: a neural network, logistic regression, and support vector machines with linear and polynomial kernels. For the purpose of augmentation, we conducted several distillations, including nested (double) distillation, in which new vectors are distilled from already distilled vectors. We trained the classifiers with three sets of vectors: the original vectors; the original vectors augmented with PCA-distilled vectors; and the original vectors augmented with both PCA-distilled and double PCA-distilled vectors. The experimental results are presented in the paper, and they confirm that PCA-distilled vectors improve the classification statistics of ML methods when they augment the original training vectors.
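The abstract does not spell out how the distilled vectors are built from the eigenvectors, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a distilled vector is a training vector projected onto the leading eigenvectors of the sample covariance matrix and mapped back to the original space (which keeps the dimension unchanged), and that each distilled vector inherits the label of its source vector. The names `pca_distill` and `augment` and the parameters `n_components` and `n_distilled` are hypothetical.

```python
import numpy as np


def pca_distill(X, n_components, n_distilled, seed=0):
    """Sketch: rebuild n_distilled rows of X from the top principal components.

    Assumption (not from the paper): "distillation" is taken to mean
    projection onto the leading eigenvectors followed by reconstruction.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Eigenvectors of the sample covariance matrix (eigh: symmetric input).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, top]                      # shape (d, n_components)
    # Pick which training vectors to distill (user-chosen count).
    idx = rng.choice(X.shape[0], size=n_distilled, replace=False)
    # Project and reconstruct; the result has the same dimension d as X.
    return Xc[idx] @ W @ W.T + mean, idx


def augment(X, y, n_components, n_distilled, seed=0):
    """Append distilled vectors (with inherited labels) to the training set."""
    X_d, idx = pca_distill(X, n_components, n_distilled, seed)
    return np.vstack([X, X_d]), np.concatenate([y, y[idx]])
```

Under the same assumptions, nested (double) distillation would correspond to calling `pca_distill` again on the distilled vectors with fewer components, and the augmented arrays returned by `augment` can be passed to any classifier in place of the original training set.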
Pages: 17