Training Data Augmentation with Data Distilled by Principal Component Analysis

Authors
Sirakov, Nikolay Metodiev [1]
Shahnewaz, Tahsin [1]
Nakhmani, Arie [2]
Affiliations
[1] Texas A&M Univ Commerce, Dept Math, Commerce, TX 75429 USA
[2] Univ Alabama Birmingham, Dept Elect & Comp Engn, Birmingham, AL 35294 USA
Funding
National Institutes of Health (USA)
Keywords
data; distillation; augmentation; classification; machine learning
DOI
10.3390/electronics13020282
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This work develops a new method for vector data augmentation. The proposed method applies principal component analysis (PCA) to a set of training vectors for a machine learning (ML) method, determines their eigenvectors and uses these to generate the distilled vectors. The training vectors and the PCA-distilled vectors have the same dimension. The user chooses how many vectors are distilled and added to the training set. A statistical approach determines the smallest number of distilled vectors that, when added to the original vectors, lets the extended set train an ML classifier to a required accuracy. The novelty of this study is therefore the distillation of vectors with PCA and their use to augment the original set of vectors. The advantage of this approach is that it improves the classification statistics of ML classifiers. To validate this advantage, we conducted experiments with four public databases and four classifiers: a neural network, logistic regression, and support vector machines with linear and polynomial kernels. For augmentation, we conducted several distillations, including nested (double) distillation, in which new vectors are distilled from already distilled vectors. We trained the classifiers with three sets of vectors: the original vectors; the original vectors augmented with PCA-distilled vectors; and the original vectors augmented with both PCA-distilled and double PCA-distilled vectors. The experimental results presented in the paper confirm that PCA-distilled vectors improve the classification statistics of ML methods when they augment the original training vectors.
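The abstract does not spell out how the distilled vectors are computed from the eigenvectors, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a vector is "distilled" by projecting it onto the leading eigenvectors of the training covariance matrix and mapping it back to the original space, which keeps the dimension unchanged. The helper name `pca_distill`, its parameters, the random choice of source vectors and the rule that a distilled vector inherits the label of its source are all assumptions made for illustration.

```python
import numpy as np

def pca_distill(X, n_components, n_vectors, seed=None):
    """Hypothetical helper: build n_vectors distilled rows from X.

    Assumption: a distilled vector is the reconstruction of a training
    vector from its projection onto the top n_components eigenvectors.
    Requires n_vectors <= len(X) and at least two features.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Eigenvectors of the covariance matrix of the training vectors.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, top]                      # shape (d, n_components)
    # Pick the source vectors to distill (assumed: uniformly at random).
    idx = rng.choice(len(X), size=n_vectors, replace=False)
    # Project, then map back: same dimension as the input vectors.
    distilled = Xc[idx] @ W @ W.T + mean
    return distilled, idx

# Synthetic stand-in for a training set (not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.integers(0, 2, size=50)

# Single distillation, then nested (double) distillation of the result.
D1, idx1 = pca_distill(X, n_components=3, n_vectors=10, seed=1)
y1 = y[idx1]                                 # assumed label inheritance
D2, idx2 = pca_distill(D1, n_components=2, n_vectors=5, seed=2)
y2 = y1[idx2]

# The three training sets named in the abstract.
X_orig, y_orig = X, y
X_aug = np.vstack([X, D1])
y_aug = np.concatenate([y, y1])
X_aug2 = np.vstack([X, D1, D2])
y_aug2 = np.concatenate([y, y1, y2])
```

Each of the three sets can then be passed to any classifier for comparison. The number of retained components and the number of distilled vectors are free choices in this sketch; the paper selects the latter with a statistical approach that the abstract does not describe.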
Pages: 17