Microbial data augmentation combining feature extraction and transformer network

Cited by: 6
Authors
Wen, Liu-Ying [1]
Chen, Zhu [1]
Xie, Xiao-Nan [1]
Min, Fan [1,2]
Affiliations
[1] Southwest Petr Univ, Sch Comp Sci, Chengdu 610500, Peoples R China
[2] Southwest Petr Univ, Inst Artificial Intelligence, Chengdu 610500, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; Data augmentation; Feature extraction; Microbial data; Feature selection;
DOI
10.1007/s13042-023-02047-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Microbial data exhibit high dimensionality, feature sparseness, and class imbalance. Popular data augmentation strategies typically generate intersecting or overlapping samples with low diversity. In this paper, we propose a two-stage data augmentation method to synthesize high-quality samples. First, we train a feature extractor by minimizing the cross-entropy loss; positive training instances are oversampled to achieve class balance. Second, a transformer network is trained for data augmentation to balance diversity and discernibility, with a dropout technique that randomly sets some feature values as missing. Experiments were carried out on 10 microbial datasets. The results show that (1) the feature extraction network effectively reduces the data dimensionality, alleviates sparseness, improves sample distinguishability, and yields clearer classification boundaries; and (2) the augmentation transformer with the dropout technique generates high-quality samples, improving classifier performance and reducing the cost of misclassification.
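As a rough illustration only (not the paper's implementation, which uses trained neural networks), the two preprocessing ideas the abstract describes — oversampling the positive class to balance, then randomly marking feature values as missing in a dropout-like fashion — might be sketched in NumPy as follows; all function names and the masking rate are assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample_minority(X, y):
    """Duplicate rows of under-represented classes (with replacement)
    until every class has as many samples as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx_parts = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx_parts.append(np.concatenate([c_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

def feature_dropout(X, rate=0.3):
    """Randomly set a fraction of feature values to NaN, mimicking the
    dropout-style corruption of inputs described in the abstract."""
    mask = rng.random(X.shape) < rate
    X_aug = X.astype(float).copy()
    X_aug[mask] = np.nan
    return X_aug

# Toy imbalanced dataset: 8 negative vs 2 positive samples, 5 features.
X = rng.normal(size=(10, 5))
y = np.array([0] * 8 + [1] * 2)

Xb, yb = oversample_minority(X, y)   # balanced: 8 vs 8
Xa = feature_dropout(Xb, rate=0.3)   # some values now missing
```

In the paper itself, the balanced data would then be fed to the feature extractor and the transformer rather than used directly; this sketch only shows the data-level operations.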
Pages: 2539-2550 (12 pages)
Related papers
40 records
[1]   Impact of fully connected layers on performance of convolutional neural networks for image classification [J].
Basha, S. H. Shabbeer ;
Dubey, Shiv Ram ;
Pulabaigari, Viswanath ;
Mukherjee, Snehasis .
NEUROCOMPUTING, 2020, 378 :112-119
[2]  
Bedi Punam, 2020, Procedia Computer Science, V171, P780, DOI 10.1016/j.procs.2020.04.085
[3]   An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme [J].
Bi, Jingjun ;
Zhang, Chongsheng .
KNOWLEDGE-BASED SYSTEMS, 2018, 158 :81-93
[4]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[5]   Transformer Interpretability Beyond Attention Visualization [J].
Chefer, Hila ;
Gur, Shir ;
Wolf, Lior .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :782-791
[6]   The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation [J].
Chicco, Davide ;
Warrens, Matthijs J. ;
Jurman, Giuseppe .
PEERJ COMPUTER SCIENCE, 2021,
[7]   Handling imbalanced data for aircraft predictive maintenance using the BACHE algorithm [J].
Dangut, Maren David ;
Skaf, Zakwan ;
Jennions, Ian K. .
APPLIED SOFT COMPUTING, 2022, 123
[8]   Integrated machine learning methods with resampling algorithms for flood susceptibility prediction [J].
Dodangeh, Esmaeel ;
Choubin, Bahram ;
Eigdir, Ahmad Najafi ;
Nabipour, Narjes ;
Panahi, Mehdi ;
Shamshirband, Shahaboddin ;
Mosavi, Amir .
SCIENCE OF THE TOTAL ENVIRONMENT, 2020, 705
[9]   Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE [J].
Douzas, Georgios ;
Bacao, Fernando ;
Last, Felix .
INFORMATION SCIENCES, 2018, 465 :1-20
[10]   Long-Tailed Graph Representation Learning via Dual Cost-Sensitive Graph Convolutional Network [J].
Duan, Yijun ;
Liu, Xin ;
Jatowt, Adam ;
Yu, Hai-tao ;
Lynden, Steven ;
Kim, Kyoung-Sook ;
Matono, Akiyoshi .
REMOTE SENSING, 2022, 14 (14)