Imbalanced biomedical data classification using self-adaptive multilayer ELM combined with dynamic GAN

Cited by: 34
Authors
Zhang, Liyuan [1]
Yang, Huamin [1]
Jiang, Zhengang [1]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Comp Sci & Technol, Med Imaging Engn Lab, 7089 Weixing Rd, Changchun, Jilin, Peoples R China
Keywords
Imbalanced data classification; Limited biomedical samples; High-dimensional feature; Multilayer ELM; Dynamic GAN; EXTREME LEARNING MACHINES; PROBABILISTIC ATLAS; WEIGHTS;
DOI
10.1186/s12938-018-0604-3
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Background: Imbalanced data classification is an unavoidable problem in intelligent medical diagnosis. Most real-world biomedical datasets have limited samples and high-dimensional features. This seriously degrades the classification performance of a model and can misguide the diagnosis of diseases. Developing an effective classification method for imbalanced and limited biomedical datasets is a challenging task.
Methods: In this paper, we propose a novel multilayer extreme learning machine (ELM) classification model combined with a dynamic generative adversarial net (GAN) to tackle limited and imbalanced biomedical data. First, principal component analysis is used to remove irrelevant and redundant features, so that more meaningful pathological features are extracted. Next, a dynamic GAN is designed to generate realistic-looking minority class samples, thereby balancing the class distribution and effectively avoiding overfitting. Finally, a self-adaptive multilayer ELM is proposed to classify the balanced dataset. The analytic expression for the numbers of hidden layers and nodes is determined by quantitatively establishing the relationship between the change in imbalance ratio and the hyper-parameters of the model. Reducing interactive parameter adjustment makes the classification model more robust.
Results: To evaluate the classification performance of the proposed method, numerical experiments are conducted on four real-world biomedical datasets. The proposed method can generate authentic minority class samples and self-adaptively select the optimal parameters of the learning model. Compared with the W-ELM, SMOTE-ELM, and H-ELM methods, the quantitative experimental results demonstrate that our method achieves better classification performance in terms of the ROC, AUC, G-mean, and F-measure metrics, as well as higher computational efficiency.
Conclusions: Our study provides an effective solution for imbalanced biomedical data classification under the condition of limited samples and high-dimensional features. The proposed method could offer a theoretical basis for computer-aided diagnosis. It has the potential to be applied in biomedical clinical practice.
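For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of a basic single-hidden-layer ELM together with the G-mean and F-measure metrics. It is not the authors' self-adaptive multilayer ELM and it omits the PCA and dynamic GAN stages. The function names, the hidden-layer size `n_hidden`, the ridge term `reg`, and the tanh activation are all assumptions chosen for illustration, and class 1 is assumed to be the minority (positive) class.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, reg=1e-2, seed=0):
    """Fit a basic ELM: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    T = np.where(y == 1, 1.0, -1.0)              # targets encoded as +1 / -1
    # Ridge-regularized least squares for the output weights (assumed variant).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(model, X):
    """Return 0/1 labels by thresholding the ELM output at zero."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)

def g_mean_and_f_measure(y_true, y_pred):
    """G-mean = sqrt(sensitivity * specificity); F-measure = F1 on class 1."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = float(np.sqrt(sensitivity * specificity))
    denom = precision + sensitivity
    f_measure = float(2 * precision * sensitivity / denom) if denom else 0.0
    return g_mean, f_measure
```

Both metrics depend on the minority-class recall, which is why they are commonly preferred over plain accuracy on imbalanced data: a classifier that always predicts the majority class scores zero on each.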
Pages: 21