Ensemble-based active learning using fuzzy-rough approach for cancer sample classification

Cited by: 18
Authors
Kumar, Ansuman [1]
Halder, Anindya [1]
Affiliations
[1] North Eastern Hill Univ, Dept Comp Applicat, Tura Campus, Chasingre 794002, Meghalaya, India
Keywords
Ensemble learning; Active learning; Cancer classification; Gene expression data; Fuzzy set; Rough set; GENE-EXPRESSION DATA; TUMOR CLASSIFICATION; CLUSTER-ANALYSIS; PREDICTION; ALGORITHM; SELECTION; SVM
DOI
10.1016/j.engappai.2020.103591
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Background and Objective: Classification of cancer from gene expression data is a major research area in machine learning and medical science. Conventional supervised methods often fail to reach the desired classification accuracy because gene expression data provide too few training samples. Ensemble-based active learning can be effective in this situation: each base classifier identifies a few informative samples, and the decisions of all base classifiers are combined to select the most informative ones. These samples are labeled by subject experts and added to the training set, which can improve classification accuracy. Method: We propose a novel ensemble-based active learning method using a fuzzy-rough approach for cancer sample classification from microarray gene expression data. The proposed method can deal with the uncertainty, overlap and indiscernibility usually present in the subtype classes of gene expression data, and can improve the accuracy of the individual base classifiers when training samples are limited. Results: The proposed method is validated on eight microarray gene expression datasets. Its performance in terms of classification accuracy, precision, recall, F1-measure and kappa is compared with that of six other methods. The improvements in accuracy over the nearest competing method are 2.96%, 9.34%, 0.93%, 3.69%, 7.2% and 4.53% for the Colon cancer, Prostate cancer, SRBCT, Ovarian cancer, DLBCL and Central nervous system datasets, respectively. Paired t-tests support the statistical significance of the results in favor of the proposed method for most of the datasets. Conclusion: The proposed method is an effective general-purpose ensemble-based active learning method that adopts the fuzzy-rough concept, and it can therefore be applied to other classification problems in the future.
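The selection loop described in the abstract (base classifiers flag informative samples, their decisions are combined, an expert labels the chosen samples, and the training set grows) can be illustrated with a generic query-by-committee sketch. This is not the authors' fuzzy-rough method: it assumes committee disagreement measured by vote entropy as the informativeness score, uses simple nearest-centroid and k-nearest-neighbour learners as stand-in base classifiers, and replaces the human expert with a hypothetical `oracle` function. All function names and the toy two-feature data are illustrative assumptions.

```python
import math
from collections import Counter


def nearest_centroid(train):
    """Fit a predictor that returns the class whose mean vector is closest."""
    sums, counts = {}, Counter()
    for x, y in train:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def k_nearest(k):
    """Return a fitting function for a k-nearest-neighbour majority vote."""
    def fit(train):
        samples = list(train)  # copy so later changes to train do not leak in

        def predict(x):
            nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit


def vote_entropy(votes):
    """Disagreement score: entropy of the committee's label votes."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(votes).values())


def select_queries(committee, pool, k):
    """Indices of the k pool samples the committee disagrees on most."""
    scores = [vote_entropy([clf(x) for clf in committee]) for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def active_learn(train, pool, oracle, learners, rounds, k):
    """Grow the labeled set by querying the oracle, then return an ensemble."""
    train, pool = list(train), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        committee = [fit(train) for fit in learners]
        picked = set(select_queries(committee, pool, k))
        # The oracle stands in for the subject expert who labels the queries.
        train += [(pool[i], oracle(pool[i])) for i in sorted(picked)]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    committee = [fit(train) for fit in learners]

    def predict(x):
        # Final decision: majority vote over the retrained base classifiers.
        return Counter(clf(x) for clf in committee).most_common(1)[0][0]
    return predict, train


# Toy data (assumed for illustration): two features, two classes.
initial = [((0.0, 0.0), "normal"), ((1.0, 1.0), "tumor")]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.6, 0.5), (0.4, 0.45), (0.2, 0.9)]


def oracle(x):
    """Hypothetical expert: labels a sample by a fixed rule."""
    return "tumor" if x[0] + x[1] > 1.0 else "normal"


predict, labeled = active_learn(
    initial, unlabeled, oracle,
    learners=[nearest_centroid, k_nearest(1), k_nearest(3)],
    rounds=2, k=1,
)
```

Vote entropy is only one possible disagreement measure; the paper's fuzzy-rough scoring would replace `vote_entropy` and `select_queries` in a faithful implementation.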
Pages: 12