Adaptive Semi-Supervised Classifier Ensemble for High Dimensional Data Classification

Cited by: 54
Authors
Yu, Zhiwen [1]
Zhang, Yidong [1]
You, Jane [2]
Chen, C. L. Philip [3, 4, 5]
Wong, Hau-San [6]
Han, Guoqiang [1]
Zhang, Jun [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau 99999, Peoples R China
[4] Dalian Maritime Univ, Dalian 116026, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Key Lab Management & Control Complex Syst, Beijing 100080, Peoples R China
[6] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
Classification; ensemble learning; feature selection; high dimensional data; optimization; semi-supervised learning; FRAMEWORK; TUMOR; ILLUMINATION; CARCINOMAS
DOI
10.1109/TCYB.2017.2761908
Chinese Library Classification number
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
High dimensional data classification with very limited labeled training data is a challenging task in data mining. To tackle this task, we first propose a feature selection-based semi-supervised classifier ensemble framework (FSCE) to perform high dimensional data classification. We then design an adaptive semi-supervised classifier ensemble framework (ASCE) to improve the performance of FSCE. Compared with FSCE, ASCE is characterized by an adaptive feature selection process, an adaptive weighting process (AWP), and an auxiliary training set generation process (ATSGP). The adaptive feature selection process generates a set of compact subspaces from the attributes selected by the feature selection algorithms, the AWP associates each basic semi-supervised classifier in the ensemble with a weight value, and the ATSGP enlarges the training set with unlabeled samples. In addition, a set of nonparametric tests is adopted to compare multiple semi-supervised classifier ensemble (SSCE) approaches over different datasets. The experiments on 20 high dimensional real-world datasets show that: 1) the two adaptive processes in ASCE are useful for improving the performance of the SSCE approach and 2) ASCE works well on high dimensional datasets with very limited labeled training data, and outperforms most state-of-the-art SSCE approaches.
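The three ingredients named in the abstract (feature subspaces, per-classifier weights, and a training set enlarged with unlabeled samples) can be illustrated with a small sketch. This is not the authors' FSCE or ASCE algorithm, whose details the abstract does not give. Every name and choice below is an assumption made for illustration: the toy feature ranking, the nearest-centroid base classifier, the single round of pseudo-labeling, and the use of labeled-set accuracy as the weight.

```python
import random

def centroid_fit(X, y, feats):
    """Per-class mean vector over the chosen feature indices."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, row, feats):
    """Label of the closest class centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(model, key=dist)

def feature_scores(X, y):
    """Toy relevance score (assumption): spread of the class means per feature."""
    n_feat = len(X[0])
    means = centroid_fit(X, y, list(range(n_feat)))
    return [max(m[f] for m in means.values()) - min(m[f] for m in means.values())
            for f in range(n_feat)]

def train_ensemble(X_lab, y_lab, X_unlab, n_members=5,
                   subspace_size=2, pool_size=4, seed=0):
    rng = random.Random(seed)
    scores = feature_scores(X_lab, y_lab)
    ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
    pool = ranked[:pool_size]                      # top-ranked features only
    members = []
    for _ in range(n_members):
        feats = rng.sample(pool, subspace_size)    # one compact subspace
        model = centroid_fit(X_lab, y_lab, feats)
        # Enlarge the training set: pseudo-label the unlabeled rows, then refit.
        pseudo = [centroid_predict(model, r, feats) for r in X_unlab]
        model = centroid_fit(X_lab + X_unlab, y_lab + pseudo, feats)
        # Weight (assumption): accuracy of the refit model on the labeled rows.
        hits = sum(centroid_predict(model, r, feats) == t
                   for r, t in zip(X_lab, y_lab))
        members.append((feats, model, hits / len(y_lab)))
    return members

def predict(members, row):
    """Weighted vote over the ensemble members."""
    votes = {}
    for feats, model, weight in members:
        label = centroid_predict(model, row, feats)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

def make_sample(label, rng, n_informative=4, n_noise=16):
    """Synthetic row: the first features separate the classes, the rest are noise."""
    informative = [10.0 * label + rng.uniform(-1, 1) for _ in range(n_informative)]
    noise = [rng.uniform(-1, 1) for _ in range(n_noise)]
    return informative + noise

# Synthetic demo: 4 labeled rows, 40 unlabeled rows, 20 features.
data_rng = random.Random(1)
y_lab = [0, 0, 1, 1]
X_lab = [make_sample(label, data_rng) for label in y_lab]
X_unlab = [make_sample(i % 2, data_rng) for i in range(40)]
members = train_ensemble(X_lab, y_lab, X_unlab)
print(predict(members, make_sample(1, data_rng)))
```

The demo data is synthetic and deliberately easy to separate; it says nothing about the results reported in the paper.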
Pages: 366-379
Page count: 14