Using the two-population genetic algorithm with distance-based k-nearest neighbour voting classifier for high-dimensional data

Cited by: 5
Authors
Lee, Chien-Pang [1]
Lin, Wen-Shin [2]
Affiliations
[1] Natl Kaohsiung Marine Univ, Dept Maritime Informat & Technol, 482, Zhongzhou 3rd Rd, Kaohsiung 805, Taiwan
[2] Natl Pingtung Univ Sci & Technol, Dept Plant Ind, 1 Shuefu Rd, Pingtung 912, Taiwan
Keywords
genetic algorithm; k-nearest neighbour; Fisher's least significant difference; outlier detection; high-dimensional data; gene expression data; FEATURE-SELECTION METHOD; PARTIAL LEAST-SQUARES; EXPRESSION DATA; SAMPLE CLASSIFICATION; TUMOR CLASSIFICATION; CLASS PREDICTION; CANCER; REDUCTION; NETWORKS; ENSEMBLE;
DOI
10.1504/IJDMB.2016.075820
Chinese Library Classification number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
Owing to developments in computer technology, the analysis of high-dimensional data has become a popular research topic. However, traditional statistical methods do not perform well when the number of variables (p) is greater than the sample size (n). Accordingly, this paper proposes a novel hybrid model that combines statistical methodology with data mining techniques for the classification of high-dimensional data. In the proposed model, Fisher's least significant difference test is first used for initial dimension reduction. Subsequently, a two-population genetic algorithm and a non-parametric statistical classification method (a distance-based k-nearest neighbour voting classifier) are used to evaluate and rank the importance of the variables. Furthermore, the evaluation of the relevant variables for classification is carried out together with an outlier detection method. Eight public gene expression datasets are used to compare the performance of the proposed model with that of existing methods. The experimental results indicate that the proposed model achieves higher classification accuracy than the existing methods.
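As a rough illustration of the classifier named in the abstract, the following is a minimal sketch of a k-nearest neighbour classifier whose votes are weighted by distance. The abstract does not give the voting rule, so the inverse-distance weighting, the Euclidean metric, the function name `knn_distance_vote`, and the parameters `k` and `eps` are all assumptions made for illustration, not the authors' published method.

```python
import math
from collections import defaultdict


def knn_distance_vote(train_X, train_y, query, k=3, eps=1e-12):
    """Predict a label for `query` from its k nearest training samples.

    Assumption (not from the paper): each neighbour votes with weight
    1 / (distance + eps), so closer samples count for more.
    """
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest samples
    dists.sort(key=lambda pair: pair[0])
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)
    # The label with the largest accumulated weight wins
    return max(votes, key=votes.get)
```

With this weighting, a single very close neighbour can outvote two distant neighbours of another class, which is the main behavioural difference from plain majority voting.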
Pages: 315-331
Number of pages: 17