Gene expression data classification: some distance-based methods

Cited by: 0
Author
Makinde, Olusola Samuel [1 ]
Affiliation
[1] Fed Univ Technol Akure, Dept Stat, Akure, Nigeria
Keywords
Distance methods; gene classifier; gene expression; proportion of correct classification; support vector machine; cancer; prediction; classifiers
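DOI
Not available
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract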
Microarray data are a classical example of high-throughput data, characterized by more features (genes) than sample points (gene expression levels). A number of classification techniques have been proposed in the literature, but many of them are either computationally expensive or perform sub-optimally. In this paper, several distance functions are considered and classification rules based on them are formulated. The distance functions include the average distance measure, the distance to the component-wise median, and the distance to the mean. We also define a probabilistic approach to classification rules based on two of the distance measures. A gene selection technique based on shrunken centroids regularized discriminant analysis was applied to the small round blue cell tissue, colon cancer, lymphoma, prostate cancer and leukaemia data before the classification rules were applied. Three simulation studies were performed to mimic gene expression data. The performance of the distance-based classification rules was compared with that of some known classification methods from the literature and was found to be competitive. The distance-based methods implemented in this study are computationally simple and cheap to run.
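The three distance rules named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes Euclidean distance and a plain "assign to the nearest class" rule; the function names `class_distances` and `classify`, the `rule` parameter, and the toy data are all hypothetical and are not taken from the paper. The probabilistic variant mentioned in the abstract is not sketched here because it is not specified.

```python
import numpy as np


def class_distances(X_train, y_train, x, rule="mean"):
    """Distance from point x to each class under one of three rules.

    Assumption: Euclidean distance; the paper's exact definitions may differ.
    """
    distances = {}
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]  # training samples of this class
        if rule == "mean":
            # distance to the class mean vector
            distances[label] = np.linalg.norm(x - Xc.mean(axis=0))
        elif rule == "median":
            # distance to the component-wise (coordinate-wise) median
            distances[label] = np.linalg.norm(x - np.median(Xc, axis=0))
        elif rule == "average":
            # average of the distances from x to every sample in the class
            distances[label] = np.linalg.norm(Xc - x, axis=1).mean()
        else:
            raise ValueError(f"unknown rule: {rule!r}")
    return distances


def classify(X_train, y_train, x, rule="mean"):
    """Assign x to the class with the smallest distance under the given rule."""
    d = class_distances(X_train, y_train, x, rule)
    return min(d, key=d.get)


# Toy data (made up): 5 samples per class, 50 features, so features > samples.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(5, 50)),  # class 0
    rng.normal(loc=3.0, scale=1.0, size=(5, 50)),  # class 1
])
y_train = np.array([0] * 5 + [1] * 5)
x_new = np.full(50, 3.0)  # a point at the centre of class 1

for rule in ("mean", "median", "average"):
    print(rule, classify(X_train, y_train, x_new, rule))
```

Each rule needs only one pass over the training samples of each class, which is in keeping with the abstract's claim that these methods are computationally simple.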
Pages: 31-39
Page count: 9
Related papers
25 in total
[1]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]   The deepest point for distributions in infinite dimensional spaces [J].
Chakraborty, Anirvan ;
Chaudhuri, Probal .
STATISTICAL METHODOLOGY, 2014, 20 :27-39
[4]   Sparse Partial Least Squares Classification for High Dimensional Data [J].
Chung, Dongjun ;
Keles, Sunduz .
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2010, 9 (01)
[5]  
Colak C, 2016, KUWAIT J SCI, V43, P86
[6]  
Cover Thomas M., 1968, P HAW INT C SYST SCI, P413
[7]   Classification of microarrays to nearest centroids [J].
Dabney, AR .
BIOINFORMATICS, 2005, 21 (22) :4148-4154
[8]   Support vector machine classification and validation of cancer tissue samples using microarray expression data [J].
Furey, TS ;
Cristianini, N ;
Duffy, N ;
Bednarski, DW ;
Schummer, M ;
Haussler, D .
BIOINFORMATICS, 2000, 16 (10) :906-914
[9]   Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring [J].
Golub, TR ;
Slonim, DK ;
Tamayo, P ;
Huard, C ;
Gaasenbeek, M ;
Mesirov, JP ;
Coller, H ;
Loh, ML ;
Downing, JR ;
Caligiuri, MA ;
Bloomfield, CD ;
Lander, ES .
SCIENCE, 1999, 286 (5439) :531-537
[10]   Regularized linear discriminant analysis and its application in microarrays [J].
Guo, Yaqian ;
Hastie, Trevor ;
Tibshirani, Robert .
BIOSTATISTICS, 2007, 8 (01) :86-100