VALIDATION OF CLASSIFICATION MODELS AND DATA REDUCTION METHODS BASED ON GENE EXPRESSION DATA

Times cited: 0
Authors
Rafiee, Mohammad [1]
Rafiei, Fatemeh [2]
Tabatabaei, Seyyed Mohammad [3]
AlaviMajd, Hamid [4]
Rafiei, Ali [5]
Khodakarim, Soheila [6,7]
Affiliations
[1] Arak Univ Med Sci, Dept Biostat & Epidemiol, Sch Med, Arak, Iran
[2] Univ Tehran Med Sci, Dept Biostat & Epidemiol, Sch Hlth, Sci Res Ctr, Tehran, Iran
[3] Mashhad Univ Med Sci, Dept Med Informat, Sch Med, Mashhad, Razavi Khorasan, Iran
[4] Shahid Beheshti Univ Med Sci, Dept Biostat, Sch Allied Med Sci, Tehran, Iran
[5] Tafresh Univ, Dept Elect Engn, Tafresh, Iran
[6] Shahid Beheshti Univ Med Sci, Dept Epidemiol, Sch Publ Hlth & Safety, Sch Allied Med Sci, Tehran, Iran
[7] Shahid Beheshti Univ Med Sci, Sch Publ Hlth & Safety, Daneshjoo Blvd,Evin Ave, Tehran, Iran
Keywords
data mining; data reduction; gene expression; MACHINES;
DOI
10.17654/BS016020079
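Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code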
020208 ; 070103 ; 0714 ;
Abstract
Background: Microarray technology allows the expression levels of thousands of genes to be monitored simultaneously. Analyzing the resulting datasets remains a challenge in the era of the bioinformatics revolution. Classification methods from data mining, machine learning and regression have been widely applied to distinguish normal from abnormal samples in gene expression datasets.
Method: In this study, the classification accuracy of support vector machine (SVM), least squares support vector machine (LSSVM), radial basis function neural network (RBFNN), Bayesian probit kernel regression (BPKR) and Bayesian logistic kernel regression (BLKR) models on normal and abnormal samples was calculated for two gene expression datasets and three reduced-dimension sets obtained by multivariate median gene set analysis (MMGSA), PCA with Karhunen-Loeve transform (PCA-KL) and auto-encoder networks.
Results: The BPKR method achieved high accuracy (up to 94%) on the full and PCA-KL data with Gaussian and linear kernels, 83% accuracy on the encoder data with a Gaussian kernel, and 92% accuracy on the MMGSA data with a linear kernel. The SVM method achieved accuracy of up to 94% on the full, PCA-KL and MMGSA data. The LSSVM method performed acceptably on the full and MMGSA data. In MMGSA data, the highest accuracy was 85%, obtained by the SVM method and the BPKR method with a Gaussian kernel.
Conclusion: MMGSA or other gene set analysis approaches are recommended for data reduction (if needed) because they improve the interpretability of the results, and the BPKR and SVM methods are recommended for classification.
Pages: 79-90
Page count: 12