Colorectal Cancer Prediction Based on Weighted Gene Co-Expression Network Analysis and Variational Auto-Encoder

Cited by: 32
Authors
Ai, Dongmei [1, 2]
Wang, Yuduo [2]
Li, Xiaoxin [2]
Pan, Hongfei [2]
Affiliations
[1] Univ Sci & Technol Beijing, Basic Expt Ctr Nat Sci, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Math & Phys, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
weighted gene co-expression network analysis; variational autoencoder; colorectal cancer; hub genes; classifier; MICROARRAY EXPRESSION DATA; PACKAGE; SELECTION; REVEALS;
DOI
10.3390/biom10091207
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
An effective feature extraction method is key to improving the accuracy of a prediction model. From the Gene Expression Omnibus (GEO) database, we obtained microarray gene expression data covering 13,487 genes for 238 samples, comprising colorectal cancer (CRC) samples and normal samples. Weighted gene co-expression network analysis (WGCNA) on 173 samples yielded twelve gene modules. By calculating the Pearson correlation coefficient (PCC) between the characteristic gene (module eigengene) of each module and CRC status, we identified a key module that was highly correlated with CRC. We screened hub genes from the key module by considering module membership, gene significance, and intramodular connectivity, and selected 10 hub genes as one type of feature for the classifier. We also applied a variational autoencoder (VAE) to 1159 significantly differentially expressed genes, mapping the data into a 10-dimensional representation that served as a second type of feature. Both types of features were fed to a support vector machine (SVM) classifier for CRC, which achieved an accuracy of 0.9692 and an AUC of 0.9981. These results show that the two-step feature extraction method, which combines hub genes obtained by WGCNA with a 10-dimensional VAE representation, yields high classification accuracy.
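The key-module and hub-gene screening steps described in the abstract can be illustrated with a minimal sketch on synthetic data. It takes the module eigengene to be the first principal component of the standardized module expression matrix, correlates it with a binary CRC/normal trait, and then screens genes by module membership (MM), gene significance (GS), and intramodular connectivity. The function names, the cut-offs `mm_cut` and `gs_cut`, the soft-threshold power `beta`, and the ranking rule are all illustrative assumptions and are not taken from the paper; the synthetic modules "A" and "B" are likewise hypothetical.

```python
import numpy as np


def module_eigengene(expr):
    """First left singular vector of the standardized (samples x genes) matrix."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0]  # sign is arbitrary, so callers use absolute correlations


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def screen_hub_genes(expr, trait, mm_cut=0.8, gs_cut=0.2, beta=6, top_k=10):
    """Filter genes by MM and GS, then rank by intramodular connectivity.

    All thresholds and beta are assumed values chosen for illustration only.
    """
    eig = module_eigengene(expr)
    n_genes = expr.shape[1]
    # Module membership: |correlation| of each gene with the module eigengene.
    mm = np.array([abs(pearson(expr[:, j], eig)) for j in range(n_genes)])
    # Gene significance: |correlation| of each gene with the trait.
    gs = np.array([abs(pearson(expr[:, j], trait)) for j in range(n_genes)])
    # Intramodular connectivity: row sums of the soft-thresholded adjacency,
    # minus the self-connection on the diagonal.
    adjacency = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    k_within = adjacency.sum(axis=1) - 1.0
    keep = np.flatnonzero((mm >= mm_cut) & (gs >= gs_cut))
    ranked = keep[np.argsort(-k_within[keep])]
    return ranked[:top_k].tolist()


# Synthetic data: 60 samples (30 normal, 30 tumour), two modules of 20 genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 20
trait = np.repeat([0.0, 1.0], n_samples // 2)

latent_a = 2.0 * trait + rng.normal(0.0, 0.5, n_samples)  # trait-driven module
latent_b = rng.normal(0.0, 1.0, n_samples)                # unrelated module
modules = {
    "A": latent_a[:, None] + rng.normal(0.0, 0.3, (n_samples, n_genes)),
    "B": latent_b[:, None] + rng.normal(0.0, 0.3, (n_samples, n_genes)),
}

# Key module: the one whose eigengene has the largest |PCC| with the trait.
module_trait_pcc = {
    name: pearson(module_eigengene(expr), trait) for name, expr in modules.items()
}
key_module = max(module_trait_pcc, key=lambda name: abs(module_trait_pcc[name]))
hub_genes = screen_hub_genes(modules[key_module], trait)
```

In a full pipeline along the lines of the abstract, the expression values of the selected hub genes would be concatenated with the 10-dimensional VAE representation before being passed to the SVM classifier; that step is omitted here.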
Pages: 1-11
Page count: 11