A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma

Cited by: 13
Authors
Abdelwahab, Omar [1]
Awad, Nourelislam [1, 2]
Elserafy, Menattallah [1, 3]
Badr, Eman [1, 4]
Affiliations
[1] Univ Sci & Technol, Zewail City Sci & Technol, Giza, Egypt
[2] Nile Univ, Ctr Informat Sci, Giza, Egypt
[3] Zewail City Sci & Technol, Helmy Inst Med Sci, Ctr Genom, Giza, Egypt
[4] Cairo Univ, Fac Comp & Artificial Intelligence, Giza, Egypt
Keywords
RECURSIVE FEATURE ELIMINATION; GENE-EXPRESSION DATA; POOR-PROGNOSIS; POTENTIAL BIOMARKERS; TUMOR-SUPPRESSOR; IDENTIFICATION; CLASSIFICATION; CELLS; PROGRESSION; PROFILES
DOI
10.1371/journal.pone.0269126
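Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09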
Abstract
Lung cancer (LC) represents most of the cancer incidences in the world. There are many types of LC, but lung adenocarcinoma (LUAD) is the most common. Although RNA-seq and microarray data provide a vast amount of gene expression data, most genes are insignificant to clinical diagnosis. Feature selection (FS) techniques overcome the high dimensionality and sparsity of such large-scale data. We propose a framework that applies an ensemble of feature selection techniques to identify genes highly correlated with LUAD. Using LUAD RNA-seq data from The Cancer Genome Atlas (TCGA), we employed mutual information (MI) and recursive feature elimination (RFE) as feature selection techniques, together with a support vector machine (SVM) classification model. We also used Random Forest (RF) as an embedded FS technique. The results were integrated, and candidate biomarker genes common to all techniques were identified. The proposed framework identified 12 potential biomarkers that are highly correlated with different LC types, especially LUAD. A predictive model trained on the expression profiles of the identified biomarkers achieved a performance of 97.99%. In addition, differential gene expression analysis showed that all 12 genes were significantly differentially expressed between normal and LUAD tissues, and previous reports indicate that they are strongly correlated with LUAD. We propose that using multiple feature selection methods effectively reduces the number of identified biomarkers and directly affects their biological relevance.
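The integration step described in the abstract (keeping candidate genes that all selection techniques agree on) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each method yields a per-gene score where higher means more relevant, that integration is a plain intersection of each method's top-k genes, and that `k` is a free parameter. The gene names (`GENE_A` to `GENE_D`), the scores, and the function names `top_k` and `consensus_genes` are hypothetical placeholders.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k genes with the highest score (ties broken by gene name)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])


def consensus_genes(method_scores: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the genes that appear in the top-k selection of every method."""
    selections = [top_k(scores, k) for scores in method_scores.values()]
    shared = set.intersection(*selections) if selections else set()
    return sorted(shared)


# Hypothetical scores for illustration only. In a real pipeline these would
# come from mutual information, SVM-based RFE, and random forest importances.
toy_scores = {
    "mutual_information": {"GENE_A": 0.91, "GENE_B": 0.40, "GENE_C": 0.75, "GENE_D": 0.10},
    "svm_rfe": {"GENE_A": 0.80, "GENE_B": 0.15, "GENE_C": 0.66, "GENE_D": 0.70},
    "random_forest": {"GENE_A": 0.55, "GENE_B": 0.05, "GENE_C": 0.60, "GENE_D": 0.30},
}

candidates = consensus_genes(toy_scores, k=2)
print(candidates)
```

Requiring agreement across all methods shrinks the candidate list as k decreases, which mirrors the abstract's point that combining several selectors reduces the number of reported biomarkers.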
Pages: 23