Gene-Network-Based Feature Set (GNFS) for Expression-Based Cancer Classification

Cited by: 3
Authors
Doungpan, Narumol [1]
Engchuan, Worrawat [2]
Meechai, Asawin [3]
Fong, Simon [4]
Chan, Jonathan H. [2]
Affiliations
[1] King Mongkuts Univ Technol Thonburi, Fac Engn, Biol Engn Program, Bangkok 10140, Thailand
[2] King Mongkuts Univ Technol Thonburi, Sch Informat Technol, Data Sci & Engn Lab, Bangkok 10140, Thailand
[3] King Mongkuts Univ Technol Thonburi, Fac Engn, Dept Chem Engn, Bangkok 10140, Thailand
[4] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau, Peoples R China
Keywords
Gene Expression Analysis; Gene Set; Gene Network; Classification; Feature Selection; Breast Cancer; Lung Cancer; Colorectal Cancer; MICROARRAY DATA; SELECTION; DIAGNOSIS; SUBNETWORKS; CARCINOMAS; PREDICTION; SIGNATURE; RELEVANCE; PATHWAYS; MARKERS;
DOI
10.1166/jmihi.2016.1806
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Identifying cancer biomarkers from gene expression data is a challenging task. Many strategies have been proposed to identify signature genes that distinguish cancer cells from normal cells. Recently, the ANOVA-based Feature Set (AFS) method has been used successfully to identify gene sets as markers from multiclass gene expression data. However, AFS does not take network data into account, so its gene-set markers may not be functionally related to the cancer. This work therefore proposes a gene-set-based biomarker identification method, termed Gene-Network-based Feature Set (GNFS), that integrates gene-set topology derived from expression data with network data. For each gene set, GNFS superimposes the member genes onto a network built from pathway data and gene-gene relationships, then applies a greedy search to identify a gene subnetwork as the marker. The representative level of each gene set, or gene-set activity, is then calculated from the best subnetwork and used in cancer classification to evaluate the potential of the identified markers. In a comparative study, the classification performance of GNFS is benchmarked against two existing methods, AFS and Paired Fuzzy SNet (PFSNet). In addition, the identified markers are validated using the online text-mining tool HugeNavigator. The results show that GNFS provides more biologically significant markers while maintaining comparable classification performance.
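The greedy subnetwork search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, the stopping rule, or the network sources used by GNFS, so everything below is an assumption made for illustration: activity is taken as the mean expression of the member genes, the score is the absolute Welch t-statistic between two classes, and the names `activity`, `score`, `greedy_subnetwork`, and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def activity(expr, genes):
    """Per-sample activity: mean expression over the member genes (assumed)."""
    n_samples = len(next(iter(expr.values())))
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]


def score(expr, genes, labels):
    """Absolute Welch t-statistic of the activity between classes 1 and 0 (assumed)."""
    act = activity(expr, genes)
    a = [x for x, y in zip(act, labels) if y == 1]
    b = [x for x, y in zip(act, labels) if y == 0]
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def greedy_subnetwork(expr, network, labels, seed):
    """Grow a subnetwork from `seed`, adding the best-scoring neighbor each step."""
    sub = {seed}
    best = score(expr, sub, labels)
    while True:
        # Neighbors of the current subnetwork that have expression data.
        frontier = {n for g in sub for n in network.get(g, ()) if n in expr} - sub
        candidates = [(score(expr, sub | {n}, labels), n) for n in sorted(frontier)]
        if not candidates:
            break
        top_score, top_gene = max(candidates)
        if top_score <= best:
            break  # no neighbor improves the score: stop
        sub.add(top_gene)
        best = top_score
    return sub, best


# Toy data (hypothetical): genes A and B separate the classes, C does not.
expr = {
    "A": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "B": [4.8, 5.1, 5.0, 1.2, 0.8, 1.0],
    "C": [2.0, 3.1, 1.9, 3.0, 2.1, 2.9],
}
network = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
labels = [1, 1, 1, 0, 0, 0]

sub, best = greedy_subnetwork(expr, network, labels, seed="A")
print(sorted(sub), round(best, 2))
```

On this toy input the search starts at A, adds B because it raises the score, and stops before adding C because the score would drop. The published method may score and grow subnetworks differently.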
Pages: 1093-1101
Page count: 9