CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis

Cited by: 25
Authors
Yousef, Malik [1, 2]
Ulgen, Ege [3 ]
Sezerman, Osman Ugur [3 ]
Affiliations
[1] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, Safed, Israel
[2] Zefat Acad Coll, Dept Informat Syst, Safed, Israel
[3] Acibadem Mehmet Ali Aydinlar Univ, Sch Med, Dept Biostat & Med Informat, Istanbul, Turkey
Keywords
Classification; Gene expression; Enrichment analysis; KEGG pathway; Rank; Machine learning; Bioinformatics; Data science; Data mining; Genomics; SELECTION; KNOWLEDGE;
DOI
10.7717/peerj-cs.336
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes, since their ultimate goal is to determine features that optimize model performance metrics, not to build a biologically meaningful model. Therefore, there is a pressing need for new computational tools that integrate biological knowledge about the data into the process of gene selection and machine learning. Integrative gene selection enables the incorporation of biological domain knowledge from external biological resources. In this study, we propose a new computational approach named CogNet, an integrative gene selection tool that exploits biological knowledge to group genes for the computational modeling tasks of ranking and classification. In CogNet, pathfindR serves as the biological grouping tool, allowing the main algorithm to rank active-subnetwork-oriented KEGG pathway enrichment analysis results in order to build a biologically relevant model. CogNet provides a list of significant KEGG pathways that can classify the data with very high accuracy. The list also provides the differentially expressed genes belonging to these pathways, which are used as features in the classification problem. The list facilitates deep analysis and better interpretability of the role of KEGG pathways in classifying the data, thus better establishing the biological relevance of these differentially expressed genes. Even though the main aim of our study is not to improve on the accuracy of any existing tool, CogNet outperforms a similar approach called maTE while achieving performance comparable to other similar tools, including SVM-RCE. CogNet was tested on 13 gene expression datasets concerning a variety of diseases.
Pages: 20
Related papers
30 in total
[1]   Unsupervised gene selection using biological knowledge : application in sample clustering [J].
Acharya, Sudipta ;
Saha, Sriparna ;
Nikhil, N. .
BMC BIOINFORMATICS, 2017, 18
[2]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[3]   Towards knowledge-based gene expression data mining [J].
Bellazzi, Riccardo ;
Zupan, Blaz .
JOURNAL OF BIOMEDICAL INFORMATICS, 2007, 40 (06) :787-802
[4]   KNIME: The Konstanz Information Miner [J].
Berthold, Michael R. ;
Cebron, Nicolas ;
Dill, Fabian ;
Gabriel, Thomas R. ;
Koetter, Tobias ;
Meinl, Thorsten ;
Ohl, Peter ;
Sieb, Christoph ;
Thiel, Kilian ;
Wiswedel, Bernd .
DATA ANALYSIS, MACHINE LEARNING AND APPLICATIONS, 2008, :319-326
[5]   The use of the area under the ROC curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[6]  
Clough E, 2016, METHODS MOL BIOL, V1418, P93, DOI 10.1007/978-1-4939-3578-9_5
[7]   Mutual enrichment in aggregated ranked lists with applications to gene expression regulation [J].
Cohn-Alperovich, Dalia ;
Rabner, Alona ;
Kifer, Ilona ;
Mandel-Gutfreund, Yael ;
Yakhini, Zohar .
BIOINFORMATICS, 2016, 32 (17) :464-472
[8]   Recursive Cluster Elimination Based Support Vector Machine for Disease State Prediction Using Resting State Functional and Effective Brain Connectivity [J].
Deshpande, Gopikrishna ;
Li, Zhihao ;
Santhanam, Priya ;
Coles, Claire D. ;
Lynch, Mary Ellen ;
Hamann, Stephan ;
Hu, Xiaoping .
PLOS ONE, 2010, 5 (12)
[9]   An integrative gene selection with association analysis for microarray data classification [J].
Fang, Ong Huey ;
Mustapha, Norwati ;
Sulaiman, Md. Nasir .
INTELLIGENT DATA ANALYSIS, 2014, 18 (04) :739-758
[10]   Feature clustering and ranking for selecting stable features from high dimensional remotely sensed data [J].
Harris, Dugal ;
Van Niekerk, Adriaan .
INTERNATIONAL JOURNAL OF REMOTE SENSING, 2018, 39 (23) :8934-8949