GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning

Cited by: 10
Authors
Ersoz, Nur Sebnem [1]
Bakir-Gungor, Burcu [2, 3]
Yousef, Malik [4, 5]
Affiliations
[1] Abdullah Gul Univ, Grad Sch Engn & Sci, Dept Bioengn, Kayseri, Turkiye
[2] Abdullah Gul Univ, Fac Engn, Dept Comp Engn, Kayseri, Turkiye
[3] Abdullah Gul Univ, Fac Life & Nat Sci, Dept Bioengn, Kayseri, Turkiye
[4] Zefat Acad Coll, Dept Informat Syst, Safed, Israel
[5] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, Safed, Israel
Keywords
gene ontology; gene expression data analysis; machine learning; feature selection; enrichment analysis; feature scoring; feature grouping; classification; FEATURE-SELECTION; CANCER; SURVIVAL; ANILLIN
DOI
10.3389/fgene.2023.1139082
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.
Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
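The Grouping, Scoring, and Modeling steps named in the abstract can be illustrated with a small sketch. This is not the GeNetOntology implementation: the abstract does not say which classifier or scoring scheme the tool uses, so a nearest-centroid classifier with leave-one-out accuracy is swapped in here purely for illustration. The gene names, term names, expression values, and the `gsm` function with its `top_k` parameter are all hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose mean vector (centroid) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def submatrix(expr, genes):
    """Build a samples-by-genes matrix restricted to the given genes."""
    n_samples = len(next(iter(expr.values())))
    return [[expr[g][s] for g in genes] for s in range(n_samples)]


def gsm(expr, labels, term_to_genes, top_k=1):
    # Grouping: one gene group per ontology term, keeping measured genes only.
    groups = {t: sorted(g for g in genes if g in expr)
              for t, genes in term_to_genes.items()}
    groups = {t: g for t, g in groups.items() if g}
    # Scoring: rate each group by how well its genes alone separate the classes.
    scores = {t: loo_accuracy(submatrix(expr, g), labels)
              for t, g in groups.items()}
    selected = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    # Modeling: evaluate a classifier on the genes of the top-ranked terms.
    genes = sorted({g for t in selected for g in groups[t]})
    return selected, scores, loo_accuracy(submatrix(expr, genes), labels)


# Toy data (hypothetical): 4 genes measured in 6 samples, two classes.
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "gene_b": [0.5, 0.4, 0.6, 2.0, 2.2, 1.9],
    "gene_c": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "gene_d": [0.3, 0.1, 0.2, 0.3, 0.1, 0.2],
}
labels = [0, 0, 0, 1, 1, 1]
term_to_genes = {
    "term_signal": ["gene_a", "gene_b"],
    "term_noise": ["gene_c", "gene_d"],
}

selected, scores, model_acc = gsm(expr, labels, term_to_genes, top_k=1)
print(selected, scores, model_acc)
```

In this toy setting the term whose genes differ between the two classes is ranked first, and only its genes feed the final model. A real analysis would take term-to-gene mappings from GO annotations and would use a held-out evaluation rather than reusing the same samples for scoring and modeling.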
Pages: 19