Ontology based text mining of gene-phenotype associations: application to candidate gene prediction

Cited by: 6
Authors
Kafkas, Senay [1]
Hoehndorf, Robert [1]
Affiliations
[1] King Abdullah University of Science and Technology, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, Thuwal 23955, Saudi Arabia
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2019
Keywords
GENOME; SIMILARITY; MOUSE; TOOL;
DOI
10.1093/database/baz019
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07; 0710; 09;
Abstract
Gene-phenotype associations play an important role in understanding disease mechanisms, which is a prerequisite for developing treatments. Some gene-phenotype associations have been established, mainly experimentally, and made publicly available through standard resources such as MGI. However, a vast number of gene-phenotype associations remain buried in the biomedical literature. Given the volume of this literature, automated text mining tools are needed to ease the burden of manually curating gene-phenotype associations and to build comprehensive resources. In this study, we present an ontology-based approach, combined with statistical methods, for text mining gene-phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene-phenotype associations from HPO and MGI, respectively. We posit that candidate genes and their relevant diseases should be described with similar phenotypes in publications. We therefore demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarity of the phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study to use an ontology-based approach to extract gene-phenotype associations from the literature. We evaluated our disease candidate prediction model on gene-disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on the OMIM (human) and MGI (mouse) datasets of gene-disease associations, respectively. Our manual analysis of the text-mined data revealed that our method can accurately extract gene-phenotype associations that are not currently covered by existing public gene-phenotype resources. Overall, the results indicate that our method can precisely extract known as well as new gene-phenotype associations from the literature. All data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
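The candidate gene prediction idea in the abstract (rank genes by how similar their phenotypes are to a disease's phenotypes, then score the ranking with AUC) can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so plain Jaccard overlap of phenotype sets is used here as a simple stand-in; it is not the ontology-aware semantic similarity the authors describe. The function names, gene labels and phenotype labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; a stand-in for an ontology-based semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(disease_phenos: Set[str],
               gene_phenos: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank genes by phenotype similarity to the disease, best first (ties by name)."""
    scored = [(gene, jaccard(phenos, disease_phenos))
              for gene, phenos in gene_phenos.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical text-mined phenotype sets; labels are placeholders, not real ontology IDs.
    gene_phenos = {
        "GENE_A": {"P1", "P2", "P3"},
        "GENE_B": {"P3", "P4"},
        "GENE_C": {"P5"},
    }
    disease_phenos = {"P1", "P2", "P4"}

    ranked = rank_genes(disease_phenos, gene_phenos)
    for gene, score in ranked:
        print(gene, round(score, 2))

    # Assume, for illustration only, that GENE_A is the known disease gene.
    known = {"GENE_A"}
    scores = [score for _, score in ranked]
    labels = [1 if gene in known else 0 for gene, _ in ranked]
    print("AUC:", roc_auc(scores, labels))
```

An ontology-aware measure would also give credit to phenotypes that are related through the ontology hierarchy rather than identical, which exact set overlap cannot do.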
Pages: 9