Integration and publication of heterogeneous text-mined relationships on the Semantic Web

Cited by: 23
Authors
Coulet A. [1, 2, 3]
Garten Y. [2, 3]
Dumontier M. [4]
Altman R.B. [2, 3, 5]
Musen M.A. [2]
Shah N.H. [2]
Affiliations
[1] Campus Scientifique, LORIA - INRIA Nancy - Grand-Est, Vandoeuvre-lès-Nancy Cedex
[2] Stanford University, Department of Medicine, 300 Pasteur Drive, Mail Code 5110, Stanford, 94305, CA
[3] Stanford University, Department of Genetics, Mail Code 5120, Stanford, 94305, CA
[4] Carleton University, Department of Biology, 1125 Colonel By Drive, Ottawa, K1S5B6, ON
[5] Stanford University, Department of Bioengineering, 318 Campus Drive, Mail Code 5444, Stanford, 94305, CA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Warfarin; Dependency Graph; Relationship Type; Link Open Data; MEDLINE Abstract;
DOI
10.1186/2041-1480-2-S2-S10
Abstract
Background: Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and complexity of natural language in expressing similar relationships cause the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and makes the relationships hard to use for data mining or question answering.

Results: We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology), which consists of 200 curated relations derived from over 40,000 heterogeneous relationships extracted via text mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions, and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of this integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network.

Conclusions: The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.
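The synonym-based mapping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Relationship` type, the `SYNONYMS` table, the `normalize` function, and the canonical relation names are all hypothetical and chosen only to show how two surface forms with different syntax can collapse to one normalized relationship.

```python
from typing import NamedTuple, Optional


class Relationship(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical synonym table (not taken from PHARE):
# surface form -> (canonical relation, whether subject and object are swapped).
SYNONYMS = {
    "metabolizes": ("metabolizes", False),
    "is metabolized by": ("metabolizes", True),
    "inhibits": ("inhibits", False),
    "is inhibited by": ("inhibits", True),
}


def normalize(rel: Relationship) -> Optional[Relationship]:
    """Map a raw relationship to its canonical form, or None if unmapped."""
    key = " ".join(rel.relation.lower().split())  # lowercase, collapse spaces
    if key not in SYNONYMS:
        return None
    canonical, inverted = SYNONYMS[key]
    subject, obj = (rel.obj, rel.subject) if inverted else (rel.subject, rel.obj)
    return Relationship(subject.lower(), canonical, obj.lower())


# Two phrasings of the same fact, plus one relation the toy table cannot map.
raw = [
    Relationship("CYP2C9", "metabolizes", "warfarin"),
    Relationship("Warfarin", "is metabolized by", "CYP2C9"),
    Relationship("warfarin", "binds", "albumin"),
]

# A set keeps one copy of each normalized relationship.
normalized = {n for r in raw if (n := normalize(r)) is not None}
```

In this toy setting the first two entries normalize to the same triple, while the third is left unmapped. The abstract describes a richer procedure that also uses entity descriptions and hierarchies of entities and roles, which this sketch does not attempt to model.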
© 2011 Coulet et al; licensee BioMed Central Ltd.