Automated extraction and semantic analysis of mutation impacts from the biomedical literature

Cited by: 23
Authors
Naderi, Nona [1]
Witte, Rene [1]
Affiliations
[1] Concordia Univ, Semant Software Lab, Dept Comp Sci & Software Engn, Montreal, PQ H2K 3V4, Canada
Source
BMC GENOMICS | 2012 / Vol. 13
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
SITE-DIRECTED MUTAGENESIS; L-XYLULOSE REDUCTASE; DEHYDROGENASE; IDENTIFICATION; INFORMATION; RESIDUES; ENZYMES; TEXT;
DOI
10.1186/1471-2164-13-S4-S10
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Access to mutational information and its impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast-growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities.
Results: We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguate genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects, and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data, and end-user and developer documentation, is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner.
Conclusion: We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions.
Pages: 17