Automated assessment of biological database assertions using the scientific literature

Authors
Bouadjenek, Mohamed Reda [1]
Zobel, Justin [2]
Verspoor, Karin [2]
Affiliations
[1] Univ Toronto, Dept Mech & Ind Engn, Toronto, ON M5S 3G8, Canada
[2] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic 3010, Australia
Funding
Australian Research Council
Keywords
Data Analysis; Data Quality; Biological Databases; Data Cleansing; PROTEIN; CHALLENGES; NAMES;
DOI
10.1186/s12859-019-2801-x
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Large biological databases such as GenBank contain vast numbers of records, the content of which is substantively based on external resources, including published literature. Manual curation is used to establish whether the literature and the records are indeed consistent. In this paper we explore an automated method for assessing the consistency of biological assertions, to assist biocurators, which we call BARC (Biocuration tool for Assessment of Relation Consistency). In this method, a biological assertion is represented as a relation between two objects (for example, a gene and a disease); we then use our novel set-based relevance algorithm SaBRA to retrieve pertinent literature, and apply a classifier to estimate the likelihood that this relation (assertion) is correct.
Results: Our experiments on assessing gene-disease relations and protein-protein interactions using the PubMed Central collection show that BARC can be effective at assisting curators to perform data cleansing. Specifically, BARC substantially outperforms the best baselines, with an improvement in F-measure of 3.5% and 13% on gene-disease relations and protein-protein interactions, respectively. We have additionally carried out a feature analysis that showed that all feature types are informative, as are all fields of the documents.
Conclusions: BARC provides a clear benefit for the biocuration community, as there are no prior automated tools for identifying inconsistent assertions in large-scale biological databases.
Pages: 22
Related papers
26 in total
  • [1] Automated assessment of biological database assertions using the scientific literature
    Mohamed Reda Bouadjenek
    Justin Zobel
    Karin Verspoor
    BMC Bioinformatics, 20
  • [2] Learning Biological Sequence Types Using the Literature
    Bouadjenek, Mohamed Reda
    Verspoor, Karin
    Zobel, Justin
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1991 - 1994
  • [3] Automated detection of records in biological sequence databases that are inconsistent with the literature
    Bouadjenek, Mohamed Reda
    Verspoor, Karin
    Zobel, Justin
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 71 : 229 - 240
  • [4] Literature classification for semi-automated updating of biological knowledgebases
    Lars Rønn Olsen
    Ulrich Johan Kudahl
    Ole Winther
    Vladimir Brusic
    BMC Genomics, 14
  • [5] LLM based Biological Named Entity Recognition from Scientific Literature
    Jung, Sung Jae
    Kim, Hajung
    Jang, Kyoung Sang
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 433 - 435
  • [6] Competence of medicinal plant database using data mining algorithms for large biological databases
    Krishnamoorthy M.
    Karthikeyan R.
    Measurement: Sensors, 2022, 24
  • [7] On using remote user defined functions as wrappers for biological database interoperability
    Chen, LY
    Jamil, HM
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (02) : 161 - 195
  • [8] GDB:: A tool to build deductive rules using a fuzzy relational database with scientific data
    Morales, R.
    Blanco, I.
    Pons, O.
    Rodriguez, J.
    FUZZY SETS AND SYSTEMS, 2008, 159 (12) : 1577 - 1596
  • [9] Knowledge Discovery in a Facility Condition Assessment Database Using Text Clustering
    Ng, H. S.
    Toukourou, A.
    Soibelman, L.
    JOURNAL OF INFRASTRUCTURE SYSTEMS, 2006, 12 (01) : 50 - 59
  • [10] Automated Health Alerts Using In-Home Sensor Data for Embedded Health Assessment
    Skubic, Marjorie
    Guevara, Rainer Dane
    Rantz, Marilyn
    IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE, 2015, 3