Automated assessment of biological database assertions using the scientific literature

Authors
Bouadjenek, Mohamed Reda [1]
Zobel, Justin [2]
Verspoor, Karin [2]
Affiliations
[1] Univ Toronto, Dept Mech & Ind Engn, Toronto, ON M5S 3G8, Canada
[2] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic 3010, Australia
Funding
Australian Research Council
Keywords
Data Analysis; Data Quality; Biological Databases; Data Cleansing; PROTEIN; CHALLENGES; NAMES;
DOI
10.1186/s12859-019-2801-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Large biological databases such as GenBank contain vast numbers of records whose content is substantively based on external resources, including the published literature. Manual curation is used to establish whether the literature and the records are indeed consistent. In this paper we explore an automated method for assessing the consistency of biological assertions, intended to assist biocurators, which we call BARC (Biocuration tool for Assessment of Relation Consistency). In this method a biological assertion is represented as a relation between two objects (for example, a gene and a disease); we then use our novel set-based relevance algorithm, SaBRA, to retrieve pertinent literature, and apply a classifier to estimate the likelihood that the relation (assertion) is correct.
Results: Our experiments on assessing gene-disease relations and protein-protein interactions using the PubMed Central collection show that BARC can be effective at assisting curators to perform data cleansing. Specifically, BARC substantially outperforms the best baselines, improving F-measure by 3.5% on gene-disease relations and by 13% on protein-protein interactions. A feature analysis additionally showed that all feature types are informative, as are all fields of the documents.
Conclusions: BARC provides a clear benefit to the biocuration community, as there are no prior automated tools for identifying inconsistent assertions in large-scale biological databases.
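The pipeline described in the abstract (assertion as a relation between two objects, literature retrieval, then a classifier producing a likelihood) can be illustrated with a minimal sketch. This is not BARC itself: the `Assertion` structure, the naive co-occurrence `retrieve` function (a stand-in, explicitly not SaBRA), the two toy features and the hand-set weights are all hypothetical; in a real system the classifier would be trained on labelled assertions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical representation of a database assertion as a relation between two objects."""
    subject: str   # e.g. a gene symbol
    obj: str       # e.g. a disease name
    relation: str  # e.g. "associated_with"


def retrieve(assertion, corpus):
    """Naive co-occurrence retrieval: a simple stand-in, NOT the SaBRA algorithm.

    Returns the documents that mention both objects (case-insensitive substring match).
    """
    s, o = assertion.subject.lower(), assertion.obj.lower()
    return [doc for doc in corpus if s in doc.lower() and o in doc.lower()]


def features(docs, corpus_size):
    """Two illustrative features: fraction of the corpus retrieved, and log(1 + count)."""
    n = len(docs)
    fraction = n / corpus_size if corpus_size else 0.0
    return [fraction, math.log1p(n)]


def consistency_score(x, weights, bias):
    """Logistic-regression-style likelihood that the assertion is correct."""
    z = bias + sum(w * f for w, f in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Usage with a toy corpus and hypothetical, hand-set weights.
corpus = [
    "BRCA1 mutations are associated with breast cancer risk.",
    "BRCA1 is involved in DNA repair.",
    "TP53 is frequently mutated in many tumours.",
]
WEIGHTS, BIAS = [2.0, 1.5], -1.0

supported = Assertion("BRCA1", "breast cancer", "associated_with")
unsupported = Assertion("TP53", "breast cancer", "associated_with")

supported_docs = retrieve(supported, corpus)
unsupported_docs = retrieve(unsupported, corpus)
supported_score = consistency_score(features(supported_docs, len(corpus)), WEIGHTS, BIAS)
unsupported_score = consistency_score(features(unsupported_docs, len(corpus)), WEIGHTS, BIAS)
```

Here an assertion backed by co-occurring literature scores above 0.5 and one with no supporting documents scores below it; a curator could then review low-scoring assertions first.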
Pages: 22