BioReader: a text mining tool for performing classification of biomedical literature

Cited by: 46
Authors
Simon, Christian [1]
Davidsen, Kristian [2]
Hansen, Christina [2]
Seymour, Emily [3]
Barnkob, Mike Bogetofte [4]
Olsen, Lars Rønn [2]
Affiliations
[1] Univ Copenhagen, Novo Nordisk Ctr Prot Res, Dis Syst Biol, DK-2200 Copenhagen, Denmark
[2] Tech Univ Denmark, Dept Hlth Technol, DK-2800 Lyngby, Denmark
[3] La Jolla Inst Allergy & Immunol, La Jolla, CA 92037 USA
[4] Univ Oxford, Radcliffe Dept Med, Weatherall Inst Mol Med, MRC Human Immunol Unit, Oxford OX3 9DU, England
Keywords
Database curation; Text mining; Machine learning; Biological databases; Literature survey; PubMed; Document classification; IMMUNE EPITOPE DATABASE;
DOI
10.1186/s12859-019-2607-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline code
071010; 081704
Abstract
Background: Scientific data and research results are being published at an unprecedented rate. Many database curators and researchers use data and information from the primary literature to populate databases, form hypotheses, or support analyses and the validation of results. These efforts largely rely on manual literature surveys to collect the data. Repositories such as PubMed make it possible to query the vast literature by keyword, but filtering the relevant articles from the query results can be a non-trivial and highly time-consuming task.
Results: We present a tool that classifies scientific literature by applying text mining to article abstracts. BioReader (Biomedical Research Article Distiller) is trained by uploading article corpora for two training categories (e.g. one positive and one negative for the content of interest). The user also supplies a corpus of abstracts to be classified and/or a search string used to query PubMed for articles. The corpora are submitted as lists of PubMed IDs. The abstracts are then automatically downloaded from PubMed and preprocessed, and the unclassified corpus is classified using whichever of the ten implemented classification algorithms performs best.
Conclusion: BioReader supports data and information collection by providing text mining-based classification of primary biomedical literature through a web interface. This enables curators and researchers to take advantage of the vast amounts of data and information in the published literature. BioReader outperforms existing tools with similar functionality and expands the features available for mining literature in database curation efforts. The tool is freely available as a web service at http://www.cbs.dtu.dk/services/BioReader
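The Results paragraph describes a workflow with three steps. A classifier is trained on two labeled corpora, the best-performing algorithm is selected, and that algorithm labels a third corpus. The abstract does not name BioReader's ten algorithms, its preprocessing steps, or its selection metric. The Python sketch below therefore fills those gaps with assumptions, and it illustrates only the general technique, not BioReader's implementation. The assumptions are:

- two stand-in classifiers (multinomial naive Bayes and a cosine nearest-centroid);
- leave-one-out accuracy as the selection criterion;
- a hypothetical stop-word list;
- abstracts passed in as plain strings rather than downloaded by PubMed ID.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; BioReader's real preprocessing is not specified here.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "with", "we", "on", "by"}


def tokenize(text):
    """Lower-case the text, split it into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayes:
    """Stand-in classifier: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.log_priors = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            for d in class_docs:
                self.counts[c].update(tokenize(d))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = sum(self.counts[c].values()) + len(self.vocab)
            # Tokens never seen during training are ignored.
            return self.log_priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom)
                for t in tokenize(doc) if t in self.vocab
            )
        return max(self.classes, key=log_posterior)


class NearestCentroid:
    """Stand-in classifier: picks the class whose summed term counts are most cosine-similar."""

    def fit(self, docs, labels):
        self.centroids = {c: Counter() for c in set(labels)}
        for d, y in zip(docs, labels):
            self.centroids[y].update(tokenize(d))
        return self

    def predict(self, doc):
        v = Counter(tokenize(doc))

        def cosine(c):
            u = self.centroids[c]
            dot = sum(n * u[t] for t, n in v.items())
            norm = math.sqrt(sum(n * n for n in v.values())) * math.sqrt(sum(n * n for n in u.values()))
            return dot / norm if norm else 0.0
        return max(sorted(self.centroids), key=cosine)


def loo_accuracy(model_cls, docs, labels):
    """Leave-one-out accuracy: train on all documents but one, then test on the held-out one."""
    hits = 0
    for i in range(len(docs)):
        model = model_cls().fit(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += model.predict(docs[i]) == labels[i]
    return hits / len(docs)


def classify_corpus(positive, negative, unlabeled, candidates=(NaiveBayes, NearestCentroid)):
    """Pick the candidate with the best LOO accuracy, retrain it on all data, classify the rest."""
    docs = positive + negative
    labels = ["positive"] * len(positive) + ["negative"] * len(negative)
    best = max(candidates, key=lambda m: loo_accuracy(m, docs, labels))
    model = best().fit(docs, labels)
    return best.__name__, [model.predict(d) for d in unlabeled]


# Toy "abstracts" standing in for PubMed records that BioReader would fetch by ID.
positive = [
    "T cell epitope binding to HLA class I molecules",
    "Epitope mapping of antibody responses to viral antigens",
    "Peptide epitope recognition by T cell receptors",
]
negative = [
    "Crystal structure of a bacterial membrane transporter",
    "Metabolic flux analysis in yeast fermentation",
    "Protein folding kinetics measured by fluorescence",
]
unlabeled = ["Mapping T cell epitopes in influenza antigens", "Fermentation yield of engineered yeast strains"]

name, predicted = classify_corpus(positive, negative, unlabeled)
print(name, predicted)  # predicted → ['positive', 'negative']
```

Both stand-in classifiers agree on this toy data, so the selection step decides only which model is reported. With real corpora of hundreds of abstracts, the candidates would usually differ, and the selection criterion would matter more. A k-fold scheme would then be cheaper than leave-one-out.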
Pages: 6
Related papers
50 results in total
  • [1] BioReader: a text mining tool for performing classification of biomedical literature
    Simon, Christian
    Davidsen, Kristian
    Hansen, Christina
    Seymour, Emily
    Barnkob, Mike Bogetofte
    Olsen, Lars Rønn
    BMC Bioinformatics, 19
  • [2] Biomedical literature mining for text classification and construction of gene networks
    Antonakaki, Despoina
    Kanterakis, Alexandros
    Potamias, George
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 469 - 473
  • [3] Text mining the biomedical literature
    Pertsemlidis, A
    BIOPHYSICAL JOURNAL, 2002, 82 (01) : 168A - 168A
  • [4] BioClass: A Tool for Biomedical Text Classification
    Romero, R.
    Seara Vieira, A.
    Iglesias, E. L.
    Borrajo, L.
    8TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS (PACBB 2014), 2014, 294 : 243 - 251
  • [5] Tools for Text Mining over Biomedical Literature
    Rinaldi, Fabio
    Schneider, Gerold
    Kaljurand, Kaarel
    Hess, Michael
    ECAI 2006, PROCEEDINGS, 2006, 141 : 825 - +
  • [6] Text Mining for Discovering Implicit Relationships in Biomedical Literature
    Petric, Ingrid
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2010, 34 (02) : 261 - 262
  • [7] Large-Scale Text Mining of Biomedical Literature
    Ginter, Filip
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (116) : 43 - 44
  • [8] TMT-HCC: A tool for text mining the biomedical literature for hepatocellular carcinoma (HCC) biomarkers identification
    Seoud, Rania A. Abul
    Mabrouk, Mai S.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2013, 112 (03) : 640 - 648
  • [9] Text mining biomedical literature for constructing gene regulatory networks
    Song, Yong-Ling
    Chen, Su-Shing
    Interdisciplinary Sciences: Computational Life Sciences, 2009, 1 : 179 - 186
  • [10] Terminological resources for text mining over biomedical scientific literature
    Rinaldi, Fabio
    Kaljurand, Kaarel
    Saetre, Rune
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2011, 52 (02) : 107 - 114