@Note: A workbench for Biomedical Text Mining

Cited by: 25
Authors
Lourenco, Analia [2]
Carreira, Rafael [1,2]
Carneiro, Sonia [2]
Maia, Paulo [1,2]
Glez-Pena, Daniel [3]
Fdez-Riverola, Florentino [3]
Ferreira, Eugenio C. [2]
Rocha, Isabel [2]
Rocha, Miguel [1]
Affiliations
[1] Univ Minho, Dept Informat CCTC, P-4710057 Braga, Portugal
[2] Univ Minho, Ctr Biol Engn, IBB, P-4710057 Braga, Portugal
[3] Univ Vigo, Escuela Super Ingn Informat, Dept Informat, Orense 32004, Spain
Keywords
Biomedical Text Mining; Named Entity Recognition; Information Retrieval; Information Extraction; Literature curation; Semantic annotation; Component-based software development; OF-SPEECH TAGGER; INFORMATION EXTRACTION; GENE; PERFORMANCE; DOCUMENTS;
DOI
10.1016/j.jbi.2009.04.002
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves interoperability, modularity and flexibility when integrating in-house and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, the platform has already enabled applications that are currently in use. © 2009 Elsevier Inc. All rights reserved.
Pages: 710-720
Number of pages: 11