ChemSpot: a hybrid system for chemical named entity recognition

Cited by: 169
Authors
Rocktaschel, Tim [1]
Weidlich, Michael [1]
Leser, Ulf [1]
Affiliation
[1] Humboldt Univ, Dept Comp Sci, D-12489 Berlin, Germany
Keywords
BIOMEDICAL TEXT; RECONSTRUCTION; IDENTIFICATION; DICTIONARY
DOI
10.1093/bioinformatics/bts183
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The accurate identification of chemicals in text is important for many applications, including computer-assisted reconstruction of metabolic networks or retrieval of information about substances in drug development. But due to the diversity of naming conventions and traditions for such molecules, this task is highly complex and should be supported by computational tools.
Results: We present ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and International Union of Pure and Applied Chemistry (IUPAC) entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. It achieves an F1 measure of 68.1% on the SCAI corpus, outperforming the only other freely available chemical NER tool, OSCAR4, by 10.8 percentage points.
Pages: 1633-1640
Number of pages: 8