ChemSpot: a hybrid system for chemical named entity recognition

Cited by: 161
Authors
Rocktaschel, Tim [1]
Weidlich, Michael [1]
Leser, Ulf [1]
Affiliations
[1] Humboldt Univ, Dept Comp Sci, D-12489 Berlin, Germany
Keywords
BIOMEDICAL TEXT; RECONSTRUCTION; IDENTIFICATION; DICTIONARY
DOI
10.1093/bioinformatics/bts183
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The accurate identification of chemicals in text is important for many applications, including computer-assisted reconstruction of metabolic networks or retrieval of information about substances in drug development. But due to the diversity of naming conventions and traditions for such molecules, this task is highly complex and should be supported by computational tools.
Results: We present ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and International Union of Pure and Applied Chemistry (IUPAC) entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. It achieves an F1 measure of 68.1% on the SCAI corpus, outperforming the only other freely available chemical NER tool, OSCAR4, by 10.8 percentage points.
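To make the idea of a hybrid recognizer concrete, the following is a minimal sketch, not ChemSpot's actual implementation. It assumes that a statistical tagger (such as a Conditional Random Field) and a dictionary matcher each return character-offset spans, that overlapping spans are resolved by keeping the longer one, and that F1 is computed over exact span matches. The function names, the overlap rule, the toy lexicon and the example sentence are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def dictionary_spans(text: str, lexicon: Set[str]) -> List[Span]:
    """Find case-insensitive occurrences of lexicon entries in text."""
    lowered = text.lower()
    spans: List[Span] = []
    for term in lexicon:
        needle = term.lower()
        start = lowered.find(needle)
        while start != -1:
            spans.append((start, start + len(needle)))
            start = lowered.find(needle, start + 1)
    return sorted(spans)


def merge_spans(a: List[Span], b: List[Span]) -> List[Span]:
    """Union two span lists; where spans overlap, keep the longer one.

    This overlap rule is an assumption made for illustration only.
    """
    merged: List[Span] = []
    # Sort by start offset, longest span first among equal starts.
    for span in sorted(set(a) | set(b), key=lambda s: (s[0], s[0] - s[1])):
        if merged and span[0] < merged[-1][1]:  # overlaps the last kept span
            if span[1] - span[0] > merged[-1][1] - merged[-1][0]:
                merged[-1] = span
            continue
        merged.append(span)
    return merged


def f1_score(predicted: List[Span], gold: List[Span]) -> float:
    """Exact-match F1: harmonic mean of precision and recall."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


text = "Aspirin inhibits COX; 2-acetoxybenzoic acid is its systematic name."
systematic = "2-acetoxybenzoic acid"
start = text.index(systematic)

# Stand-in for the output of a trained sequence tagger (hypothetical).
model_spans = [(start, start + len(systematic))]
# Toy lexicon standing in for a chemical dictionary (hypothetical).
dict_spans = dictionary_spans(text, {"aspirin", "acid"})

hybrid = merge_spans(model_spans, dict_spans)
gold = [(0, 7), (start, start + len(systematic))]
print(hybrid, f1_score(hybrid, gold))
```

In this toy case the dictionary alone finds the trivial name and a fragment ("acid") of the systematic name, while the assumed tagger output covers the full systematic name; merging keeps the longer span. The abstract's comparison also implies an F1 of 68.1 - 10.8 = 57.3% for OSCAR4 on the same corpus.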
Pages: 1633-1640
Page count: 8