Chemical named entities recognition: a review on approaches and applications

Cited by: 82
Authors
Eltyeb, Safaa [1, 2]
Salim, Naomie [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Johor Baharu, Malaysia
[2] Sudan Univ Sci & Technol, Coll Comp Sci & Informat Technol, Khartoum, Sudan
Source
JOURNAL OF CHEMINFORMATICS | 2014 / Vol. 6
Keywords
Chemical entities; Information extraction; Chemical names; INFORMATION; IDENTIFICATION; EXTRACTION; MOLECULES; PROTEINS; DRUGS;
DOI
10.1186/1758-2946-6-17
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
Pages: 12