Multi-perspective and Domain Specific Tagging of Chemical Documents

Cited by: 0
Authors
Deepika, S. S. [1]
Geetha, T. V. [1]
Sridhar, Rajeswari [1]
Affiliations
[1] Anna Univ, Dept Comp Sci & Engn, Coll Engn, Madras, Tamil Nadu, India
Source
DATA SCIENCE ANALYTICS AND APPLICATIONS, DASAA 2017 | 2018 / Vol. 804
Keywords
Domain-specific tagging; Chemical entity tagging; Chemical entity recognition; Domain-specific search; BIOMEDICAL TEXT; NAMES
DOI
10.1007/978-981-10-8603-8_7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text document search typically retrieves documents by exact keyword matching. Exact matching may not perform well in every domain, because it ignores the morphemes, or internal structure, of the words being searched. The problem is especially significant in chemistry, where a user may search with a keyword that appears in a document only as part of a chemical name. For example, the chemical name pentanone contains the ketone functional group, which can be found by a morphemic analysis guided by chemical nomenclature. Each chemical name carries a great deal of information about the compound it names. Hence, the chemical names in a document need to be tagged with all of their meaningful morphemes for search to perform efficiently. A multi-perspective and domain-specific tagging system was designed based on the available chemical nomenclature, considering the type of bond, the number of carbon atoms, and the functional group of the chemical entity. The tagging system begins by extracting the chemical names in the document based on morphological and domain-specific features. Based on these features and on contextual knowledge, models were created by designing a linear-chain conditional random field of order two, and they serve as a baseline for the chemical entity extraction process. A morphemic, or structural, analysis of each extracted named entity was then performed for the multi-perspective tagging system.
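The morphemic analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the lookup tables, the function name `tag_chemical_name`, and the returned keys are assumptions made for illustration, and the sketch covers only simple unbranched acyclic names that follow the standard stem + bond infix + suffix pattern of chemical nomenclature.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only).
# Stem -> number of carbon atoms in the parent chain.
CARBON_STEMS = {
    "meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
    "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10,
}
# Infix -> type of carbon-carbon bond.
BOND_INFIXES = {"an": "single", "en": "double", "yn": "triple"}
# Suffix -> functional group ("e" marks a plain hydrocarbon).
GROUP_SUFFIXES = {
    "e": "hydrocarbon", "ol": "alcohol", "al": "aldehyde",
    "one": "ketone", "oic acid": "carboxylic acid", "amine": "amine",
}


def _alternation(keys):
    # Longest alternatives first, each escaped for use in a regex.
    return "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))


NAME_PATTERN = re.compile(
    "({})({})({})".format(
        _alternation(CARBON_STEMS),
        _alternation(BOND_INFIXES),
        _alternation(GROUP_SUFFIXES),
    )
)


def tag_chemical_name(name):
    """Return multi-perspective tags for a simple chemical name, or None."""
    # Drop locants such as the "2-" in "2-pentanone" or "-1-" in "pent-1-ene".
    cleaned = re.sub(r"[\d,\-]", "", name.lower()).strip()
    match = NAME_PATTERN.fullmatch(cleaned)
    if match is None:
        return None  # name is outside the scope of this sketch
    stem, infix, suffix = match.groups()
    return {
        "carbon_atoms": CARBON_STEMS[stem],
        "bond_type": BOND_INFIXES[infix],
        "functional_group": GROUP_SUFFIXES[suffix],
    }


if __name__ == "__main__":
    for example in ["pentanone", "pent-1-ene", "ethanoic acid", "benzene"]:
        print(example, "->", tag_chemical_name(example))
```

With tags like these stored alongside each extracted name, a query for "ketone" could retrieve a document that mentions only pentanone. Names the pattern does not cover, such as benzene, return `None` rather than a guess.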
Pages: 72-85
Page count: 14
Related papers
26 in total
  • [1] Reconstruction of chemical molecules from images
    Algorri, Maria-Elena
    Zimmermann, Marc
    Friedrich, Christoph M.
    Akle, Santiago
    Hofmann-Apitius, Martin
    [J]. 2007 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-16, 2007, : 4609 - +
  • [2] [Anonymous], 2001, CONDITIONAL RANDOM F
  • [3] A survey of current work in biomedical text mining
    Cohen, AM
    Hersh, WR
    [J]. BRIEFINGS IN BIOINFORMATICS, 2005, 6 (01) : 57 - 71
  • [4] Corbett P., 2008, BMC BIOINFORM S11, P54
  • [5] Corbett P., 2007, P WORKSHOP BIONLP 20, P57
  • [6] de Matos A.Paula., 2010, J CHEMINFORMATICS, V2, P6, DOI 10.1186/1758-2946-2-S1-P6
  • [7] Chemical named entities recognition: a review on approaches and applications
    Eltyeb, Safaa
    Salim, Naomie
    [J]. JOURNAL OF CHEMINFORMATICS, 2014, 6
  • [8] Friedrich C.M., 2006, P 2 INT S SEMANTIC M, V7, P85
  • [9] Grego Tiago, 2012, ISRN Bioinform, V2012, P619427, DOI 10.5402/2012/619427
  • [10] ChemicalTagger: A tool for semantic text-mining in chemistry
    Hawizy, Lezan
    Jessop, David M.
    Adams, Nico
    Murray-Rust, Peter
    [J]. JOURNAL OF CHEMINFORMATICS, 2011, 3