Information Retrieval and Text Mining Technologies for Chemistry

Cited by: 191
Authors
Krallinger, Martin [1]
Rabal, Obdulia [2]
Lourenco, Analia [3, 4, 5]
Oyarzabal, Julen [2]
Valencia, Alfonso [6, 7, 8]
Affiliations
[1] Spanish Natl Canc Res Ctr, Struct Biol & BioComp Programme, Struct Computat Biol Grp, C Melchor Fernandez Almagro 3, E-28029 Madrid, Spain
[2] Univ Navarra, Small Mol Discovery Platform, Mol Therapeut Program, Ctr Appl Med Res CIMA, Ave Pio 12 55, E-31008 Pamplona, Spain
[3] Univ Vigo, Dept Comp Sci, ESEI, Edificio Politecn,Campus Univ As Lagoas S-N, E-32004 Orense, Spain
[4] Ctr Singular Invest Galicia, Ctr Invest Biomed, Campus Univ Lagoas Marcosende, E-36310 Vigo, Spain
[5] Univ Minho, CEB Ctr Biol Engn, Campus Gualtar, P-4710057 Braga, Portugal
[6] BSC, CNS, Life Sci Dept, C Jordi Girona 29-31, E-08034 Barcelona, Spain
[7] Joint BSC IRB CRG Program Computat Biol, Parc Cient Barcelona,C Baldiri Reixac 10, E-08028 Barcelona, Spain
[8] ICREA, Passeig Lluis Co 23, E-08010 Barcelona, Spain
Funding
European Union Horizon 2020
Keywords
NAMED-ENTITY RECOGNITION; ADVERSE DRUG EVENTS; ORGANIC-CHEMICAL NOMENCLATURE; COMPUTATIONAL-LINGUISTICS TECHNIQUES; ELECTRONIC PATIENT RECORDS; CONDITIONAL RANDOM-FIELDS; MEDICAL SUBJECT-HEADINGS; SUPPORT VECTOR MACHINE; LARGE-SCALE EXTRACTION; LINE NOTATION SLN;
DOI
10.1021/acs.chemrev.6b00851
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Pages: 7673-7761
Page count: 89