Chemical Name to Structure: OPSIN, an Open Source Solution

Cited by: 142
Authors
Lowe, Daniel M. [1]
Corbett, Peter T. [1]
Murray-Rust, Peter [1]
Glen, Robert C. [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
Keywords
COMPUTER TRANSLATION; BASIC PRINCIPLES; NOMENCLATURE; INFORMATION; CHEMISTRY; REVISION; LANGUAGE; GRAMMAR; SYSTEM; RULES
DOI
10.1021/ci100384d
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.
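The idea of a regular grammar guiding tokenization can be illustrated with a minimal sketch. This is not OPSIN's actual grammar, lexicon, or code: the token classes, the tiny word lists, and the names LEXICON, TRANSITIONS, ACCEPTING, and tokenize are all assumptions made up for illustration. The grammar is written as a finite-state transition table over token classes, and a backtracking matcher only accepts splits of the name that the table allows.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon: token class -> known tokens (not OPSIN's real data).
LEXICON = {
    "locant": ["1-", "2-", "3-"],
    "substituent": ["chloro", "bromo", "methyl", "ethyl"],
    "stem": ["meth", "eth", "prop", "but"],
    "suffix": ["ane", "anol", "ene"],
}

# Toy regular grammar as a finite-state machine: which token class may
# follow which. The state is simply the class of the last token read.
TRANSITIONS = {
    "start": ["locant", "stem", "substituent"],
    "locant": ["substituent"],
    "substituent": ["locant", "stem", "substituent"],
    "stem": ["suffix"],
    "suffix": [],
}

# A name is only accepted if it ends in one of these states.
ACCEPTING = {"suffix"}


def tokenize(name: str, state: str = "start") -> Optional[List[Tuple[str, str]]]:
    """Split name into (class, token) pairs allowed by the grammar, or None."""
    if not name:
        return [] if state in ACCEPTING else None
    # Only try token classes the grammar permits in the current state.
    for token_class in TRANSITIONS[state]:
        for token in LEXICON[token_class]:
            if name.startswith(token):
                rest = tokenize(name[len(token):], token_class)
                if rest is not None:
                    return [(token_class, token)] + rest
    # No permitted token leads to an accepted split: backtrack.
    return None


print(tokenize("2-chloropropane"))
print(tokenize("methylpropane"))
```

In "methylpropane" the matcher first tries the stem "meth", finds that no suffix matches "ylpropane", and backtracks to the substituent "methyl". This is the kind of ambiguity that makes tokenizing chemical names difficult, and restricting the candidates to what the grammar allows keeps the search small.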
Pages: 739-753
Number of pages: 15