Concept-Based Semi-Automatic Classification of Drugs

Cited by: 20
Authors
Gurulingappa, Harsha [1, 2]
Kolarik, Corinna [1]
Hofmann-Apitius, Martin [1, 2]
Fluck, Juliane [1]
Affiliations
[1] Fraunhofer Inst Algorithms & Sci Comp, D-53754 Schloss Birlinghoven, Sankt Augustin, Germany
[2] Bonn Aachen Int Ctr Informat Technol B IT, D-53113 Bonn, Germany
Keywords
Learning systems - Classification (of information);
DOI
10.1021/ci9000844
Chinese Library Classification number
R914 [Medicinal Chemistry];
Subject classification code
100701;
Abstract
The Anatomical Therapeutic Chemical (ATC) Classification System maintained by the World Health Organization provides a global standard for the classification of medical substances and serves as a source for drug repurposing research. Nevertheless, it lacks several drugs that are major players in the global drug market. To establish classifications for drugs that are not yet classified, this paper presents a newly developed approach that combines information extraction (IE) and machine learning (ML) techniques. Most information about drugs is published in scientific articles. Therefore, an IE-based framework is employed to extract terms from free text that express a drug's chemical, pharmacological, therapeutic, and systemic effects. The extracted terms are used as features within an ML framework to predict putative ATC class labels for unclassified drugs. The system was tested on the portion of the ATC that contains drugs indicated for the cardiovascular system. Class prediction was successful, with a best predictive accuracy of 89.47%, validated by 100-fold bootstrapping of the training set, and an accuracy of 77.12% on an independent test set. The presented concept-based classification system outperformed state-of-the-art classification methods based on chemical structure properties.
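The pipeline summarized in the abstract (extracted terms as features, a classifier predicting ATC labels, accuracy estimated by repeated bootstrapping) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy drug descriptions in `DOCS` are invented, the simple term-overlap scorer stands in for whatever ML method the paper uses, and evaluating each bootstrap round on the out-of-bag samples is one common reading of "bootstrapping of the training set". The labels C03, C07, and C10 are ATC second-level groups in the cardiovascular section.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (extracted terms, ATC second-level label).
DOCS = [
    (["beta-adrenergic antagonist", "heart rate reduction"], "C07"),
    (["beta-adrenergic antagonist", "antihypertensive"], "C07"),
    (["diuretic", "sodium excretion"], "C03"),
    (["diuretic", "antihypertensive"], "C03"),
    (["HMG-CoA reductase inhibitor", "cholesterol lowering"], "C10"),
    (["HMG-CoA reductase inhibitor", "lipid lowering"], "C10"),
]


def train(docs):
    """Count, per class label, how many training drugs mention each term."""
    counts = defaultdict(Counter)
    for terms, label in docs:
        counts[label].update(set(terms))
    return counts


def predict(model, terms):
    """Return the label whose term profile overlaps most with `terms`."""
    def score(label):
        profile = model[label]
        total = sum(profile.values()) or 1
        return sum(profile[t] for t in set(terms)) / total

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(model), key=score)


def bootstrap_accuracy(docs, rounds=100, seed=0):
    """Mean out-of-bag accuracy over `rounds` bootstrap resamples."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        # Draw len(docs) indices with replacement.
        sample = [rng.randrange(len(docs)) for _ in docs]
        chosen = set(sample)
        held_out = [d for i, d in enumerate(docs) if i not in chosen]
        if not held_out:
            continue  # every item was drawn; nothing left to test on
        model = train([docs[i] for i in sample])
        hits = sum(predict(model, terms) == label for terms, label in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies) if accuracies else float("nan")


if __name__ == "__main__":
    model = train(DOCS)
    print(predict(model, ["diuretic", "sodium excretion"]))
    print(round(bootstrap_accuracy(DOCS), 3))
```

On a data set this small the bootstrap estimate is noisy, because a resample can miss a class entirely; the sketch only shows the shape of the evaluation loop.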
Pages: 1986-1992
Number of pages: 7