The One-Class Classification Approach to Data Description and to Models Applicability Domain

Cited by: 48
Authors
Baskin, Igor I. [1]
Kireeva, Natalia [2]
Varnek, Alexandre [2]
Affiliations
[1] Moscow MV Lomonosov State Univ, Dept Chem, Moscow 119991, Russia
[2] Univ Strasbourg, CNRS, Lab Infochim, UMR 7177, F-67000 Strasbourg, France
Keywords
One-class classification approach; Models applicability domain; Structure-property relationships; Structure-activity relationships; NOVELTY DETECTION; SUPPORT; OUTLIERS;
DOI
10.1002/minf.201000063
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
In this paper, we associate the applicability domain (AD) of QSAR/QSPR models with the area of the input (descriptor) space in which the density of training data points exceeds a certain threshold. It can be shown that the predictive performance of models built on the training set is higher for test compounds inside the high-density area than for those outside it. Instead of searching for a decision surface that separates high- and low-density areas in the input space, the one-class classification 1-SVM approach looks for a hyperplane in the associated feature space. Unlike other AD definitions reported in the literature, this approach: (i) is purely "data-based", i.e., it assigns the same AD to all models built on the same training set; (ii) provides results that depend only on the initial pool of descriptors generated for the training set; (iii) can be used with a very large number of descriptors, as well as within structured kernel-based approaches, e.g., chemical graph kernels. The developed approach has been applied to improve the performance of QSPR models for the stability constants of complexes of organic ligands with alkaline-earth metals in water.
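The density-threshold definition of the AD in the first sentence of the abstract can be illustrated with a minimal sketch. Note that this is not the 1-SVM procedure the paper uses: it swaps in a plain Gaussian kernel density estimate with a quantile cut-off, purely to show what "inside the high-density area" means. The function names, the `bandwidth` and `outlier_fraction` parameters, and the synthetic two-descriptor data are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde_density(x, train, bandwidth):
    """Gaussian kernel density estimate at point x from the training points."""
    d = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (-d / 2)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return norm * total / len(train)


def fit_density_ad(train, bandwidth=1.0, outlier_fraction=0.05):
    """Return (inside, threshold) for a density-threshold applicability domain.

    The threshold is set so that roughly `outlier_fraction` of the training
    points fall below it (hypothetical choice, not taken from the paper).
    """
    densities = sorted(gaussian_kde_density(x, train, bandwidth) for x in train)
    k = int(outlier_fraction * len(densities))
    threshold = densities[k]

    def inside(x):
        # A compound is inside the AD if the training density at its
        # descriptor vector reaches the threshold.
        return gaussian_kde_density(x, train, bandwidth) >= threshold

    return inside, threshold


# Synthetic "descriptor" vectors standing in for a training set (assumption).
random.seed(0)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]

inside_ad, threshold = fit_density_ad(train)
near = inside_ad((0.0, 0.0))   # query close to the bulk of the training data
far = inside_ad((8.0, 8.0))    # query far from any training point
n_train_inside = sum(inside_ad(x) for x in train)
```

Because the domain depends only on the training descriptors and not on any fitted property model, every model built on the same training set would share it, which is the "data-based" property listed under (i). The 1-SVM described in the abstract reaches a comparable inside/outside decision through a hyperplane in kernel feature space rather than an explicit density estimate.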
Pages: 581-587
Number of pages: 7