A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor

Cited by: 75
Authors
Han, L. Y. [1]
Ma, X. H. [1]
Lin, H. H. [1]
Jia, J. [1]
Zhu, F. [1]
Xue, Y. [3]
Li, Z. R. [3]
Cao, Z. W. [2]
Ji, Z. L. [4]
Chen, Y. Z. [1, 2]
Affiliations
[1] Natl Univ Singapore, Dept Pharm, Bioinformat & Drug Design Grp, Singapore 117543, Singapore
[2] Shanghai Ctr Bioinformat Technol, Shanghai 201203, Peoples R China
[3] Sichuan Univ, Coll Chem, Chengdu 610064, Peoples R China
[4] Xiamen Univ, Sch Life Sci, Bioinformat Res Grp, Xiamen 361005, Fujian Province, Peoples R China
Keywords
computer-aided drug design; drug discovery; high-throughput screening; lead discovery; machine learning method; virtual screening
DOI
10.1016/j.jmgm.2007.12.002
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit-selection performance, these methods tend to produce lower hit-rates than the best-performing VS tools when screening large compound libraries, partly because their training sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared to 62-95%, 0.65-35%, and 20-1200 for structure-based VS and 55-81%, 0.2-0.7%, and 110-795 for other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable to, and the enrichment factors substantially better than, the best results of other VS tools. Of the predicted hits, 24.3-87.6% are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries. © 2007 Elsevier Inc. All rights reserved.
Pages: 1276-1286
Page count: 11
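The abstract reports three screening metrics (yield, hit-rate, enrichment factor) without defining them. The sketch below shows how these quantities are commonly computed in retrospective virtual screening. It assumes the usual definitions (yield = known actives recovered / all known actives; hit-rate = known actives recovered / compounds selected; enrichment factor = hit-rate / fraction of actives in the whole library), which this record does not confirm as the authors' exact definitions. The class name, function names, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Counts from one retrospective screen (hypothetical container)."""
    library_size: int    # total compounds screened
    known_actives: int   # known actives hidden in the library
    selected: int        # compounds the model flagged as hits
    true_hits: int       # flagged compounds that are known actives


def yield_fraction(r: ScreeningResult) -> float:
    # Assumed definition: share of known actives that were recovered.
    return r.true_hits / r.known_actives


def hit_rate(r: ScreeningResult) -> float:
    # Assumed definition: share of flagged compounds that are known actives.
    return r.true_hits / r.selected


def enrichment_factor(r: ScreeningResult) -> float:
    # Assumed definition: hit-rate relative to picking compounds at random.
    random_rate = r.known_actives / r.library_size
    return hit_rate(r) / random_rate


# Made-up numbers for illustration only; not results from the paper.
example = ScreeningResult(
    library_size=1_000_000, known_actives=100, selected=500, true_hits=50
)

if __name__ == "__main__":
    print(f"yield      = {yield_fraction(example):.1%}")
    print(f"hit-rate   = {hit_rate(example):.1%}")
    print(f"enrichment = {enrichment_factor(example):.0f}")
```

With these made-up counts the yield is 50%, the hit-rate is 10%, and the enrichment factor is about 1000. Under these definitions, the enrichment factor grows as actives become rarer in the library, so it can be very large for multi-million-compound libraries even when the hit-rate is modest.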