Practical Outcomes of Applying Ensemble Machine Learning Classifiers to High-Throughput Screening (HTS) Data Analysis and Screening

Cited by: 15
Authors
Simmons, Kirk [1]
Kinney, John [2]
Owens, Aaron [3]
Kleier, Daniel A. [4]
Bloch, Karen [3]
Argentar, Dave [5]
Walsh, Alicia [3]
Vaidyanathan, Ganesh [6]
Affiliations
[1] Simmons Consulting, Titusville, NJ 08560 USA
[2] DuPont Stine Haskell Res Labs, Newark, DE 19711 USA
[3] DuPont Engn Res & Technol, Wilmington, DE 19880 USA
[4] Drexel Univ, Philadelphia, PA 19104 USA
[5] Sun Edge LLC, Bear, DE 19701 USA
[6] Quantum Leap Innovat, Newark, DE 19711 USA
Keywords
DOI
10.1021/ci800164u
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Over the years numerous papers have presented the effectiveness of various machine learning methods in analyzing drug discovery biological screening data. The predictive performance of models developed using these methods has traditionally been evaluated by assessing performance of the developed models against a portion of the data randomly selected for holdout. It has been our experience that such assessments, while widely practiced, result in an optimistic assessment. This paper describes the development of a series of ensemble-based decision tree models, shares our experience at various stages in the model development process, and presents the impact of such models when they are applied to vendor offerings and the forecasted compounds are acquired and screened in the relevant assays. We have seen that well developed models can significantly increase the hit-rates observed in HTS campaigns.
Pages: 2196-2206
Page count: 11
Related papers
37 in total
[1]  
*ACC INC, CER 2 VERS CCO
[2]   On the use of neural network ensembles in QSAR and QSPR [J].
Agrafiotis, DK ;
Cedeño, W ;
Lobanov, VS .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (04) :903-911
[3]   Ensemble of linear models for predicting drug properties [J].
Arodz, T ;
Yuen, DA ;
Dudek, AZ .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (01) :416-423
[4]   Selected concepts and investigations in compound classification, molecular descriptor analysis, and virtual screening [J].
Bajorath, J .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2001, 41 (02) :233-245
[5]   Contemporary QSAR classifiers compared [J].
Bruce, Craig L. ;
Melville, James L. ;
Pickett, Stephen D. ;
Hirst, Jonathan D. .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (01) :219-227
[6]   ATOM PAIRS AS MOLECULAR-FEATURES IN STRUCTURE ACTIVITY STUDIES - DEFINITION AND APPLICATIONS [J].
CARHART, RE ;
SMITH, DH ;
VENKATARAGHAVAN, R .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1985, 25 (02) :64-73
[7]   Ensemble feature selection: Consistent descriptor subsets for multiple QSAR models [J].
Dutta, Debojyoti ;
Guha, Rajarshi ;
Wild, David ;
Chen, Ting .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (03) :989-997
[8]  
*EDUSOFT, MOLCONNZ VERS 3 50
[9]  
*EDUSOFT, MOLCONNZMANUAL VERS
[10]   Virtual screening of Chinese herbs with random forest [J].
Ehrman, Thomas M. ;
Barlow, David J. ;
Hylands, Peter J. .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (02) :264-278