Are Bigger Data Sets Better for Machine Learning? Fusing Single-Point and Dual-Event Dose Response Data for Mycobacterium tuberculosis

Cited by: 30
Authors
Ekins, Sean [1, 2]
Freundlich, Joel S. [3, 4]
Reynolds, Robert C. [5]
Affiliations
[1] Collaborat Chem, Fuquay Varina, NC 27526 USA
[2] Collaborat Drug Discovery, Burlingame, CA 94010 USA
[3] Rutgers State Univ, New Jersey Med Sch, Dept Med, Ctr Emerging & Reemerging Pathogens, Newark, NJ 07103 USA
[4] Rutgers State Univ, New Jersey Med Sch, Dept Pharmacol & Physiol, Newark, NJ 07103 USA
[5] Univ Alabama Birmingham, Dept Chem, Coll Arts & Sci, Birmingham, AL 35294 USA
Keywords
DRUG DISCOVERY; BAYESIAN MODELS; CHEMISTRY DATABASES; PHENOTYPIC SCREENS; LEAD OPTIMIZATION; BIOLOGY; METABOLISM; ABSORPTION; CANDIDATES; CHALLENGES
DOI
10.1021/ci500264r
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Tuberculosis is a major, neglected disease for which the search for new treatments continues. An abundance of data from large phenotypic screens against Mycobacterium tuberculosis (Mtb) is available in the public domain. Since machine learning methods can learn from past data, we asked whether more data builds better models. We describe the use of Bayesian machine learning to assess whether our models improve when the large quantities of single-point data are combined with the much smaller, higher-quality dual-event data sets, which use dose response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We evaluated 12 models built from single-point data, dual-event dose response data, single-point plus dual-event dose response data, and combined data sets, for three distinct data sets from the same laboratory. As test sets, we used a fourth data set of active and inactive compounds from the same group and a smaller set of 177 active compounds from GlaxoSmithKline. Our data suggest that combining single-point with dual-event dose response data does not diminish the internal or external predictive ability of the models, as measured by the receiver operating characteristic (ROC) curve (internal ROC range 0.83-0.91, external ROC range 0.62-0.83), compared to the orders-of-magnitude smaller dual-event models (internal ROC range 0.6-0.83, external ROC range 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
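The ROC values quoted in the abstract are conventionally read as the area under the ROC curve: the probability that a model scores a randomly chosen active compound above a randomly chosen inactive one. The following is a minimal sketch of that calculation, assuming binary activity labels and numeric model scores; the function name `roc_auc` and the toy values are illustrative assumptions and are not taken from the paper or its software.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting statistic.

    labels: 1 for active, 0 for inactive (hypothetical encoding).
    scores: model output, where a higher value means "more likely active".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    # Count active/inactive pairs ranked in the right order; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores for six compounds, not data from the paper.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, which is the frame in which the internal and external ROC ranges above can be compared.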
Pages: 2157-2165
Page count: 9