Machine Learning to Predict Toxicity of Compounds

Times cited: 8
Authors
Grenet, Ingrid [1]
Yin, Yonghua [2]
Comet, Jean-Paul [1]
Gelenbe, Erol [1, 2]
Affiliations
[1] Univ Cote dAzur, CNRS, UMR 7271, I3S Lab, CS 40121, F-06903 Sophia Antipolis, France
[2] Imperial Coll, Dept Elect & Elect Engn, Intelligent Syst & Networks Grp, London, England
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I | 2018 / Vol. 11139
Keywords
Machine learning; Toxicity; QSAR; Data augmentation; FUNCTION APPROXIMATION
DOI
10.1007/978-3-030-01418-6_33
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Toxicology studies are subject to several concerns, and they raise the importance of an early detection of the potential for toxicity of chemical compounds which is currently evaluated through in vitro assays assessing their bioactivity, or using costly and ethically questionable in vivo tests on animals. Thus we investigate the prediction of the bioactivity of chemical compounds from their physico-chemical structure, and propose that it be automated using machine learning (ML) techniques based on data from in vitro assessment of several hundred chemical compounds. We provide the results of tests with this approach using several ML techniques, using both a restricted dataset and a larger one. Since the available empirical data is unbalanced, we also use data augmentation techniques to improve the classification accuracy, and present the resulting improvements.
Pages: 335-345
Page count: 11