Machine learning methods for multi-walled carbon nanotubes (MWCNT) genotoxicity prediction

Cited by: 34
Authors
Kotzabasaki, Marianna [1]
Sotiropoulos, Iason [1]
Charitidis, Costas [1]
Sarimveis, Haralambos [1]
Affiliation
[1] Natl Tech Univ Athens, Sch Chem Engn, 9 Heroon Polytech St,Zografou Campus, Athens 15780, Greece
Source
NANOSCALE ADVANCES | 2021 / Vol. 3 / Issue 11
Funding
European Union Horizon 2020;
Keywords
APPLICABILITY DOMAIN; TOXICITY ASSESSMENT; PULMONARY TOXICITY; RESPONSES; MODEL;
DOI
10.1039/d0na00600a
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Multi-walled carbon nanotubes (MWCNTs) consist of multiple single-walled carbon nanotubes (SWCNTs) nested inside one another to form concentric cylinders. These nanomaterials are widely used in industrial and biomedical applications because of their unique physicochemical characteristics. However, previous studies have shown that exposure to MWCNTs may lead to toxicity, and that some of their physicochemical properties can influence their toxicological profiles. In silico modelling can serve as a faster and less costly alternative to experimental (in vivo and in vitro) testing for the hazard characterization of MWCNTs. This study aims to develop a fully validated predictive nanoinformatics model, based on statistical and machine learning approaches, for the accurate prediction of the genotoxicity of different types of MWCNTs. Towards this goal, several computational workflows were designed, combining an unsupervised technique (Principal Component Analysis, "PCA"), supervised classification techniques (Support Vector Machine, "SVM", Random Forest, "RF", Logistic Regression, "LR" and Naive Bayes, "NB") and Bayesian optimization. The Recursive Feature Elimination (RFE) method was applied to select the most important variables. An RF model using only three features was selected as the most efficient for predicting the genotoxicity of MWCNTs, exhibiting 80% accuracy on external validation and high classification probabilities. The most informative features selected by the model were "Length", "Zeta average" and "Purity".
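The feature-selection and classification steps named in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic dataset as a stand-in for the MWCNT data, which this record does not reproduce. The sample count, descriptor count, hyperparameters and variable names are all hypothetical. The PCA and Bayesian optimization steps are omitted, so this is not the authors' workflow.

```python
# Illustrative sketch only: synthetic data replaces the MWCNT dataset,
# and every size and hyperparameter below is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 120 samples, 10 descriptors, binary class label
# (playing the role of genotoxic vs. non-genotoxic).
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=3, n_redundant=2,
    random_state=0,
)

# Hold out an external validation set that plays no part in feature selection.
X_train, X_ext, y_train, y_ext = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive Feature Elimination wrapped around a Random Forest:
# drop the least important feature one at a time until three remain.
selector = RFE(
    RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=3,
)
selector.fit(X_train, y_train)
selected = [i for i, keep in enumerate(selector.support_) if keep]

# Refit a Random Forest on the three retained features only.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)

# Accuracy and class probabilities on the held-out external set.
y_pred = model.predict(X_ext[:, selected])
proba = model.predict_proba(X_ext[:, selected])
external_accuracy = accuracy_score(y_ext, y_pred)

print("selected feature indices:", selected)
print("external accuracy:", round(external_accuracy, 3))
```

With real data, the selected column indices would map back to named descriptors such as the "Length", "Zeta average" and "Purity" features reported in the abstract. The accuracy printed here reflects only the synthetic data and says nothing about the published model.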
Pages: 3167-3176
Number of pages: 10