Recursive feature elimination in random forest classification supports nanomaterial grouping

Cited by: 71
Authors
Bahl, Aileen [1, 2]
Hellack, Bryan [3]
Balas, Mihaela [4]
Dinischiotu, Anca [4]
Wiemann, Martin [5]
Brinkmann, Joep [6]
Luch, Andreas [1]
Renard, Bernhard Y. [2]
Haase, Andrea [1]
Affiliations
[1] German Fed Inst Risk Assessment BfR, Dept Chem & Prod Safety, Berlin, Germany
[2] RKI, Bioinformat Unit MF 1, Berlin, Germany
[3] Inst Energy & Environm Technol eV IUTA, Duisburg, Germany
[4] Univ Bucharest, Bucharest, Romania
[5] IBE R&D Inst Lung Hlth gGmbH, Munster, Germany
[6] Evonik Resource Efficiency GmbH, Hanau, Germany
Keywords
Random forest; Recursive feature elimination; Feature selection; Principal component analysis; Machine learning; Nanomaterial grouping; Toxicity prediction; Physico-chemical properties; Risk assessment; Toxicity; Nanoparticles; Cytotoxicity; Nanoscale; Particles; Framework; Mechanisms; Health; Silver
DOI
10.1016/j.impact.2019.100179
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Nanomaterials (NMs) can be produced in numerous variants of the same chemical substance. An in-depth safety assessment of each variant based on newly generated test data is simply not feasible. Thus, NM grouping approaches that would significantly reduce the time and amount of testing required for novel NMs are urgently needed. However, identifying structurally similar NM variants remains challenging, as many physico-chemical properties could be relevant. Here, we aimed to emphasize the value of machine learning models in NM grouping through a case study on eleven selected, well-characterized NMs. To that end, we linked physico-chemical properties of these NMs to characterized hallmarks of inhalation toxicity. We applied unsupervised and supervised machine learning techniques to determine which combination of properties is most predictive. First, we assessed NM similarity in an unsupervised manner using principal component analysis (PCA), followed by superposition of activity labels combined with a k-nearest neighbors approach. Then, we used random forests (RFs), a supervised machine learning technique that directly uses knowledge of the activity class when defining NM similarity. Similarity was thus defined only on those properties showing the highest correlation with activity and, therefore, the highest discriminative power. To improve performance, we then used recursive feature elimination (RFE) to remove uninformative features that biased the results. The best performance was achieved by the reduced RF model based on RFE, which obtained a balanced accuracy of 0.82. Of the eleven properties considered, zeta potential, redox potential and dissolution rate had the strongest predictive impact on biological NM activity in the present dataset.
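The unsupervised step described above (PCA on the physico-chemical properties, then overlaying activity labels and voting among k nearest neighbours) can be sketched in plain Python as follows. This is a minimal illustration under assumptions that are not taken from the study: the eight-material, three-property dataset is hypothetical, PCA is computed by power iteration with deflation on z-scored features, k = 3, and neighbours are evaluated leave-one-out so that a material never votes for itself.

```python
import math
from collections import Counter

def standardize(X):
    """Z-score each column so properties measured in different units are comparable."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
           for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]

def covariance(Z):
    """Sample covariance matrix of already-centred data Z."""
    n, d = len(Z), len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(d)]
            for a in range(d)]

def top_components(C, k, iters=500):
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    d = len(C)
    C = [row[:] for row in C]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-degenerate start vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))  # Rayleigh quotient
        comps.append(v)
        # Deflate so the next power iteration converges to the next-largest component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps

def project(Z, comps):
    """Coordinates of each material in principal-component space."""
    return [[sum(z[j] * c[j] for j in range(len(z))) for c in comps] for z in Z]

def knn_loo(P, labels, k=3):
    """Leave-one-out majority vote among the k nearest neighbours in PC space."""
    preds = []
    for i, p in enumerate(P):
        neighbours = sorted((math.dist(p, q), labels[j]) for j, q in enumerate(P) if j != i)
        votes = Counter(lab for _, lab in neighbours[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds

# Hypothetical toy data: 8 materials x 3 properties, two activity classes.
X = [[1.0, 2.0, 0.10], [1.2, 1.9, 0.20], [0.9, 2.2, 0.15], [1.1, 2.1, 0.12],
     [5.0, 8.0, 0.90], [5.2, 7.8, 1.00], [4.8, 8.1, 0.95], [5.1, 7.9, 0.85]]
labels = ["passive"] * 4 + ["active"] * 4

Z = standardize(X)
pcs = top_components(covariance(Z), k=2)   # labels are NOT used to build the PCs
scores = project(Z, pcs)
preds = knn_loo(scores, labels, k=3)       # labels are superposed only afterwards
print(preds == labels)  # True for this well-separated toy data
```

Because the labels play no part in computing the components, good leave-one-out agreement here only shows that materials with the same activity happen to lie close together in PC space. With real data, properties unrelated to activity can dominate the leading components, which is why a supervised model that uses the activity class directly can be preferable.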
Although the dataset is too small in terms of the number of NMs studied, and the applicability domain is expected to be very limited because only a few material classes were covered, our study demonstrates how machine learning and feature selection methods can be used to identify the physico-chemical NM properties most relevant to toxicity. We suggest that, once the most relevant properties have been identified in a model built on a sufficient number of different NMs spanning multiple NM classes, these properties should receive special emphasis in future grouping approaches.
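The elimination loop behind RFE can be illustrated with the stand-in below. It keeps RFE's structure (fit, rank features, drop the weakest, refit on the remainder, and keep the subset with the best balanced accuracy), but, to stay dependency-free, it swaps the study's random forest and its built-in importances for a leave-one-out nearest-centroid classifier with drop-column importance. The feature names and values are hypothetical toy data, features are deliberately left unscaled, and balanced accuracy is computed as the mean of per-class recall.

```python
import math
from statistics import mean

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so a majority class cannot inflate the score."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)

def loo_nearest_centroid(X, y, feats):
    """Leave-one-out predictions of a nearest-centroid classifier using only `feats`.
    Stand-in for the random forest; assumes every class has at least two samples."""
    preds = []
    for i in range(len(X)):
        centroids = {}
        for c in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [mean(r[f] for r in rows) for f in feats]
        point = [X[i][f] for f in feats]
        preds.append(min(centroids, key=lambda c: math.dist(point, centroids[c])))
    return preds

def score(X, y, feats):
    return balanced_accuracy(y, loo_nearest_centroid(X, y, feats))

def rfe(X, y, names):
    """Remove the least important feature, refit, re-rank; repeat down to one feature."""
    feats = list(range(len(names)))
    history = []
    while True:
        acc = score(X, y, feats)
        history.append(([names[f] for f in feats], acc))
        if len(feats) == 1:
            break
        # Drop-column importance (swapped in for RF importances): accuracy lost without f.
        importance = {f: acc - score(X, y, [g for g in feats if g != f]) for f in feats}
        feats.remove(min(feats, key=lambda f: importance[f]))
    # Best subset: highest balanced accuracy, ties broken toward fewer features.
    best = max(history, key=lambda h: (h[1], -len(h[0])))
    return best, history

# Hypothetical toy data: two redundant informative features and one large-scale noise feature.
names = ["signal", "signal_2", "noise"]
X = [[0.0, 0.1, 0], [0.1, 0.0, 10], [0.2, 0.1, 0], [0.1, 0.2, 10],
     [1.0, 0.9, 0], [0.9, 1.0, 10], [1.1, 1.0, 0], [1.0, 1.1, 10]]
y = ["low"] * 4 + ["high"] * 4

best, history = rfe(X, y, names)
for subset, acc in history:
    print(subset, round(acc, 2))
print("selected:", best)
```

On this toy data the noise feature is eliminated first, after which balanced accuracy rises from 0.0 to 1.0; the two signal features are redundant, so the tie-break toward fewer features keeps only one of them. Where scikit-learn is available, a closer analogue of the study's setup would be `RFECV` wrapped around a `RandomForestClassifier` with `scoring="balanced_accuracy"`, where the ranking comes from the forest's `feature_importances_`.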
Pages: 12