A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

Cited by: 52
Authors
Judson, Richard [1]
Elloumi, Fathi [1]
Setzer, R. Woodrow [1]
Li, Zhen [2]
Shah, Imran [1]
Affiliations
[1] US EPA, Natl Ctr Computat Toxicol, Off Res & Dev, Res Triangle Pk, NC 27711 USA
[2] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
DOI
10.1186/1471-2105-9-241
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo data sets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.
Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.
Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data are used to predict in vivo chemical toxicology. From our analysis, we can recommend several ML methods, most notably SVM and ANN, as good candidates for use in real-world applications in this area.
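The evaluation protocol described in the abstract (simulated assay data with irrelevant features and measurement noise, filter-based feature selection, K-way cross-validation) can be illustrated with a minimal sketch. This is not the authors' simulation model: the data-generating rule, the feature score, and all function names (`simulate`, `filter_select`, `knn_predict`, `cross_validate`) are assumptions made for illustration, and only KNN (k = 5) is shown, using the Python standard library.

```python
import random
import statistics


def simulate(n_samples, n_causal, n_irrelevant, noise_sd, rng):
    """Toy data (assumed rule): class 1 when the causal features sum to > 0."""
    X, y = [], []
    for _ in range(n_samples):
        causal = [rng.gauss(0, 1) for _ in range(n_causal)]
        measured = [v + rng.gauss(0, noise_sd) for v in causal]  # measurement noise
        irrelevant = [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        X.append(measured + irrelevant)
        y.append(1 if sum(causal) > 0 else 0)
    return X, y


def filter_select(X, y, n_keep):
    """Filter step: keep the n_keep features with the largest scaled class-mean gap."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = statistics.pstdev(pos + neg) or 1.0  # avoid dividing by zero
        gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
        scores.append((gap / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if 2 * votes > k else 0


def cross_validate(X, y, n_folds, n_keep=None, k=5):
    """K-way cross-validation; features are selected inside each training fold."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        cols = filter_select(X_tr, y_tr, n_keep) if n_keep else list(range(len(X[0])))
        X_tr_sel = [[row[j] for j in cols] for row in X_tr]
        for i in test_idx:
            x = [X[i][j] for j in cols]
            correct += knn_predict(X_tr_sel, y_tr, x, k) == y[i]
    return correct / len(X)


rng = random.Random(0)
# 3 causal features, 30 irrelevant ones, moderate measurement noise (all assumed values)
X, y = simulate(200, 3, 30, 0.3, rng)
acc_all = cross_validate(X, y, n_folds=5)
acc_sel = cross_validate(X, y, n_folds=5, n_keep=3)
print(f"KNN accuracy, all features:      {acc_all:.2f}")
print(f"KNN accuracy, 3 filtered features: {acc_sel:.2f}")
```

Selecting features inside each training fold, rather than once on the full data set, keeps the held-out samples from influencing which features are kept. Varying `n_irrelevant` and `noise_sd` gives a rough feel for the kind of degradation the abstract reports, though the numbers from this toy model are not comparable to the paper's results.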
Pages: 16