Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Cited by: 27
Authors
Alves, Vinicius M. [1]
Auerbach, Scott S. [2]
Kleinstreuer, Nicole [3]
Rooney, John P. [4]
Muratov, Eugene N. [5, 6]
Rusyn, Ivan [7]
Tropsha, Alexander [5]
Schmitt, Charles [1]
Affiliations
[1] NIEHS, Off Data Sci, Div Natl Toxicol Program DNTP, Durham, NC 27560 USA
[2] NIEHS, Toxinformat Grp, Predict Toxicol Branch, DNTP, Durham, NC 27560 USA
[3] NIEHS, Natl Toxicol Program Interagcy Ctr Evaluat Altern, Sci Directors Off, DNTP, Durham, NC 27560 USA
[4] Integrated Lab Syst LLC, Morrisville, NC USA
[5] Univ N Carolina, UNC Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[6] Univ Fed Paraiba, Dept Pharmaceut Sci, Joao Pessoa, Paraiba, Brazil
[7] Texas A&M Univ, Coll Vet Med & Biomed Sci, Dept Vet Integrat Biosci, College Stn, TX USA
Source
ATLA-ALTERNATIVES TO LABORATORY ANIMALS | 2021 / Vol. 49 / Issue 03
Keywords
artificial intelligence; data curation; data quality; data reproducibility; QSAR; PREDICTION; REPRODUCIBILITY; TOXICOLOGY; TOXICITY; STRATEGY; VERIFY; BEWARE; CHEMBL; TRUST
DOI
10.1177/02611929211029635
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
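The duplicate effect described in the abstract can be illustrated with a minimal sketch. It assumes that CCR is the mean of sensitivity and specificity (balanced accuracy), which the abstract does not define, and that each record is a pair of a structure key and a binary label. The function names `ccr` and `deduplicate`, the example keys, and the rule of discarding structures with conflicting labels are hypothetical illustrations, not the authors' curation workflow.

```python
from collections import defaultdict


def ccr(y_true, y_pred):
    """Correct classification rate, ASSUMED here to be balanced accuracy:
    the mean of sensitivity and specificity for binary labels (1/0)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to compute CCR")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def deduplicate(records):
    """Collapse records that share a structure key.

    Hypothetical rule: keep one copy when all duplicates agree on the
    label, and discard the structure entirely when labels conflict.
    """
    labels_by_key = defaultdict(set)
    for key, label in records:
        labels_by_key[key].add(label)
    return [
        (key, next(iter(labels)))
        for key, labels in labels_by_key.items()
        if len(labels) == 1
    ]


# Hypothetical records: "A" is duplicated, "C" has conflicting labels.
records = [("A", 1), ("A", 1), ("B", 0), ("C", 1), ("C", 0), ("D", 0)]
curated = deduplicate(records)
```

Deduplicating before the data are split into training and test sets is the point of the sketch: if copies of the same structure sit on both sides of a split, a model can score well simply by recalling them, which is one way the inflated performance reported in the abstract could arise.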
Pages: 73-82
Page count: 10