Data-Driven Quantitative Structure-Activity Relationship Modeling for Human Carcinogenicity by Chronic Oral Exposure

Times cited: 18
Authors
Chung, Elena [1]
Russo, Daniel P. [1]
Ciallella, Heather L. [2]
Wang, Yu-Tang [3]
Wu, Min [4]
Aleksunes, Lauren M. [5]
Zhu, Hao [1]
Affiliations
[1] Rowan Univ, Dept Chem & Biochem, Glassboro, NJ 08028 USA
[2] Cuyahoga Cty Med Examiners Off, Dept Toxicol, Cleveland, OH 44106 USA
[3] Chinese Acad Agr Sci, Minist Agr, Inst Agroprod Proc Sci & Technol, Key Lab Agroprod Proc, Beijing 100193, Peoples R China
[4] China Pharmaceut Univ, Sch Life Sci & Technol, Nanjing 210009, Peoples R China
[5] Rutgers State Univ, Ernest Mario Sch Pharm, Dept Pharmacol & Toxicol, Piscataway, NJ 08854 USA
Funding
UK Research and Innovation;
Keywords
quantitative structure-activity relationships; models; carcinogens; big data; data mining; machine learning; THROUGHPUT SCREENING DATA; RISK-ASSESSMENT; INHALATION EXPOSURE; RECEPTOR-BINDING; READ-ACROSS; QSAR; TOXICITY; CANCER; TOXICOLOGY; VALIDATION;
DOI
10.1021/acs.est.3c00648
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
Traditional methodologies for assessing chemical toxicity are expensive and time-consuming. Computational modeling approaches have emerged as low-cost alternatives, especially those used to develop quantitative structure-activity relationship (QSAR) models. However, conventional QSAR models have limited training data, leading to low predictivity for new compounds. We developed a data-driven modeling approach for constructing carcinogenicity-related models and used these models to identify potential new human carcinogens. To this end, we used a probe carcinogen dataset from the US Environmental Protection Agency's Integrated Risk Information System (IRIS) to identify relevant PubChem bioassays. Responses of 25 PubChem assays were significantly relevant to carcinogenicity. Eight of these assays showed predictive value for carcinogenicity and were selected for QSAR model training. Using 5 machine learning algorithms and 3 types of chemical fingerprints, 15 QSAR models were developed for each PubChem assay dataset. These models showed acceptable predictivity during 5-fold cross-validation (average CCR = 0.71). Using our QSAR models, we can correctly predict and rank the carcinogenic potentials of 342 IRIS compounds (PPV = 0.72). The models predicted potential new carcinogens, which were validated by a literature search. This study portends an automated technique that can be applied to prioritize potential toxicants using validated QSAR models based on extensive training sets from public data resources.
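The abstract reports two metrics, CCR and PPV, without defining them. The sketch below computes both for a binary classifier. It assumes the conventional definitions: CCR (correct classification rate) is the mean of sensitivity and specificity, and PPV (positive predictive value) is TP / (TP + FP). This record does not confirm that the authors used exactly these formulas. The function names and the toy labels are hypothetical and are not data from the study.

```python
# Minimal sketch of CCR and PPV for binary labels (1 = active/carcinogen, 0 = inactive).
# Assumption: CCR = (sensitivity + specificity) / 2 and PPV = TP / (TP + FP).
# The labels below are made up for illustration only.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def ccr(y_true, y_pred):
    """Correct classification rate: mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return (sensitivity + specificity) / 2

def ppv(y_true, y_pred):
    """Positive predictive value: fraction of predicted positives that are true positives."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else float("nan")

# Hypothetical toy example: 4 actives and 4 inactives.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    print("CCR:", ccr(toy_true, toy_pred))
    print("PPV:", ppv(toy_true, toy_pred))
```

Because CCR averages the per-class rates, it is less sensitive to class imbalance than plain accuracy, which matters when actives are rare in an assay dataset.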
Pages: 6573-6588
Number of pages: 16