EPA's DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research

Cited by: 118
Authors
Grulke C.M. [1]
Williams A.J. [1]
Thillanadarajah I. [2]
Richard A.M. [1]
Affiliations
[1] National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711
[2] Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, NC 27711
Keywords
Chemistry database; Computational toxicology; Data quality; DSSTox; Environmental science; QSAR; Structure curation
DOI
10.1016/j.comtox.2019.100096
Abstract
The US Environmental Protection Agency's (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875 K substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain towards the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to increase efficiencies. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA's Substance Registry Services (SRS), the National Library of Medicine's ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers were conflicted either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations. 
Currently, DSSTox serves as the core foundation of EPA's CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology. © 2019
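
The unique-mapping requirement described in the abstract can be illustrated with a minimal sketch. It is not the DSSTox loading code: the `Record` type, the `split_by_conflict` function, and the placeholder identifiers are all hypothetical, and the structure is assumed to be held as a single canonical string. The sketch rejects every record in which one identifier (CAS RN, name, or structure) is paired with more than one value of another identifier anywhere in the input.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout; a real loader would carry more fields.
    casrn: str
    name: str
    structure: str  # assumed to be a canonical structure string


def split_by_conflict(records):
    """Return (accepted, rejected) under a one-to-one identifier rule.

    A record is rejected if any of its identifiers is paired with more
    than one distinct value of another identifier across the input.
    """
    records = list(records)  # the input is traversed twice
    fields = Record._fields
    # For each ordered pair of fields (a, b), collect the set of b-values
    # observed together with each a-value.
    seen = {(a, b): defaultdict(set) for a in fields for b in fields if a != b}
    for rec in records:
        for (a, b), mapping in seen.items():
            mapping[getattr(rec, a)].add(getattr(rec, b))

    accepted, rejected = [], []
    for rec in records:
        conflicted = any(
            len(mapping[getattr(rec, a)]) > 1 for (a, _), mapping in seen.items()
        )
        (rejected if conflicted else accepted).append(rec)
    return accepted, rejected


# Placeholder identifiers, not real CAS RNs or structures.
r1 = Record("CAS-1", "substance A", "STRUCT-A")
r2 = Record("CAS-2", "substance B", "STRUCT-B")
r3 = Record("CAS-2", "substance C", "STRUCT-C")  # same CAS RN as r2
accepted, rejected = split_by_conflict([r1, r2, r3])
```

Here `r2` and `r3` share a CAS RN but disagree on name and structure, so both are rejected and only `r1` is accepted. Exact duplicates of a consistent record are not treated as conflicts, and the ratio `len(rejected) / len(records)` gives a rejection rate of the kind the abstract reports.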