A CTD-Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug-disease and drug-phenotype interactions

Cited by: 69
Authors
Davis, Allan Peter [1 ]
Wiegers, Thomas C. [1 ]
Roberts, Phoebe M. [2 ]
King, Benjamin L. [3 ]
Lay, Jean M. [1 ]
Lennon-Hopkins, Kelley [1 ]
Sciaky, Daniela [1 ]
Johnson, Robin [3 ]
Keating, Heather [3 ]
Greene, Nigel [4 ]
Hernandez, Robert [5 ]
McConnell, Kevin J. [6 ]
Enayetallah, Ahmed E. [7 ]
Mattingly, Carolyn J. [1 ]
Affiliations
[1] N Carolina State Univ, Dept Biol Sci, Raleigh, NC 27695 USA
[2] Pfizer Inc, Computat Sci Ctr Emphasis, Cambridge, MA 02139 USA
[3] MDI Biol Lab, Dept Bioinformat, Salsbury Cove, ME 04672 USA
[4] Pfizer Inc, Compound Safety Predict, Groton, CT 06340 USA
[5] Pfizer Inc, Computat Sci Ctr Emphasis, Sandwich CT13 9NJ, Kent, England
[6] Pfizer Inc, Computat Sci Ctr Emphasis, Groton, CT 06340 USA
[7] Pfizer Inc, Drug Safety Res & Dev, Groton, CT 06340 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2013
Keywords
COMPARATIVE TOXICOGENOMICS DATABASE; PROFILES; RESOURCE; TOOL;
DOI
10.1093/database/bat080
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical-disease, 58 572 chemical-gene, 5 345 gene-disease and 38 083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities.
Pages: 16