Associating biological context with protein-protein interactions through text mining at PubMed scale

Cited by: 2
Authors
Sosa, Daniel N. [1]
Hintzen, Rogier [2]
Xiong, Betty [1]
de Giorgio, Alex [2]
Fauqueur, Julien [2]
Davies, Mark [2]
Lever, Jake [3]
Altman, Russ B. [4, 5]
Affiliations
[1] Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
[2] BenevolentAI, London, England
[3] Univ Glasgow, Glasgow, Scotland
[4] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA
[5] Stanford Univ, Dept Genet, Stanford, CA USA
Keywords
Literature-based discovery; NLP; Knowledge graphs; Cellular biology; Artificial intelligence
DOI
10.1016/j.jbi.2023.104474
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Inferring knowledge from known relationships between drugs, proteins, genes, and diseases has great potential for clinical impact, such as predicting which existing drugs could be repurposed to treat rare diseases. Incorporating key biological context such as cell type or tissue of action into representations of extracted biomedical knowledge is essential for principled pharmacological discovery. Existing global, literature-derived knowledge graphs of interactions between drugs, proteins, genes, and diseases lack this essential information. In this study, we frame the task of associating biological context with protein-protein interactions extracted from text as a classification task using syntactic, semantic, and novel meta-discourse features. We introduce the Insider corpora, which are automatically generated PubMed-scale corpora for training classifiers for the context association task. These corpora are created by searching for precise syntactic cues of cell type and tissue relevancy to extracted regulatory relations. We report F1 scores of 0.955 and 0.862 for identifying relevant cell types and tissues, respectively, for our identified relations. By classifying with this framework, we demonstrate that the problem of context association can be addressed using intuitive, interpretable features. We demonstrate the potential of this approach to enrich text-derived knowledge bases with biological detail by incorporating cell type context into a protein-protein network for dengue fever.
Pages: 12