Associating biological context with protein-protein interactions through text mining at PubMed scale

Cited by: 2

Authors:

Sosa, Daniel N. [1]

Hintzen, Rogier [2]

Xiong, Betty [1]

de Giorgio, Alex [2]

Fauqueur, Julien [2]

Davies, Mark [2]

Lever, Jake [3]

Altman, Russ B. [4,5]

Affiliations:

[1] Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA

[2] BenevolentAI, London, England

[3] Univ Glasgow, Glasgow, Scotland

[4] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA

[5] Stanford Univ, Dept Genet, Stanford, CA USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2023 / Volume 145

Keywords:

Literature-based discovery; NLP; Knowledge graphs; Cellular biology; Artificial intelligence;

DOI:

10.1016/j.jbi.2023.104474

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Inferring knowledge from known relationships between drugs, proteins, genes, and diseases has great potential for clinical impact, such as predicting which existing drugs could be repurposed to treat rare diseases. Incorporating key biological context such as cell type or tissue of action into representations of extracted biomedical knowledge is essential for principled pharmacological discovery. Existing global, literature-derived knowledge graphs of interactions between drugs, proteins, genes, and diseases lack this essential information. In this study, we frame the task of associating biological context with protein-protein interactions extracted from text as a classification task using syntactic, semantic, and novel meta-discourse features. We introduce the Insider corpora, which are automatically generated PubMed-scale corpora for training classifiers for the context association task. These corpora are created by searching for precise syntactic cues of cell type and tissue relevancy to extracted regulatory relations. We report F1 scores of 0.955 and 0.862 for identifying relevant cell types and tissues, respectively, for our identified relations. By classifying with this framework, we demonstrate that the problem of context association can be addressed using intuitive, interpretable features. We demonstrate the potential of this approach to enrich text-derived knowledge bases with biological detail by incorporating cell type context into a protein-protein network for dengue fever.


Pages: 12
