A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases

Times cited: 2
Authors
Cox, Steven [1]
Dong, Xialan [2,3]
Rai, Ruhi [1]
Christopherson, Laura [1]
Zheng, Weifan [2,3,4]
Tropsha, Alexander [1,4]
Schmitt, Charles [1]
Affiliations
[1] Univ N Carolina, Renaissance Comp Inst RENCI, Chapel Hill, NC 27599 USA
[2] North Carolina Cent Univ, Coll Hlth & Sci, Dept Pharmaceut Sci, Lab Mol Informat & Data Sci, Durham, NC 27707 USA
[3] North Carolina Cent Univ, Coll Hlth & Sci, BRITE Inst, Durham, NC 27707 USA
[4] Univ N Carolina, UNC Eshelman Sch Pharm, Chapel Hill, NC 27599 USA
Funding
National Institutes of Health (NIH)
Keywords
Text mining; Word2Vec; Semantic similarity; Drug repurposing; Protein-protein interaction
DOI
10.1016/j.jbi.2020.103579
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Biomedical literature contains rich but unstructured information about proteins, ligands, diseases, and the biological pathways in which they are involved. Systematically analyzing such a textual corpus could enable the discovery of new protein-protein interactions and hidden drug indications. To this end, we investigated a methodology that uses a well-established text mining tool, Word2Vec, to derive word embeddings from PubMed full-text articles, and then predicts new relationships with a simple semantic similarity comparison, either alone or combined with the k-Nearest Neighbor (kNN) technique. To test this methodology, we conducted three retrospective analyses of a dataset of known P53-interacting proteins. First, we demonstrated that Word2Vec semantic similarity can infer functional relatedness among all kinases known to interact with P53. Second, in a series of time-split experiments, we demonstrated that both the simple similarity comparison and kNN models built from papers published up to a certain year were able to discover P53 interactors described in later publications. Third, in a different time-split scenario, we examined the P53-interacting proteins predicted by kNN models built on data prior to a given split year over different time ranges after that year, and found that the cumulative number of correct predictions increased with time. We conclude that text mining of PubMed research papers with Word2Vec, followed by a simple similarity comparison or kNN modeling, affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as repurposing of existing drugs, drug-drug interactions, and elucidation of drug mechanisms of action.
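The abstract names the two prediction modes (plain semantic similarity and kNN) and a time-split evaluation but does not give their exact formulation. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes cosine similarity between word vectors, a kNN score defined as the fraction of a candidate's k most similar labeled kinases that are known P53 interactors, and a cumulative hit count over growing time windows after the split year. All function names, the token names (KIN_A and so on), the toy 3-dimensional vectors, and the years are hypothetical; real vectors would come from a Word2Vec model trained on the literature corpus.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query: str, candidates: List[str], emb: Dict[str, Vector]) -> List[str]:
    """Simple similarity comparison: order candidates by cosine similarity to the query term."""
    return sorted(candidates, key=lambda c: cosine(emb[query], emb[c]), reverse=True)


def knn_score(candidate: str, labeled: Dict[str, int], emb: Dict[str, Vector], k: int = 2) -> float:
    """Assumed kNN score: fraction of the k most similar labeled proteins that are interactors (label 1)."""
    neighbours = sorted(labeled, key=lambda p: cosine(emb[candidate], emb[p]), reverse=True)[:k]
    return sum(labeled[p] for p in neighbours) / len(neighbours)


def cumulative_hits(predicted: List[str], first_reported: Dict[str, int],
                    split_year: int, horizons: List[int]) -> Dict[int, int]:
    """For each horizon h, count predictions first reported in (split_year, split_year + h]."""
    return {
        h: sum(1 for p in predicted
               if p in first_reported and split_year < first_reported[p] <= split_year + h)
        for h in horizons
    }


# Hypothetical toy vectors standing in for Word2Vec embeddings (all values invented).
emb: Dict[str, Vector] = {
    "p53":   [1.0, 0.0, 0.0],
    "KIN_A": [0.9, 0.1, 0.0],
    "KIN_B": [0.8, 0.2, 0.1],
    "KIN_C": [0.1, 0.9, 0.0],
    "KIN_D": [0.0, 0.8, 0.3],
    "KIN_E": [0.85, 0.15, 0.05],
    "KIN_F": [0.05, 0.95, 0.1],
}
# Labels known before the split year: 1 = known P53 interactor, 0 = not.
labeled = {"KIN_A": 1, "KIN_B": 1, "KIN_C": 0, "KIN_D": 0}
candidates = ["KIN_F", "KIN_E"]  # unlabeled at the split year

by_similarity = rank_by_similarity("p53", candidates, emb)
by_knn = sorted(candidates, key=lambda c: knn_score(c, labeled, emb), reverse=True)
# Hypothetical later evidence: KIN_E first reported as an interactor in 2013.
hits = cumulative_hits(by_knn[:1], {"KIN_E": 2013}, split_year=2010, horizons=[1, 3, 5])
print(by_similarity, by_knn, hits)
# prints: ['KIN_E', 'KIN_F'] ['KIN_E', 'KIN_F'] {1: 0, 3: 1, 5: 1}
```

Because each horizon window contains every shorter one, the cumulative hit count can only stay flat or grow as the window widens, which mirrors the third analysis in the abstract.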
Pages: 9