Identification of all-against-all protein-protein interactions based on deep hash learning

Cited by: 5
Authors
Jiang, Yue [1]
Wang, Yuxuan [3]
Shen, Lin [1, 2]
Adjeroh, Donald A. [4]
Liu, Zhidong [2, 3]
Lin, Jie [1]
Affiliations
[1] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350108, Peoples R China
[2] Capital Med Univ, Beijing Chest Hosp, Dept Thorac Surg 2, Beijing 101149, Peoples R China
[3] Thorac Tumor Res Inst, Beijing 101149, Peoples R China
[4] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
U.S. National Science Foundation;
Keywords
Protein-protein interaction; Deep learning; Binary hash code; Binary search; Hamming distance; PREDICTION; SEQUENCE; EXTRACTION; DATABASE; COMPLEX;
DOI
10.1186/s12859-022-04811-x
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. Computational prediction of PPI is relatively inexpensive and efficient compared with traditional wet-lab experiments. Given a new protein, one may wish to find whether it has any PPI relationship with existing proteins. Current computational PPI prediction methods usually compare the new protein to the existing proteins one by one in a pairwise manner, which is time-consuming.
Results: In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the sequence using deep learning techniques. This encoding scheme turns the PPI discrimination problem into a much simpler search problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database of M proteins is transformed into a much simpler problem: finding a number inside a sorted array of length M. This pre-screening process narrows the search down to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship.
Conclusions: The experimental results confirmed that DHL-PPI is feasible and effective. On a dataset of four species with strictly negative PPI examples, DHL-PPI is shown to be superior to or competitive with other state-of-the-art methods in terms of precision, recall, or F1 score. Furthermore, in the prediction stage, DHL-PPI reduced the time complexity of an all-against-all PPI prediction for a database of M proteins from O(M^2) to O(M log M).
With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
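The search idea in the abstract (treat each binary hash code as a number, sort once, binary-search each query, then confirm candidates by Hamming distance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DHL-PPI implementation: the hash codes are taken as given integers rather than produced by a trained network, and the helper names, the `window` of sorted neighbours used as candidates, and the `max_dist` acceptance threshold are all hypothetical choices made for the example.

```python
from bisect import bisect_left


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def build_index(codes):
    """Sort the codes once, O(M log M). `codes` maps protein id -> int code."""
    pairs = sorted((code, pid) for pid, code in codes.items())
    keys = [code for code, _ in pairs]
    ids = [pid for _, pid in pairs]
    return keys, ids


def predict(index, query_code, window=2, max_dist=1):
    """Pre-screen by binary search, O(log M), then verify by Hamming distance.

    ASSUMPTION (illustrative only): candidates are the entries within
    `window` positions of the insertion point, and a candidate is accepted
    when its Hamming distance to the query is at most `max_dist`.
    """
    keys, ids = index
    pos = bisect_left(keys, query_code)
    lo, hi = max(0, pos - window), min(len(keys), pos + window)
    return [ids[i] for i in range(lo, hi)
            if hamming(keys[i], query_code) <= max_dist]


def all_against_all(codes, window=2, max_dist=1):
    """One sort plus M binary searches: O(M log M) for a fixed window."""
    index = build_index(codes)
    return {pid: [hit for hit in predict(index, code, window, max_dist)
                  if hit != pid]
            for pid, code in codes.items()}


# Toy 4-bit codes; real codes would come from the learned encoder.
toy_codes = {"P1": 0b1010, "P2": 0b1011, "P3": 0b0100, "P4": 0b1111}
toy_hits = predict(build_index(toy_codes), 0b1010)
```

Neighbours in sorted integer order are not guaranteed to be close in Hamming distance, so the fixed window here is a simplification; the point of the sketch is only that sorting once and searching per query replaces the M pairwise comparisons each new protein would otherwise require.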
Pages: 19