Identification of adaptor proteins by incorporating deep learning and PSSM profiles

Cited by: 5
Authors
Gao, Wentao [1]
Xu, Dali [1]
Li, Hongfei [1]
Du, Junping [2]
Wang, Guohua [1]
Li, Dan [1]
Affiliations
[1] Northeast Forestry Univ, Coll Informat & Comp Engn, Harbin 150000, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing Key Lab Intelligent Telecommun Software &, Beijing 100876, Peoples R China
Keywords
PSSM; biLSTM; Adaptor proteins; Deep learning; PREDICTION; NETWORKS; ENSEMBLE; PSEAAC; SITES; KNN;
DOI
10.1016/j.ymeth.2022.11.001
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Adaptor proteins, also known as signal transduction adaptor proteins, are important components of signal transduction pathways: they connect signaling proteins so that signals can be relayed between cells. Studies have shown that adaptor proteins are closely related to diseases such as tumors and diabetes, so a model that identifies them accurately is valuable. In recent years, many studies have used position-specific scoring matrix (PSSM) profiles and neural networks to identify adaptor proteins. However, ordinary neural network models cannot capture the contextual information in PSSM profiles well, so these studies usually reduce the 20 x N (N > 20) PSSM to 20 x 20 dimensions, which loses a large amount of protein information. This research proposes an efficient method that combines a one-dimensional convolutional neural network (1-D CNN) and a bidirectional long short-term memory network (biLSTM) to identify adaptor proteins. The complete PSSM profiles are the input of the model, so the complete information of the protein is retained during training. We perform cross-validation during model training and test the model on an independent test set. On a data set of 1224 adaptor proteins and 11,078 non-adaptor proteins, five indicators were used to evaluate model performance: specificity, sensitivity, accuracy, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC). On the independent test set, the specificity, sensitivity, accuracy and MCC were 0.817, 0.865, 0.823 and 0.465, respectively. These results show that our method is better than the state-of-the-art methods. This study aims to improve the accuracy of adaptor protein identification and lays a foundation for further research on diseases related to adaptor proteins. It also provides a new idea for applying deep learning models in bioinformatics and computational biology.
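The abstract's point about information loss can be illustrated with a small sketch. The abstract does not say how earlier studies reduce a 20 x N PSSM to 20 x 20; the code below assumes one commonly used scheme, in which the PSSM rows of all positions that share the same residue type are summed and divided by the sequence length. The function name `collapse_pssm`, the residue ordering in `AMINO_ACIDS`, and the toy scores are hypothetical and are not taken from the paper.

```python
# Assumed residue ordering for the 20 standard amino acids (illustrative only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def collapse_pssm(sequence, pssm):
    """Collapse an N x 20 PSSM (one row per residue) into a 20 x 20 matrix.

    Row i of the result is the sum of the PSSM rows of every position whose
    residue is AMINO_ACIDS[i], divided by the sequence length N.
    This is an assumed reduction scheme, not necessarily the one used in
    the studies the abstract refers to.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")

    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    collapsed = [[0.0] * 20 for _ in range(20)]

    for residue, row in zip(sequence, pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must contain 20 scores")
        i = index.get(residue)
        if i is None:
            continue  # skip non-standard residues such as 'X'
        for j, score in enumerate(row):
            collapsed[i][j] += score

    n = len(sequence)
    return [[value / n for value in row] for row in collapsed]


# Two toy proteins whose residues and PSSM rows appear in a different order.
rows = {"first_A": [1.0] * 20, "C": [2.0] * 20, "second_A": [3.0] * 20}
protein_1 = collapse_pssm("ACA", [rows["first_A"], rows["C"], rows["second_A"]])
protein_2 = collapse_pssm("AAC", [rows["second_A"], rows["first_A"], rows["C"]])

# The collapsed matrices are identical, so the positional order is lost.
same_after_collapse = protein_1 == protein_2
```

Under this assumed scheme, `same_after_collapse` is `True` even though the two toy profiles differ in residue order. That ordering is the contextual information that a model reading the complete PSSM, such as the 1-D CNN plus biLSTM described in the abstract, still has access to.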
Pages: 10-17
Page count: 8
Related papers
61 in total
[1]   DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks [J].
Abbasi, Karim ;
Razzaghi, Parvin ;
Poso, Antti ;
Amanlou, Massoud ;
Ghasemi, Jahan B. ;
Masoudi-Nejad, Ali .
BIOINFORMATICS, 2020, 36 (17) :4633-4642
[2]   Prediction of antioxidant proteins using hybrid feature representation method and random forest [J].
Ao, Chunyan ;
Zhou, Wenyang ;
Gao, Lin ;
Dong, Benzhi ;
Yu, Liang .
GENOMICS, 2020, 112 (06) :4666-4674
[3]   CNN-based transfer learning-BiLSTM network: A novel approach for COVID-19 infection detection [J].
Aslan, Muhammet Fatih ;
Unlersen, Muhammed Fahri ;
Sabanci, Kadir ;
Durdu, Akif .
APPLIED SOFT COMPUTING, 2021, 98
[4]   iTSP-PseAAC: Identifying Tumor Suppressor Proteins by Using Fully Connected Neural Network and PseAAC [J].
Awais, Muhammad ;
Hussain, Waqar ;
Rasool, Nouman ;
Khan, Yaser Daanial .
CURRENT BIOINFORMATICS, 2021, 16 (05) :700-709
[5]   ITP-Pred: an interpretable method for predicting, therapeutic peptides with fused features low-dimension representation [J].
Cai, Lijun ;
Wang, Li ;
Fu, Xiangzheng ;
Xia, Chenxing ;
Zeng, Xiangxiang ;
Zou, Quan .
BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)
[6]   EvolStruct-Phogly: incorporating structural properties and evolutionary information from profile bigrams for the phosphoglycerylation prediction [J].
Chandra, Abel Avitesh ;
Sharma, Alok ;
Dehzangi, Abdollah ;
Tsunoda, Tatushiko .
BMC GENOMICS, 2019, 19 (Suppl 9)
[7]   BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides [J].
Charoenkwan, Phasit ;
Nantasenamat, Chanin ;
Hasan, Md Mehedi ;
Manavalan, Balachandran ;
Shoombuatong, Watshara .
BIOINFORMATICS, 2021, 37 (17) :2556-2562
[8]   Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties [J].
Chen, Shu-An ;
Ou, Yu-Yen ;
Lee, Tzong-Yi ;
Gromiha, M. Michael .
BIOINFORMATICS, 2011, 27 (15) :2062-2067
[9]   MUFFIN: multi-scale feature fusion for drug-drug interaction prediction [J].
Chen, Yujie ;
Ma, Tengfei ;
Yang, Xixi ;
Wang, Jianmin ;
Song, Bosheng ;
Zeng, Xiangxiang .
BIOINFORMATICS, 2021, 37 (17) :2651-2658
[10]   DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops [J].
Dao, Fu-Ying ;
Lv, Hao ;
Zhang, Dan ;
Zhang, Zi-Mei ;
Liu, Li ;
Lin, Hao .
BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)