ctP2ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation

Citations: 0
Authors
Li, Kailong [1]
Quan, Lijun [1, 2, 3]
Jiang, Yelu [1]
Li, Yan [1]
Zhou, Yiting [1]
Wu, Tingfang [1, 2, 3]
Lyu, Qiang [1, 2, 3]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] Soochow Univ, Prov Key Lab Informat Proc Technol, Suzhou 215006, Jiangsu, Peoples R China
[3] Collaborat Innovat Ctr Novel Software Technol, Nanjing 210000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein-protein interaction sites; convolution; transformer; data augmentation; SEQUENCE-BASED PREDICTION; SECONDARY STRUCTURE; FINGERPRINTS; CLASSIFIER;
DOI
10.1109/TCBB.2022.3154413
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, accurate prediction remains challenging, because the small amount of training data and the severe class imbalance reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighted loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a data augmentation preprocessing step with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
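The abstract mentions a loss function with different sample weights for an imbalanced classification problem. As a rough illustration of that general idea only, the sketch below computes a class-weighted binary cross-entropy over per-residue labels. It is not the loss used in ctP2ISP: the function names `balanced_weights` and `weighted_bce`, the inverse-frequency weighting rule, and the toy labels are all assumptions made for this example.

```python
import math


def balanced_weights(labels):
    """Inverse-frequency class weights (an assumed, commonly used heuristic).

    Returns (pos_weight, neg_weight) so that both classes contribute
    equally to the total loss regardless of how rare the positives are.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2.0 * n_pos), n / (2.0 * n_neg)


def weighted_bce(labels, probs, pos_weight, neg_weight, eps=1e-12):
    """Mean binary cross-entropy with a separate weight per class.

    labels: 0/1 per residue (1 = interaction site, hypothetical encoding)
    probs:  predicted probability of class 1 per residue
    """
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
    return total / len(labels)


# Toy example: one interacting residue among four.
labels = [1, 0, 0, 0]
pos_w, neg_w = balanced_weights(labels)
loss = weighted_bce(labels, [0.5, 0.5, 0.5, 0.5], pos_w, neg_w)
```

With these weights, a confidently missed interaction site is penalized more heavily than an equally confident false alarm on a non-site residue, which is the usual motivation for weighting the rare class.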
Pages: 297-306
Number of pages: 10
Related papers
44 in total
  • [1] CaMELS: In silico prediction of calmodulin binding proteins and their binding sites
    Abbasi, Wajid Arshad
    Asif, Amina
    Andleeb, Saiqa
    Minhas, Fayyaz ul Amir Afsar
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2017, 85 (09) : 1724 - 1740
  • [2] Issues in performance evaluation for host-pathogen protein interaction prediction
    Abbasi, Wajid Arshad
    Minhas, Fayyaz Ul Amir Afsar
    [J]. JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2016, 14 (03)
  • [3] Bonetta L, 2010, NATURE, V468, P851, DOI [10.1038/468851a, 10.1038/468852a, 10.1038/468854a]
  • [4] Branco P, 2015, arXiv, arXiv:1505.01658, DOI 10.48550/arXiv.1505.01658
  • [5] TransformerCPI: improving compound-protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments
    Chen, Lifan
    Tan, Xiaoqin
    Wang, Dingyan
    Zhong, Feisheng
    Liu, Xiaohong
    Yang, Tianbiao
    Luo, Xiaomin
    Chen, Kaixian
    Jiang, Hualiang
    Zheng, Mingyue
    [J]. BIOINFORMATICS, 2020, 36 (16) : 4406 - 4414
  • [6] Detection of Outlier Residues for Improving Interface Prediction in Protein Heterocomplexes
    Chen, Peng
    Wong, Limsoon
    Li, Jinyan
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (04) : 1155 - 1165
  • [7] Day B, 2020, MESSAGE PASSING NEUR
  • [8] Protein-Protein Interactions Essentials: Key Concepts to Building and Analyzing Interactome Networks
    De Las Rivas, Javier
    Fontanillo, Celia
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2010, 6 (06) : 1 - 8
  • [9] Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier
    Dhole, Kaustubh
    Singh, Gurdeep
    Pai, Priyadarshini P.
    Mondal, Sukanta
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 2014, 348 : 47 - 54
  • [10] Global approaches to protein-protein interactions
    Drewes, G
    Bouwmeester, T
    [J]. CURRENT OPINION IN CELL BIOLOGY, 2003, 15 (02) : 199 - 205