Accurate and Rapid Prediction of Protein pKa: Protein Language Models Reveal the Sequence-pKa Relationship

Authors
Xu, Shijie [1]
Onoda, Akira [1,2]
Affiliations
[1] Hokkaido Univ, Grad Sch Environm Sci, Sapporo 0600810, Japan
[2] Hokkaido Univ, Fac Environm Earth Sci, Sapporo 0600810, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
PH MOLECULAR-DYNAMICS; PERTURBED PK(A) VALUES; POISSON-BOLTZMANN; ISOELECTRIC POINTS; EXPLICIT SOLVENT; IONIZABLE GROUPS; PROTONATION; DIFFERENCE; DEPENDENCE; EQUATION;
DOI
10.1021/acs.jctc.4c01288
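Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract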
Protein pKa prediction is a key challenge in computational biology. In this study, we present pKALM, a novel deep learning-based method for high-throughput protein pKa prediction. pKALM uses a protein language model (PLM) to capture the complex sequence-structure relationships of proteins. While traditionally considered a structure-based problem, our results show that a PLM pretrained on large-scale protein sequence databases can effectively learn this relationship and achieve state-of-the-art performance. pKALM accurately predicts the pKa values of six residues (Asp, Glu, His, Lys, Cys, and Tyr) and two termini with high precision and efficiency. It performs well at predicting both exposed and buried residues, which often deviate from standard pKa values measured in the solvent. We demonstrate a novel finding that predicted protein isoelectric points (pI) can be used to improve the accuracy of pKa prediction. High-throughput pKa prediction of the human proteome using pKALM achieves a speed of 4,965 pKa predictions per second, which is several orders of magnitude faster than existing state-of-the-art methods. The case studies illustrate the efficacy of pKALM in estimating pKa values and the constraints of the method. pKALM will thus be a valuable tool for researchers in the fields of biochemistry, biophysics, and drug design.
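The abstract links pKa values to the isoelectric point (pI) without showing how the two quantities relate. The following is a minimal sketch of the textbook relationship only, not of pKALM itself: it sums Henderson-Hasselbalch charges over a list of titratable groups and finds the pH of zero net charge by bisection. The function names, the `(pKa, kind)` tuple layout, and the example pKa values are all hypothetical and chosen for illustration. A realistic calculation would also include Arg, which is not among the residue types listed in the abstract.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: each titratable group is (pKa, kind).
# "acid" groups carry 0 -> -1 on deprotonation (Asp, Glu, Cys, Tyr, C-terminus).
# "base" groups carry +1 -> 0 on deprotonation (His, Lys, N-terminus).
Group = Tuple[float, str]


def net_charge(ph: float, groups: Iterable[Group]) -> float:
    """Sum of Henderson-Hasselbalch charges of independent groups at a given pH."""
    total = 0.0
    for pka, kind in groups:
        if kind == "acid":
            # Fraction deprotonated (charge -1).
            total -= 1.0 / (1.0 + 10.0 ** (pka - ph))
        elif kind == "base":
            # Fraction protonated (charge +1).
            total += 1.0 / (1.0 + 10.0 ** (ph - pka))
        else:
            raise ValueError(f"unknown group kind: {kind!r}")
    return total


def isoelectric_point(groups: Iterable[Group], lo: float = 0.0,
                      hi: float = 14.0, iters: int = 60) -> float:
    """Bisect for the pH at which the net charge is zero.

    The net charge decreases monotonically with pH, so a sign change
    on [lo, hi] brackets exactly one root.
    """
    group_list: List[Group] = list(groups)
    if net_charge(lo, group_list) <= 0 or net_charge(hi, group_list) >= 0:
        raise ValueError("net charge does not change sign on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if net_charge(mid, group_list) > 0:
            lo = mid  # still positive: pI lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative pKa values only; not taken from the paper.
    example = [(3.3, "acid"), (3.7, "acid"), (4.2, "acid"),
               (6.5, "base"), (8.0, "base"), (10.4, "base")]
    print(round(isoelectric_point(example), 2))
```

The sketch treats groups as independent, which is a simplifying assumption: shifted pKa values of buried residues, as mentioned in the abstract, would enter only through the per-group pKa inputs.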
Pages: 3752-3764
Page count: 13