Improving Small Molecule pKa Prediction Using Transfer Learning With Graph Neural Networks

Times cited: 20
Authors
Mayr, Fritz [1]
Wieder, Marcus [1]
Wieder, Oliver [1]
Langer, Thierry [1]
Affiliations
[1] Univ Vienna, Dept Pharmaceut Sci, Pharmaceut Chem Div, Vienna, Austria
Source
FRONTIERS IN CHEMISTRY | 2022 / Vol. 10
Funding
Austrian Science Fund; European Union Horizon 2020
Keywords
physical properties; pKa; Graph Neural Network (GNN); transfer learning; protonation states
DOI
10.3389/fchem.2022.866585
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Enumerating protonation states and calculating microstate pKa values of small molecules are important yet challenging tasks for lead optimization and molecular modeling. Commercial and non-commercial solutions have notable limitations, such as restrictive and expensive licenses, high CPU/GPU hour requirements, or the need for expert knowledge to set up and use. We present a graph neural network model trained on 714,906 calculated microstate pKa predictions for molecules obtained from the ChEMBL database. The model is then fine-tuned on a set of 5,994 experimental pKa values, which significantly improves its performance on two challenging test sets. By combining the graph neural network model with Dimorphite-DL, an open-source program for enumerating ionization states, we have developed the open-source Python package pkasolver, which can generate and enumerate protonation states and calculate pKa values with high accuracy.
Pages: 10