Natural Language Processing Methods for the Study of Protein-Ligand Interactions

Times cited: 2
Authors
Michels, James [1]
Bandarupalli, Ramya [2]
Akbari, Amin Ahangar [2]
Le, Thai [3]
Xiao, Hong [1,4]
Li, Jing [2]
Hom, Erik F. Y. [5,6]
Affiliations
[1] Univ Mississippi, Dept Comp & Informat Sci, University, MS 38677 USA
[2] Univ Mississippi, Sch Pharm, Dept Biomol Sci, University, MS 38677 USA
[3] Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA
[4] Univ Mississippi, Inst Data Sci, University, MS 38677 USA
[5] Univ Mississippi, Dept Biol, University, MS 38677 USA
[6] Univ Mississippi, Ctr Biodivers & Conservat Res, University, MS 38677 USA
Keywords
NEURAL-NETWORK; STRUCTURE PREDICTION; AFFINITY PREDICTION; INFORMATION-SYSTEM; CHEMICAL LANGUAGE; DATABASE; BINDING; ATTENTION; SEQUENCE; REPRESENTATIONS
DOI
10.1021/acs.jcim.4c01907
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases in existing data sets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
Pages: 2191-2213
Page count: 23