Identifying 124 new anti-HIV drug candidates in a 37 billion-compound database: An integrated approach of machine learning (QSAR), molecular docking, and molecular dynamics simulation

Cited by: 5
Authors
Cobre, Alexandre de Fatima [1]
Ara, Anderson [2]
Alves, Alexessander Couto [3]
Neto, Moises Maia [4]
Fachi, Mariana Millan [5]
Beca, Laize Silvia dos Anjos Botas [6]
Tonin, Fernanda Stumpf [7]
Pontarolo, Roberto [8]
Affiliations
[1] Univ Fed Parana, Dept Stat, Postgrad Program Data Sci & Big Data, Curitiba, Brazil
[2] Univ Fed Parana, Dept Stat, Curitiba, Brazil
[3] Univ Surrey, Fac Hlth & Med Sci, Sch Biosci & Med, Guildford, England
[4] Univ Fed Parana, Dept Pharm, Postgrad Program Pharmaceut Sci, Curitiba, Brazil
[5] Univ Fed Parana, Postgrad Program Pharmaceut Sci, Curitiba, Brazil
[6] Lurio Univ, Dept Pharm, Nampula, Mozambique
[7] Inst Politecn Lisboa, H&TRC Hlth & Technol Res Ctr, ESTeSL, Escola Super Tecnol Saude, Lisbon, Portugal
[8] Univ Fed Parana, Dept Pharm, Curitiba, Brazil
Keywords
HIV; CCR5; Drug discovery; Machine learning; Molecular docking; Molecular dynamics; BIOLOGICAL EVALUATION; FEATURE-SELECTION; INHIBITORS; DESIGN; MODELS
DOI
10.1016/j.chemolab.2024.105145
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent data from the World Health Organization reveal that in 2023, 38.8 million people were living with HIV. Within this population, there were 1.5 million new cases and 650 thousand deaths attributed to the disease. This study employs an integrated approach combining QSAR-based machine learning models, molecular docking, and molecular dynamics simulations to identify compounds with the potential to inhibit the CC chemokine receptor type 5 (CCR5) protein, a key entry point for HIV. Using non-redundant experimental data from the ChEMBL database, 40 different machine learning algorithms were trained, and the top four models (XGBoost, Histogram-based Gradient Boosting, Light Gradient Boosted Machine, and Extra Trees Regression) were used to predict anti-HIV bioactivity for 37 billion compounds in the ZINC-22 database. The screening identified 124 new anti-HIV drug candidates, which were confirmed through molecular docking and molecular dynamics simulations. The study underscores the therapeutic potential of these compounds, paving the way for further in vitro and in vivo investigations. The convergence of machine learning and experimental findings presents a promising avenue for significant advancements in pharmaceutical research, particularly in the treatment of viral diseases such as HIV. To support the reproducibility of our study, we have made the Python code (Google Colab) and the associated database available on GitHub: https://github.com/AlexandreCOBRE/code.
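The model-selection and screening steps summarized in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code (that is in the GitHub repository linked above). It assumes that models are ranked by held-out R-squared and that screening keeps compounds whose mean predicted activity across the selected models meets a cutoff; the abstract states neither the ranking metric nor how the four models' predictions were combined. All model names, compound identifiers, and numbers below are hypothetical toy values.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_k_models(y_true, predictions, k=4):
    """Rank models by held-out R^2 and return the k best (name, score) pairs.

    Assumption: R^2 is the ranking metric; the abstract does not name one.
    """
    scores = {name: r2_score(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def consensus_hits(candidate_preds, threshold):
    """Keep compounds whose mean prediction across models is >= threshold.

    Assumption: a simple mean consensus; the actual combination rule may differ.
    """
    return [cid for cid, preds in candidate_preds.items() if mean(preds) >= threshold]


# Hypothetical held-out activities and per-model predictions (toy data).
y_holdout = [5.0, 6.0, 7.0, 8.0]
model_predictions = {
    "model_a": [5.1, 5.9, 7.2, 7.8],  # close to the true values
    "model_b": [6.5, 6.5, 6.5, 6.5],  # always predicts the mean
    "model_c": [8.0, 7.0, 6.0, 5.0],  # anti-correlated with the true values
}
ranked = top_k_models(y_holdout, model_predictions, k=2)

# Hypothetical predictions from the selected models for two library compounds.
library_predictions = {
    "cand_1": [7.2, 7.0, 6.8, 7.4],
    "cand_2": [5.1, 5.5, 4.9, 5.0],
}
hits = consensus_hits(library_predictions, threshold=6.5)
```

At the scale of the real screen (billions of compounds), predictions would be computed in batches over molecular descriptors rather than held in an in-memory dictionary; the sketch only shows the ranking and filtering logic.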
Pages: 11
Related papers
57 in total
[1]   Ibrutinib and novel BTK inhibitors in clinical development [J].
Akinleye, Akintunde ;
Chen, Yamei ;
Mukhi, Nikhil ;
Song, Yongping ;
Liu, Delong .
JOURNAL OF HEMATOLOGY & ONCOLOGY, 2013, 6
[2]   Machine Learning Model for Multiomics Biomarkers Identification for Menopause Status in Breast Cancer [J].
Alghanim, Firas ;
Al-Hurani, Ibrahim ;
Qattous, Hazem ;
Al-Refai, Abdullah ;
Batiha, Osamah ;
Alkhateeb, Abedalrhman ;
Ikki, Salama .
ALGORITHMS, 2024, 17 (01)
[3]   Quantitative structural assessments of potential meprin β inhibitors by non-linear QSAR approaches and validation by binding mode of interaction analysis [J].
Banerjee, Suvankar ;
Baidya, Sandip Kumar ;
Ghosh, Balaram ;
Nandi, Suvendu ;
Mandal, Mahitosh ;
Jha, Tarun ;
Adhikari, Nilanjan .
NEW JOURNAL OF CHEMISTRY, 2023, 47 (15) :7051-7069
[4]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[5]   PubChem3D: a new resource for scientists [J].
Bolton, Evan E. ;
Chen, Jie ;
Kim, Sunghwan ;
Han, Lianyi ;
He, Siqian ;
Shi, Wenyao ;
Simonyan, Vahan ;
Sun, Yan ;
Thiessen, Paul A. ;
Wang, Jiyao ;
Yu, Bo ;
Zhang, Jian ;
Bryant, Stephen H. .
JOURNAL OF CHEMINFORMATICS, 2011, 3
[6]   Decision Tree and Ensemble Learning Algorithms with Their Applications in Bioinformatics [J].
Che, Dongsheng ;
Liu, Qi ;
Rasheed, Khaled ;
Tao, Xiuping .
SOFTWARE TOOLS AND ALGORITHMS FOR BIOLOGICAL SYSTEMS, 2011, 696 :191-199
[7]   The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation [J].
Chicco, Davide ;
Warrens, Matthijs J. ;
Jurman, Giuseppe .
PEERJ COMPUTER SCIENCE, 2021
[8]   Opportunities and obstacles for deep learning in biology and medicine [J].
Ching, Travers ;
Himmelstein, Daniel S. ;
Beaulieu-Jones, Brett K. ;
Kalinin, Alexandr A. ;
Do, Brian T. ;
Way, Gregory P. ;
Ferrero, Enrico ;
Agapow, Paul-Michael ;
Zietz, Michael ;
Hoffman, Michael M. ;
Xie, Wei ;
Rosen, Gail L. ;
Lengerich, Benjamin J. ;
Israeli, Johnny ;
Lanchantin, Jack ;
Woloszynek, Stephen ;
Carpenter, Anne E. ;
Shrikumar, Avanti ;
Xu, Jinbo ;
Cofer, Evan M. ;
Lavender, Christopher A. ;
Turaga, Srinivas C. ;
Alexandari, Amr M. ;
Lu, Zhiyong ;
Harris, David J. ;
DeCaprio, Dave ;
Qi, Yanjun ;
Kundaje, Anshul ;
Peng, Yifan ;
Wiley, Laura K. ;
Segler, Marwin H. S. ;
Boca, Simina M. ;
Swamidass, S. Joshua ;
Huang, Austin ;
Gitter, Anthony ;
Greene, Casey S. .
JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2018, 15 (141)
[9]   Naringenin-4'-glucuronide as a new drug candidate against the COVID-19 Omicron variant: a study based on molecular docking, molecular dynamics, MM/PBSA and MM/GBSA [J].
Cobre, Alexandre de Fatima ;
Neto, Moises Maia ;
de Melo, Eduardo Borges ;
Fachi, Mariana Millan ;
Ferreira, Luana Mota ;
Tonin, Fernanda Stumpf ;
Pontarolo, Roberto .
JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2024, 42 (11) :5881-5894
[10]   Molecular fingerprint-based machine learning assisted QSAR model development for prediction of ionic liquid properties [J].
Ding, Yi ;
Chen, Minchun ;
Guo, Chao ;
Zhang, Peng ;
Wang, Jingwen .
JOURNAL OF MOLECULAR LIQUIDS, 2021, 326