Machine learning facilitated structural activity relationship approach for the discovery of novel inhibitors targeting EGFR

Cited by: 8
Authors
Choudhary, Rekha [1]
Walhekar, Vinayak [1]
Muthal, Amol [2]
Kumar, Dilip [1,3,4]
Bagul, Chandrakant [1]
Kulkarni, Ravindra [1,5]
Affiliations
[1] BVDUS Poona Coll Pharm, Dept Pharmaceut Chem, Pune, Maharashtra, India
[2] BVDUS Poona Coll Pharm, Dept Pharmacol, Pune, Maharashtra, India
[3] Univ Calif Davis, Dept Entomol, Davis, CA USA
[4] Univ Calif Davis, UC Davis Comprehens Canc Ctr, Davis, CA USA
[5] BVDUS Poona Coll Pharm, Dept Pharmaceut Chem, Pune 411038, Maharashtra, India
Keywords
EGFR; machine learning; molecular docking; molecular dynamics; virtual screening; SVM; random forest; LOGISTIC-REGRESSION; RANDOM FOREST; KINASE; DERIVATIVES; PREDICTION; MUTATIONS; DOCKING; DESIGN
DOI
10.1080/07391102.2023.2175263
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
This study aims to identify the most effective epidermal growth factor receptor (EGFR) inhibitors among millions of in-house compounds using machine learning (ML) techniques. ML-based structure-activity relationship (SAR) models were validated to predict the biological activity of untested novel molecules. Six ML algorithms, namely k-nearest neighbour (KNN), decision tree (DT), logistic regression, support vector machine (SVM), multilinear regression (MLR), and random forest (RF), were used to build models for activity prediction. Among these, the RF classifier (accuracy of 90% on the training set and 81% on the test set) and the RF regressor (R² of 0.83 and MSE of 0.29 on the training set; R² of 0.69 and MSE of 0.46 on the test set) showed good predictive performance. In addition, the six features that most strongly affect biological activity and contribute most to model development were selected using the variable importance technique. The RF regression model was used to predict the biological activity, expressed as pIC50, of nearly ten million molecules, while the RF classification model classified those molecules as active, moderately active, or least active according to their predicted pIC50. Based on the two models, a thousand molecules with higher predicted pIC50 values that were also classified as active were selected from the screened set for molecular docking. Based on the docking scores, predicted pIC50, and binding interactions with the MET769 residue, the compounds Zinc257233137, Zinc257232249, and Zinc101379788 were identified as potential EGFR inhibitors, with predicted pIC50 values of 7.72, 7.85, and 7.70, respectively. Molecular dynamics studies were also performed on Zinc257233137 and indicated good binding free energy and stable hydrogen-bonding interactions with EGFR. These molecules can be used for further research and may prove to be novel EGFR-targeting drugs for cancer treatment.
Communicated by Ramaswamy H. Sarma
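The quantities named in the abstract (pIC50, the three activity classes, R², and MSE) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the class cut-offs `ACTIVE_MIN` and `MODERATE_MIN`, the helper names, and the compound `hypothetical_low` are assumptions introduced here for illustration, since the abstract does not state the thresholds used. Only the definition pIC50 = -log10(IC50 in mol/L) and the three predicted pIC50 values come from standard usage and the abstract, respectively.

```python
import math

# Hypothetical cut-offs: the abstract does not give the class boundaries,
# so these values are illustrative assumptions only.
ACTIVE_MIN = 7.0
MODERATE_MIN = 5.0


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def activity_class(pic50):
    """Bin a pIC50 value into one of the three classes named in the abstract."""
    if pic50 >= ACTIVE_MIN:
        return "active"
    if pic50 >= MODERATE_MIN:
        return "moderately active"
    return "least active"


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shortlist(predictions, top_n):
    """Keep compounds classified as active, ranked by predicted pIC50 (descending)."""
    actives = [(cid, p) for cid, p in predictions.items()
               if activity_class(p) == "active"]
    actives.sort(key=lambda item: item[1], reverse=True)
    return actives[:top_n]


if __name__ == "__main__":
    # Predicted pIC50 values reported in the abstract, plus one made-up
    # low-activity compound ("hypothetical_low") to show the filtering.
    predicted = {
        "Zinc257233137": 7.72,
        "Zinc257232249": 7.85,
        "Zinc101379788": 7.70,
        "hypothetical_low": 4.2,
    }
    for compound_id, value in shortlist(predicted, top_n=3):
        print(compound_id, value, activity_class(value))
```

In this sketch the shortlist step mirrors the selection described in the abstract (rank by predicted pIC50, keep only molecules classed as active) before any docking is attempted; in practice the predictions would come from the trained regression and classification models rather than a hand-written dictionary.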
Pages: 12445-12463
Page count: 19