Prediction of KRASG12C inhibitors using conjoint fingerprint and machine learning-based QSAR models

Times cited: 10
Authors
Srisongkram, Tarapong [1]
Khamtang, Patcharapa [2]
Weerapreeyakul, Natthida [1]
Affiliations
[1] Khon Kaen Univ, Fac Pharmaceut Sci, Div Pharmaceut Chem, 123 Mittrapap Rd, Khon Kaen 40002, Thailand
[2] Khon Kaen Univ, Fac Pharmaceut Sci, Khon Kaen 40002, Thailand
Keywords
QSAR; KRAS; Machine learning; Drug design; XGBoost; Random forest; Deep neural network; Support vector regression; BEWARE
DOI
10.1016/j.jmgm.2023.108466
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The G12C mutant of the Kirsten rat sarcoma viral oncogene homolog (KRASG12C) is the major protein mutation associated with non-small cell lung cancer (NSCLC) severity. Inhibiting KRASG12C is therefore one of the key therapeutic strategies for NSCLC patients. In this paper, a cost-effective, data-driven drug design approach employing machine learning-based quantitative structure-activity relationship (QSAR) analysis was built to predict ligand affinities against the KRASG12C protein. A curated, non-redundant dataset of 1033 compounds with KRASG12C inhibitory activity (pIC50) was used to build and test the models. The PubChem fingerprint, Substructure fingerprint, Substructure fingerprint count, and the conjoint fingerprint (a combination of the PubChem fingerprint and the Substructure fingerprint count) were used to train the models. Across comprehensive validation methods and various machine learning algorithms, XGBoost regression (XGBoost) achieved the highest performance in terms of goodness of fit, predictivity, generalizability, and model robustness (R² = 0.81, Q²_CV = 0.60, Q²_Ext = 0.62, R² − Q²_Ext = 0.19, R²_Y-random = 0.31 ± 0.03, Q²_Y-random = 0.09 ± 0.04). The top 13 molecular fingerprints that correlated with the predicted pIC50 values were SubFPC274 (aromatic atoms), SubFPC307 (number of chiral centers), PubChemFP37 (=1 chlorine), SubFPC18 (number of alkylarylethers), SubFPC1 (number of primary carbons), SubFPC300 (number of 1,3-tautomerizables), PubChemFP621 (N-C:C:C:N structure), PubChemFP23 (≥ 1 fluorine), SubFPC2 (number of secondary carbons), SubFPC295 (number of C-ONS bonds), PubChemFP199 (≥ 4 six-membered rings), PubChemFP180 (≥ 1 nitrogen-containing six-membered ring), and SubFPC180 (number of tertiary amines). These molecular fingerprints were visualized and validated using molecular docking experiments.
In conclusion, the conjoint fingerprint XGBoost-QSAR model proved useful as a high-throughput screening tool for KRASG12C inhibitor identification and drug design.
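The validation statistics quoted in the abstract (R², Q²_Ext, the R² − Q²_Ext gap, and Y-randomization) can be illustrated with a small pure-Python sketch. It is not the authors' pipeline and makes several labeled assumptions: Q²_Ext is computed in the common "F1" form (test-set residuals relative to the training-set mean); a hypothetical one-feature least-squares model (`ols_1d`) stands in for XGBoost; and the Y-randomized Q² is scored on the held-out test set, whereas the paper may use cross-validation instead. All function names are illustrative, and the synthetic data do not reproduce the reported numbers.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
# A "model" here is any callable that fits on (X_train, y_train) and predicts for X_eval.
FitPredict = Callable[[Matrix, Sequence[float], Matrix], List[float]]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Goodness of fit: 1 - SS_res / SS_tot, with SS_tot taken about mean(y_true)."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def q2_ext(y_test: Sequence[float], y_pred: Sequence[float], y_train_mean: float) -> float:
    """External Q² in the assumed 'F1' form: test residuals relative to the training-set mean."""
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_tot = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - ss_res / ss_tot


def ols_1d(X_train: Matrix, y_train: Sequence[float], X_eval: Matrix) -> List[float]:
    """Hypothetical stand-in model: least-squares line on the first feature only."""
    xs = [row[0] for row in X_train]
    x_bar, y_bar = mean(xs), mean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, y_train))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * row[0] for row in X_eval]


def y_randomization(model: FitPredict, X_train: Matrix, y_train: Sequence[float],
                    X_test: Matrix, y_test: Sequence[float],
                    n_runs: int = 10, seed: int = 0
                    ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Refit on shuffled activities; return (mean, SD) of training R² and test-set Q².

    n_runs must be at least 2 so that the standard deviation is defined.
    """
    rng = random.Random(seed)
    y_train_mean = mean(y_train)  # a permutation does not change the mean
    r2s, q2s = [], []
    for _ in range(n_runs):
        y_perm = list(y_train)
        rng.shuffle(y_perm)  # break the structure-activity link
        r2s.append(r2_score(y_perm, model(X_train, y_perm, X_train)))
        q2s.append(q2_ext(y_test, model(X_train, y_perm, X_test), y_train_mean))
    return (mean(r2s), stdev(r2s)), (mean(q2s), stdev(q2s))


if __name__ == "__main__":
    # Synthetic data (not from the paper): y = 2x + 1 with a small deterministic wobble.
    X_train = [[float(i)] for i in range(20)]
    y_train = [2.0 * i + 1.0 + (0.3 if i % 2 else -0.3) for i in range(20)]
    X_test = [[i + 0.5] for i in range(0, 20, 4)]
    y_test = [2.0 * (i + 0.5) + 1.0 for i in range(0, 20, 4)]

    r2_train = r2_score(y_train, ols_1d(X_train, y_train, X_train))
    q2 = q2_ext(y_test, ols_1d(X_train, y_train, X_test), mean(y_train))
    (r2_rand, r2_sd), (q2_rand, q2_sd) = y_randomization(ols_1d, X_train, y_train, X_test, y_test)
    print(f"R2={r2_train:.3f} Q2_ext={q2:.3f} gap={r2_train - q2:.3f}")
    print(f"R2_Yrandom={r2_rand:.2f}±{r2_sd:.2f} Q2_Yrandom={q2_rand:.2f}±{q2_sd:.2f}")
```

The design choice is to keep the model behind a plain callable, so the same helpers would accept any regressor wrapped as `fit_predict(X_train, y_train, X_eval)`. A model that is not fitting chance correlations should show scrambled-label scores well below the unscrambled ones, in the spirit of the reported 0.31 and 0.09 versus 0.81 and 0.62, and the gap R² − Q²_Ext (0.81 − 0.62 = 0.19) serves as a simple overfitting check.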
Pages: 14