Prediction of KRAS(G12C) inhibitors using conjoint fingerprint and machine learning-based QSAR models

Cited by: 10
Authors
Srisongkram, Tarapong [1]
Khamtang, Patcharapa [2]
Weerapreeyakul, Natthida [1]
Affiliations
[1] Khon Kaen Univ, Fac Pharmaceut Sci, Div Pharmaceut Chem, 123 Mittrapap Rd, Khon Kaen 40002, Thailand
[2] Khon Kaen Univ, Fac Pharmaceut Sci, Khon Kaen 40002, Thailand
Keywords
QSAR; KRAS; Machine learning; Drug design; XGBoost; Random forest; Deep neural network; Support vector regression; BEWARE
DOI
10.1016/j.jmgm.2023.108466
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Kirsten rat sarcoma virus G12C (KRAS(G12C)) is the major protein mutation associated with non-small cell lung cancer (NSCLC) severity. Inhibiting KRAS(G12C) is therefore one of the key therapeutic strategies for NSCLC patients. In this paper, a cost-effective, data-driven drug design approach employing machine learning-based quantitative structure-activity relationship (QSAR) analysis was built for predicting ligand affinities against the KRAS(G12C) protein. A curated and non-redundant dataset of 1033 compounds with KRAS(G12C) inhibitory activity (pIC50) was used to build and test the models. The PubChem fingerprint, Substructure fingerprint, Substructure fingerprint count, and the conjoint fingerprint (a combination of PubChem fingerprint and Substructure fingerprint count) were used to train the models. Using comprehensive validation methods and various machine learning algorithms, the results showed that XGBoost regression (XGBoost) achieved the highest performance in terms of goodness of fit, predictivity, generalizability, and model robustness ($R^2 = 0.81$, $Q^2_{CV} = 0.60$, $Q^2_{Ext} = 0.62$, $R^2 - Q^2_{Ext} = 0.19$, $R^2_{Y\text{-}Random} = 0.31 \pm 0.03$, $Q^2_{Y\text{-}Random} = 0.09 \pm 0.04$). The top 13 molecular fingerprints that correlated with the predicted pIC50 values were SubFPC274 (aromatic atoms), SubFPC307 (number of chiral centers), PubChemFP37 (>= 1 Chlorine), SubFPC18 (number of alkylarylethers), SubFPC1 (number of primary carbons), SubFPC300 (number of 1,3-tautomerizables), PubChemFP621 (N-C:C:C:N structure), PubChemFP23 (>= 1 Fluorine), SubFPC2 (number of secondary carbons), SubFPC295 (number of C-ONS bonds), PubChemFP199 (>= 4 6-membered rings), PubChemFP180 (>= 1 nitrogen-containing 6-membered ring), and SubFPC180 (number of tertiary amines). These molecular fingerprints were virtualized and validated using molecular docking experiments.
In conclusion, the conjoint fingerprint and XGBoost-QSAR model proved useful as a high-throughput screening tool for KRAS(G12C) inhibitor identification and drug design.
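The two ideas the abstract relies on, joining a binary fingerprint with a count fingerprint and scoring predictions on held-out compounds, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `conjoint_fingerprint` and `q2_external`, the toy vectors, and the toy pIC50 values are all hypothetical, and the Q2 formula shown (one minus the prediction error sum of squares over the total sum of squares around the training-set mean) is one common definition that may differ from the one used in the paper. No model is trained here; the predictions are made-up numbers standing in for the output of a regressor such as XGBoost.

```python
from typing import List, Sequence


def conjoint_fingerprint(binary_bits: Sequence[int],
                         substructure_counts: Sequence[int]) -> List[int]:
    """Concatenate a binary fingerprint with a count fingerprint.

    Hypothetical helper: the abstract only says the conjoint fingerprint
    combines the PubChem fingerprint with the Substructure fingerprint count.
    """
    if any(bit not in (0, 1) for bit in binary_bits):
        raise ValueError("binary fingerprint must contain only 0 or 1")
    if any(count < 0 for count in substructure_counts):
        raise ValueError("substructure counts must be non-negative")
    return list(binary_bits) + list(substructure_counts)


def q2_external(y_true: Sequence[float],
                y_pred: Sequence[float],
                y_train_mean: float) -> float:
    """Assumed Q2 definition: 1 - PRESS / TSS around the training-set mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - y_train_mean) ** 2 for t in y_true)
    if tss == 0:
        raise ValueError("total sum of squares is zero")
    return 1.0 - press / tss


# Toy molecule: 3 presence/absence bits followed by 2 substructure counts.
features = conjoint_fingerprint([1, 0, 1], [4, 0])

# Toy external set: observed pIC50, made-up predictions, training-set mean.
q2_ext = q2_external([6.0, 7.0, 8.0], [6.5, 7.0, 7.5], y_train_mean=7.0)
print(features, q2_ext)
```

The gap reported in the abstract follows the same arithmetic: with $R^2 = 0.81$ on the training data and $Q^2_{Ext} = 0.62$ on the external set, the difference is 0.19. In a Y-randomization check, the activity values would be shuffled before refitting, and the resulting scores are expected to fall well below those of the real model.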
Pages: 14
Related papers
50 in total
  • [21] Machine Learning-Based Prediction Models for Control Traffic in SDN Systems
    Yoo, Yeonho
    Yang, Gyeongsik
    Shin, Changyong
    Lee, Junseok
    Yoo, Chuck
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4389 - 4403
  • [22] Machine Learning-based Software Quality Prediction Models: State of the Art
    Al-Jamimi, Hamdi A.
    Ahmed, Moataz
    [J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND APPLICATIONS (ICISA 2013), 2013,
  • [23] Machine learning-based models for thermal cracking prediction of flexible pavements
    Abd El-Hakim, Ragaa T.
    Kaloop, Mosbeh R.
    El-Badawy, Sherif M.
    Hu, Jong Wan
    Ali, Eman K.
    [J]. ROAD MATERIALS AND PAVEMENT DESIGN, 2024,
  • [24] Recursive Feature Elimination for Machine Learning-based Landslide Prediction Models
    Munasinghe, Kusala
    Karunanayake, Piyumika
    [J]. 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 126 - 129
  • [25] Credit scoring using machine learning and deep Learning-Based models
    Mestiri, Sami
    [J]. DATA SCIENCE IN FINANCE AND ECONOMICS, 2024, 4 (02) : 236 - 248
  • [26] Long-term Power Generation Prediction in Photovoltaics Using Machine Learning-based Models
    Colbu, Stefania-Cristiana
    Bancila, Daniel-Marian
    Popescu, Dumitru
    [J]. ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2025, 28 (01) : 39 - 50
  • [27] Prediction of shear strength in UHPC beams using machine learning-based models and SHAP interpretation
    Ye, Meng
    Li, Lifeng
    Yoo, Doo-Yeol
    Li, Huihui
    Zhou, Cong
    Shao, Xudong
    [J]. CONSTRUCTION AND BUILDING MATERIALS, 2023, 408
  • [28] Lifestyle and occupational risks assessment of bladder cancer using machine learning-based prediction models
    Shakhssalim, Naser
    Talebi, Atefeh
    Pahlevan-Fallahy, Mohammad-Taha
    Sotoodeh, Kasra
    Alavimajd, Hamid
    Borumandnia, Nasrin
    Taheri, Maryam
    [J]. CANCER REPORTS, 2023, 6 (09)
  • [29] Development of QSAR machine learning-based models to forecast the effect of substances on malignant melanoma cells
    Ancuceanu, Robert
    Dinu, Mihaela
    Neaga, Iana
    Laszlo, Fekete Gyula
    Boda, Daniel
    [J]. ONCOLOGY LETTERS, 2019, 17 (05) : 4188 - 4196
  • [30] Analysis of Classification Models Based on Cuisine Prediction Using Machine Learning
    Jayaraman, Shobhna
    Choudhury, Tanupriya
    Kumar, Praveen
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES FOR SMART NATION (SMARTTECHCON), 2017, : 1485 - 1490