Machine-learning-based similarity meets traditional QSAR: "q-RASAR" for the enhancement of the external predictivity and detection of prediction confidence outliers in an hERG toxicity dataset

Cited by: 23
Authors
Banerjee, Arkaprava [1]
Roy, Kunal [1]
Affiliations
[1] Jadavpur Univ, Dept Pharmaceut Technol, Drug Theoret & Cheminformat Lab, Kolkata 700032, India
Keywords
q-RASAR; Machine learning; hERG; DTC Plot; Validation; Regression; Insights
DOI
10.1016/j.chemolab.2023.104829
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, the concept of quantitative Read-Across Structure-Activity Relationship (q-RASAR) has been introduced by using various Machine Learning (ML)-derived similarity functions in the traditional quantitative structure-activity relationship (QSAR) modeling framework with the objective of enhancing the external predictivity of models while using the same available chemical information content. The present study uses hERG K+ channel inhibition cardiotoxicity, a pharmaceutically relevant endpoint, as the modeling set for making predictions using the novel q-RASAR approach, as the approach combines the merits of QSAR and Read-Across and generates simple and interpretable models using various similarity and error-based measures as descriptors. The cardiotoxicity data (in terms of pIC50 values) were collected from the literature. The curated data set was then divided into training and test sets using the sorted response-based division algorithm. The important set of features was identified based on the internal validation metrics of initial genetic algorithm models. Based on the features selected in the final Multiple Linear Regression (MLR) model, RASAR descriptors were computed using a tool available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The RASAR descriptors were then merged with the previously selected features, and an MLR q-RASAR model was generated using the grid search approach. The prediction outliers were then identified using the novel DTC Applicability Domain Plot, and the q-RASAR models were used for predictions after the removal of prediction outliers. A final Partial Least Squares (PLS) q-RASAR model was generated to obviate inter-correlation among descriptors. Various other Machine Learning approaches were also employed with the optimization of relevant hyperparameters based on the cross-validation approach, and the final test set prediction results were compared.
Based on the performance in the test set predictions and interpretability, the PLS q-RASAR model was chosen as the final model which provided enhanced predictivity in comparison to previously reported models even without using 3-D descriptors. This model can thus be used for the quick screening of molecules, even before their synthesis, to estimate their cardiotoxic potential thus prioritizing molecules for further experimental testing in the drug discovery pipeline. A Java-based prediction tool has also been developed for the quick screening of cardiotoxic properties of query compounds and made available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home.
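Two steps named in the abstract, the sorted response-based division and the computation of similarity-based RASAR descriptors, can be illustrated with a minimal Python sketch. This is not the authors' tool, and the abstract does not specify the division ratio or the similarity functions. The function names, the `every`/`offset` parameters, the Gaussian kernel on Euclidean distance, the neighbor count `k`, and the descriptor names below are all assumptions chosen for illustration only.

```python
import math


def sorted_response_split(responses, every=4, offset=2):
    """Split compound indices into (train, test) after ranking by response.

    Compounds are ranked by ascending response (e.g. pIC50), and every
    `every`-th ranked compound, starting at rank `offset`, goes to the
    test set. Both parameters are illustrative assumptions.
    """
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    train, test = [], []
    for rank, idx in enumerate(order):
        (test if rank % every == offset else train).append(idx)
    return train, test


def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian kernel similarity in (0, 1] from squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def rasar_like_descriptors(query, train_X, train_y, k=3, sigma=1.0):
    """Compute similarity- and error-based measures for one query compound.

    The k most similar training compounds ("close source compounds") are
    used to form a similarity-weighted average response, its weighted
    standard deviation, and summary similarity values. Hypothetical
    descriptor names; the published tool may define them differently.
    """
    scored = sorted(
        ((gaussian_similarity(query, x, sigma), y)
         for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0.0:
        # All similarities underflowed to zero: fall back to equal weights.
        scored = [(1.0, y) for _, y in scored]
        total = float(len(scored))
    w_avg = sum(s * y for s, y in scored) / total
    w_var = sum(s * (y - w_avg) ** 2 for s, y in scored) / total
    return {
        "weighted_avg": w_avg,
        "weighted_sd": math.sqrt(w_var),
        "max_sim": scored[0][0],
        "mean_sim": total / len(scored),
    }


if __name__ == "__main__":
    # Toy data: two descriptors per compound and made-up response values.
    X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.5]]
    y = [5.0, 6.0, 8.0, 5.4]
    train_idx, test_idx = sorted_response_split(y, every=4, offset=2)
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    for i in test_idx:
        print(i, rasar_like_descriptors(X[i], train_X, train_y, k=2))
```

In a q-RASAR-style workflow, descriptors of this kind would be appended to the previously selected structural features before fitting the regression model. Computing them for test compounds against the training set only, as in the sketch, keeps the test set out of model building.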
Pages: 13