Machine-learning-based similarity meets traditional QSAR: "q-RASAR" for the enhancement of the external predictivity and detection of prediction confidence outliers in an hERG toxicity dataset

Cited by: 23
Authors
Banerjee, Arkaprava [1]
Roy, Kunal [1]
Affiliations
[1] Jadavpur Univ, Dept Pharmaceut Technol, Drug Theoret & Cheminformat Lab, Kolkata 700032, India
Keywords
q-RASAR; Machine learning; hERG; DTC Plot; VALIDATION; REGRESSION; INSIGHTS;
DOI
10.1016/j.chemolab.2023.104829
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach was recently introduced. It incorporates various Machine Learning (ML)-derived similarity functions into the traditional quantitative structure-activity relationship (QSAR) modeling framework, with the objective of enhancing the external predictivity of models while using the same chemical information content. The present study uses hERG K+ channel inhibition (cardiotoxicity), a pharmaceutically relevant endpoint, as the modeling data set for the q-RASAR approach, which combines the merits of QSAR and Read-Across and generates simple, interpretable models with various similarity and error-based measures as descriptors. The cardiotoxicity data (in terms of pIC50 values) were collected from the literature. The curated data set was divided into training and test sets using the sorted response-based division algorithm. The important features were identified based on the internal validation metrics of initial genetic algorithm models. Based on the features selected in the final Multiple Linear Regression (MLR) model, RASAR descriptors were computed using a tool available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The RASAR descriptors were then merged with the previously selected features, and an MLR q-RASAR model was generated using the grid search approach. The prediction outliers were identified using the novel DTC Applicability Domain Plot, and the q-RASAR models were used for predictions after the removal of these outliers. A final Partial Least Squares (PLS) q-RASAR model was generated to obviate inter-correlation among descriptors. Various other Machine Learning approaches were also employed, with the relevant hyperparameters optimized by cross-validation, and the final test set prediction results were compared.
Based on the performance in the test set predictions and interpretability, the PLS q-RASAR model was chosen as the final model, which provided enhanced predictivity in comparison to previously reported models even without using 3-D descriptors. This model can thus be used for the quick screening of molecules, even before their synthesis, to estimate their cardiotoxic potential, thus prioritizing molecules for further experimental testing in the drug discovery pipeline. A Java-based prediction tool has also been developed for the quick screening of cardiotoxic properties of query compounds and made available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home.
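The workflow in the abstract (response-sorted train/test division, similarity-derived descriptors merged with the selected features, then a linear model) can be sketched roughly as below. This is an illustrative sketch only, not the DTC Lab tool or the authors' implementation: the function names `sorted_response_split` and `similarity_descriptors`, the Gaussian kernel, the choice of k = 3 neighbours, the three descriptors computed, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def sorted_response_split(y, step=4):
    """Sort compounds by response and send every `step`-th one to the test set.

    Hypothetical helper: one simple way to realise a sorted
    response-based division (here roughly 75% train / 25% test).
    """
    order = np.argsort(y, kind="stable")
    test_idx = np.sort(order[step - 1::step])
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx


def similarity_descriptors(X_ref, y_ref, X_query, k=3, sigma=1.0,
                           exclude_self=False):
    """Compute three illustrative read-across-style descriptors per query.

    Assumed definitions (not taken from the paper):
      column 0: similarity-weighted mean response of the k closest
                reference compounds
      column 1: similarity-weighted standard deviation of those responses
      column 2: maximum similarity to any reference compound
    """
    # Squared Euclidean distances between every query and reference compound.
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    sim = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian kernel similarity
    if exclude_self:
        # When queries are the reference compounds themselves, ignore the
        # trivial self-match so a compound cannot "predict" itself.
        np.fill_diagonal(sim, -np.inf)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    s = np.take_along_axis(sim, nearest, axis=1)
    y_near = y_ref[nearest]
    w = s / s.sum(axis=1, keepdims=True)
    ra_pred = (w * y_near).sum(axis=1)
    ra_sd = np.sqrt((w * (y_near - ra_pred[:, None]) ** 2).sum(axis=1))
    return np.column_stack([ra_pred, ra_sd, s[:, 0]])


# Synthetic stand-in for a curated data set: 40 compounds, 3 selected features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.8, -0.5, 0.3]) + 5.0 + rng.normal(scale=0.1, size=40)

train_idx, test_idx = sorted_response_split(y, step=4)
X_tr, y_tr = X[train_idx], y[train_idx]
X_te, y_te = X[test_idx], y[test_idx]

# Descriptors are always derived from the training set only.
D_tr = similarity_descriptors(X_tr, y_tr, X_tr, exclude_self=True)
D_te = similarity_descriptors(X_tr, y_tr, X_te)

# Merge original features with the similarity descriptors and fit an
# ordinary least-squares (MLR) model with an intercept.
A_tr = np.column_stack([np.ones(len(train_idx)), X_tr, D_tr])
A_te = np.column_stack([np.ones(len(test_idx)), X_te, D_te])
coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
pred_te = A_te @ coef

print("test-set RMSE:", float(np.sqrt(np.mean((pred_te - y_te) ** 2))))
```

Computing the test-set descriptors from training compounds alone keeps the test responses out of the model, which is the property an external validation relies on. A PLS model, as chosen in the abstract, would replace the least-squares step when the merged descriptors are strongly inter-correlated.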
Pages: 13