Prediction-Inspired Intelligent Training for the Development of Classification Read-across Structure-Activity Relationship (c-RASAR) Models for Organic Skin Sensitizers: Assessment of Classification Error Rate from Novel Similarity Coefficients

Cited by: 33
Authors
Banerjee, Arkaprava [1]
Roy, Kunal [1]
Affiliations
[1] Jadavpur Univ, Dept Pharmaceut Technol, Drug Theoret & Cheminformat Lab, Kolkata 700032, India
Keywords
QSAR MODELS; QUANTITATIVE PREDICTIONS; VALIDATION
DOI
10.1021/acs.chemrestox.3c00155
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
The advancements in the field of cheminformatics have led to a reduction in animal testing to estimate the activity, property, and toxicity of query chemicals. Read-across structure-activity relationship (RASAR) is an emerging concept that utilizes various similarity functions derived from chemical information to develop highly predictive models. Unlike quantitative structure-activity relationship (QSAR) models, RASAR descriptors of a query compound are computed from its close congeners instead of the compound itself, thus targeting predictions in the model training phase. The objective of the present study is not to propose new QSAR models for skin sensitization but to demonstrate the enhancement in the quality of predictions of the skin-sensitizing potential of organic compounds by developing classification-based RASAR (c-RASAR) models. A diverse, previously curated data set was collected from the literature for which 2D descriptors were computed. The extracted essential features were then used to develop a classification-based linear discriminant analysis (LDA) QSAR model. Furthermore, from the read-across-based predictions, RASAR descriptors were calculated using the basic settings of the hyperparameters for the Laplacian Kernel-based optimum similarity measure. After feature selection, an LDA c-RASAR model was developed, which superseded the prediction quality of the LDA-QSAR model. Various other combinations of RASAR descriptors were also taken to develop additional c-RASAR models, all showing better prediction quality than the LDA QSAR model while using a lower number of descriptors. Various other machine learning c-RASAR models were also developed for comparison purposes. In this work, we have proposed and analyzed three new similarity metrics: gm_class, sm1, and sm2. The first one is an indicator variable used to generate a simple univariate c-RASAR model with good prediction ability, while the remaining two are similarity indices used to analyze possible activity cliffs in the training and test sets and are believed to play an important role in the modelability analysis of data sets.
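The idea of deriving a query compound's features from its close congeners, as the abstract describes, can be illustrated with a minimal Python sketch. This is not the authors' tool or their descriptor definitions: the function names, the feature names (`max_sim`, `mean_sim`, `weighted_pos_fraction`), and the values of `gamma` and `k` are all hypothetical choices made for illustration. The only element taken from the abstract is the use of a Laplacian kernel as the similarity measure.

```python
import numpy as np

def laplacian_kernel_similarity(query, train_X, gamma=1.0):
    """Laplacian kernel exp(-gamma * L1 distance) between one query
    vector and every row of the training descriptor matrix."""
    l1 = np.abs(train_X - query).sum(axis=1)
    return np.exp(-gamma * l1)

def read_across_features(query, train_X, train_y, k=3, gamma=1.0):
    """Hypothetical similarity-derived features for one query compound,
    computed from its k most similar training compounds.
    train_y holds binary class labels (1 = sensitizer, 0 = non-sensitizer)."""
    sim = laplacian_kernel_similarity(query, train_X, gamma)
    nearest = np.argsort(-sim)[:k]       # indices of the k closest congeners
    w = sim[nearest]                     # their similarity values
    labels = train_y[nearest]
    return {
        "max_sim": float(w.max()),
        "mean_sim": float(w.mean()),
        # similarity-weighted share of positive neighbours
        "weighted_pos_fraction": float(np.sum(w * labels) / np.sum(w)),
    }

# Toy data (invented): two descriptors, four training compounds.
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array([1, 1, 0, 0])
features = read_across_features(np.array([0.0, 0.5]), train_X, train_y, k=2)
print(features)
```

Features of this kind would then replace or supplement the original structural descriptors as inputs to a classifier such as LDA. When they are computed for a training compound, that compound would need to be excluded from its own neighbor list, which this sketch does not handle.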
Pages: 1518-1531
Page count: 14