Two-step hybrid modeling for variable selection and estimation: An application to quantitative structure activity relationship study

Cited by: 2
Authors
Oranye, Henrietta Ebele [1 ,2 ]
Ugwuowo, Fidelis Ifeanyi [1 ]
Arum, Kingsley Chinedu [1 ]
Affiliations
[1] Univ Nigeria, Dept Stat, Nsukka, Nigeria
[2] Univ Nigeria, Dept Stat, Nsukka, Enugu, Nigeria
Keywords
cross-validation; jackknife; molecular descriptors; random forest; variable selection; ADAPTIVE LASSO; REGRESSION; QSAR; CLASSIFICATION;
DOI
10.1002/cem.3522
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this study, we developed a simple two-step technique for effective parameter estimation and prediction in quantitative structure-activity relationship (QSAR) studies. The first step selects the important molecular descriptors using random forest regression. The second step optimally predicts the biological activity of the chemical compounds from the selected descriptors using the following estimators: ridge regression, jackknife ridge, Liu regression, jackknife Liu, Kibria-Lukman, and jackknife Kibria-Lukman. We conducted a simulation study and a real-life application with a QSAR data set containing 2540 descriptors after preprocessing. The optimal prediction is determined using the cross-validation error, and the estimator with the minimum cross-validation error is considered best. The performance of the methods is judged using the root mean squared error of prediction. The results indicate that performing jackknife estimation after random forest selection is preferred.
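The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the QSAR data set, and covers only the ridge estimator (the jackknife, Liu, and Kibria-Lukman estimators are not part of scikit-learn and are omitted). The function name `two_step_cv_rmse` and the parameters `n_keep` and `alpha` are hypothetical. Selection is placed inside the pipeline so that each cross-validation fold repeats the random forest step on its own training part.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def two_step_cv_rmse(X, y, n_keep=10, alpha=1.0, seed=0):
    """Cross-validated RMSE of random forest selection followed by ridge.

    Hypothetical helper for illustration only.
    """
    # Step 1: keep the n_keep descriptors with the highest forest importance.
    selector = SelectFromModel(
        RandomForestRegressor(n_estimators=100, random_state=seed),
        max_features=n_keep,
        threshold=-np.inf,  # rank purely by importance, keep the top n_keep
    )
    # Step 2: fit a ridge regression on the selected, standardized descriptors.
    model = make_pipeline(selector, StandardScaler(), Ridge(alpha=alpha))
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


# Synthetic stand-in for descriptor data: 30 descriptors, 5 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
beta = np.zeros(30)
beta[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]
y = X @ beta + rng.normal(scale=0.5, size=80)

# Compare a few ridge penalties and keep the one with minimum CV error.
results = {a: two_step_cv_rmse(X, y, n_keep=10, alpha=a) for a in (0.1, 1.0, 10.0)}
best_alpha = min(results, key=results.get)
print(results, best_alpha)
```

Picking the candidate with the smallest cross-validated error mirrors the selection rule stated in the abstract; the grid of penalties here is arbitrary.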
Pages: 9