Development of Ensemble Learning Method Considering Applicability Domains

Cited by: 0
Authors
Sato, Keigo [1 ]
Kaneko, Hiromasa [1 ]
Affiliation
[1] Meiji Univ, Sch Sci & Technol, Dept Appl Chem, Tokyo, Japan
Keywords
Ensemble learning; Regression; Applicability domain; QSAR; QSPR; MODELS; PREDICTION; REGRESSION
DOI
10.2477/jccj.2019-0010
Chinese Library Classification
O6 [Chemistry]
Subject Classification Code
0703
Abstract
In quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) analyses, regression models are constructed between activities or properties y and molecular descriptors x of compounds. To improve the predictive performance of such models, ensemble learning constructs multiple sub-models and predicts a final y-value by integrating the y-values predicted with the sub-models. Although it was confirmed that predictive performance improves when the applicability domain (AD) of each sub-model is considered and only the sub-models whose ADs include a new sample are used, ADs cannot be compared between sub-datasets with different x; it was therefore impossible to predict a y-value by selecting and weighting sub-models for a new sample. In this study, we focused on the similarity-weighted root-mean-square distance (wRMSD), an AD index, and developed an ensemble learning method that considers the wRMSD-based AD of each sub-model (WEL). Since wRMSD is expressed on the scale of y, ADs can be compared between sub-models with different x, and a y-value can thus be predicted for a new sample by giving larger weights to sub-models with low wRMSD values, i.e., high prediction reliability. Through data analysis of three compound datasets for which water solubility, toxicity, and pharmacological activity were measured, it was confirmed that the AD was enlarged and the predictive performance was improved by WEL compared with the conventional ensemble learning method. Python code for WEL is available at https://github.com/hkaneko1985/wel.
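The authors' full Python implementation is linked above; as an illustration only, the following is a minimal sketch of the weighting idea described in the abstract. It is not the reference code: the regressor (Ridge), the synthetic data, and in particular the form of wRMSD used here (a similarity-weighted root-mean-square of each sub-model's cross-validated absolute errors over a new sample's nearest training neighbours, which keeps the index on the scale of y) are assumptions, and all names (n_sub, feats, sub_wrmsd, and so on) are hypothetical.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                                 # molecular descriptors x (synthetic)
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.3, size=200)    # property y (synthetic)
X_new = rng.normal(size=(5, 10))                               # new samples to predict

n_sub, k = 20, 5
sub_preds, sub_wrmsd = [], []
for _ in range(n_sub):
    feats = rng.choice(10, size=5, replace=False)              # random descriptor subset (different x per sub-model)
    rows = rng.choice(200, size=150, replace=False)            # random sample subset
    X_sub, y_sub = X[np.ix_(rows, feats)], y[rows]
    model = Ridge().fit(X_sub, y_sub)

    # Cross-validated absolute errors of this sub-model on its own training data
    abs_err = np.abs(y_sub - cross_val_predict(Ridge(), X_sub, y_sub, cv=5))

    # Assumed wRMSD: similarity-weighted RMS of the errors of the k nearest
    # training neighbours of each new sample; the result is on the scale of y
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X_sub).kneighbors(X_new[:, feats])
    sim = np.exp(-dist)
    w = sim / sim.sum(axis=1, keepdims=True)
    sub_wrmsd.append(np.sqrt((w * abs_err[idx] ** 2).sum(axis=1)))
    sub_preds.append(model.predict(X_new[:, feats]))

preds = np.array(sub_preds)                                    # shape (n_sub, n_new)
wrmsd = np.array(sub_wrmsd)                                    # shape (n_sub, n_new), same units as y
weights = (1.0 / wrmsd) / (1.0 / wrmsd).sum(axis=0)            # low wRMSD -> high weight
y_pred = (weights * preds).sum(axis=0)                         # weighted ensemble prediction
print(y_pred)

Weighting by the reciprocal of wRMSD lets every query emphasize the sub-models judged most reliable for it; a hard cutoff that discards sub-models above a wRMSD threshold would be a natural variant.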
Pages: 187-193
Number of pages: 7