Automatic feature subset selection for decision tree-based ensemble methods in the prediction of bioactivity

Cited by: 39
Authors
Cao, Dong-Sheng [1 ]
Xu, Qing-Song [2 ]
Liang, Yi-Zeng [1 ]
Chen, Xian [1 ]
Li, Hong-Dong [1 ]
Affiliations
[1] Cent South Univ, Res Ctr Modernizat Tradit Chinese Med, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Math Sci & Comp Technol, Changsha 410083, Peoples R China
Keywords
Feature selection; Bagging; Boosting; Random Forest (RF); Classification and Regression Tree (CART); Ensemble learning; QSAR MODELS; COMPOUND CLASSIFICATION; RANDOM FOREST; REGRESSION; INHIBITORS; QSPR; TOOL;
DOI
10.1016/j.chemolab.2010.06.008
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In structure-activity relationship (SAR) studies, a learning algorithm usually faces the problem of selecting a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new method of molecular descriptor selection that couples three commonly used decision tree (DT)-based ensemble methods with a backward elimination strategy (BES). The proposed method eliminates descriptor redundancy automatically and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets related to different categorical bioactivities of compounds are used to evaluate the proposed method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET. © 2010 Elsevier B.V. All rights reserved.
Pages: 129-136
Page count: 8