Automatic feature subset selection for decision tree-based ensemble methods in the prediction of bioactivity

Cited: 39
Authors
Cao, Dong-Sheng [1 ]
Xu, Qing-Song [2 ]
Liang, Yi-Zeng [1 ]
Chen, Xian [1 ]
Li, Hong-Dong [1 ]
Affiliations
[1] Cent South Univ, Res Ctr Modernizat Tradit Chinese Med, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Math Sci & Comp Technol, Changsha 410083, Peoples R China
Keywords
Feature selection; Bagging; Boosting; Random Forest (RF); Classification and Regression Tree (CART); Ensemble learning; QSAR MODELS; COMPOUND CLASSIFICATION; RANDOM FOREST; REGRESSION; INHIBITORS; QSPR; TOOL;
DOI
10.1016/j.chemolab.2010.06.008
CLC classification
TP [automation and computer technology];
Discipline code
0812 ;
Abstract
In structure-activity relationship (SAR) studies, a learning algorithm usually faces the problem of selecting a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new method of molecular descriptor selection that couples three commonly used decision tree (DT)-based ensemble methods with a backward elimination strategy (BES). The proposed method automatically eliminates descriptor redundancy and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets covering different categorical bioactivities of compounds are used to evaluate the method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET. (C) 2010 Elsevier B.V. All rights reserved.
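The backward elimination strategy described in the abstract can be illustrated with a short sketch: repeatedly drop the descriptor the ensemble ranks least important, stopping once cross-validated accuracy degrades. This is a minimal illustration under stated assumptions, not the paper's exact procedure; it uses scikit-learn's `RandomForestClassifier` as the DT-based ensemble and its built-in `feature_importances_` as the elimination criterion.

```python
# Minimal sketch of backward elimination driven by a tree ensemble's
# feature importances (an assumption, not the paper's exact algorithm).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def backward_eliminate(X, y, min_features=2, random_state=0):
    """Iteratively drop the least-important descriptor while 5-fold CV
    accuracy does not degrade; return indices of the retained subset."""
    kept = list(range(X.shape[1]))
    rf = RandomForestClassifier(n_estimators=50, random_state=random_state)
    best_score = cross_val_score(rf, X[:, kept], y, cv=5).mean()
    while len(kept) > min_features:
        rf.fit(X[:, kept], y)
        # Candidate for removal: descriptor with the lowest importance.
        weakest = kept[int(np.argmin(rf.feature_importances_))]
        trial = [j for j in kept if j != weakest]
        score = cross_val_score(rf, X[:, trial], y, cv=5).mean()
        if score + 1e-6 < best_score:  # stop when removal hurts accuracy
            break
        kept, best_score = trial, max(best_score, score)
    return kept

# Synthetic stand-in for a descriptor matrix with few informative columns.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
subset = backward_eliminate(X, y)
print(len(subset), "descriptors retained")
```

The same wrapper loop could drive a bagging or boosting ensemble instead of a random forest; only the estimator passed to the scoring and importance steps changes.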
Pages: 129-136 (8 pages)
Related papers (50 total)
  • [31] Ensembles of instance selection methods based on feature subset
    Blachnik, Marcin
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS 18TH ANNUAL CONFERENCE, KES-2014, 2014, 35 : 388 - 396
  • [32] A feature selection algorithm of decision tree based on feature weight
    Zhou, HongFang
    Zhang, JiaWei
    Zhou, YueQing
    Guo, XiaoJie
    Ma, YiMing
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [34] Wrapper- and Ensemble-Based Feature Subset Selection Methods for Biomarker Discovery in Targeted Metabolomics
    Franken, Holger
    Lehmann, Rainer
    Haering, Hans-Ulrich
    Fritsche, Andreas
    Stefan, Norbert
    Zell, Andreas
    PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 121 - +
  • [35] Feature selection methods and ensemble of predictors for prediction of air pollution
    Siwek, Krzysztof
    Osowski, Stanislaw
    INTERNATIONAL WORK-CONFERENCE ON TIME SERIES (ITISE 2014), 2014, : 1207 - 1217
  • [36] A tree-based intelligence ensemble approach for spatial prediction of potential groundwater
    Avand, Mohammadtaghi
    Janizadeh, Saeid
    Tien Bui, Dieu
    Pham, Viet Hoa
    Ngo, Phuong Thao T.
    Nhu, Viet-Ha
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2020, 13 (12) : 1408 - 1429
  • [37] Constructing response model using ensemble based on feature subset selection
    Yu, EZ
    Cho, SZ
    EXPERT SYSTEMS WITH APPLICATIONS, 2006, 30 (02) : 352 - 360
  • [38] An Aggregated Decision Tree-Based Learner for Renewable Integration Prediction
    Lu, Tianguang
    Ai, Qian
    Lee, Wei-Jen
    Wang, Zhe
    He, Hongying
    2018 IEEE INDUSTRY APPLICATIONS SOCIETY ANNUAL MEETING (IAS), 2018,
  • [39] A Decision Tree-Based Method for Protein Contact Map Prediction
    Santiesteban Toca, Cosme Ernesto
    Marquez Chamorro, Alfonso E.
    Asencio Cortes, Gualberto
    Aguilar-Ruiz, Jesus S.
    EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS, 2011, 6623 : 153 - 158
  • [40] Prediction performance of improved decision tree-based algorithms: a review
    Mienye, Ibomoiye Domor
    Sun, Yanxia
    Wang, Zenghui
    2ND INTERNATIONAL CONFERENCE ON SUSTAINABLE MATERIALS PROCESSING AND MANUFACTURING (SMPM 2019), 2019, 35 : 698 - 703