Automatic feature subset selection for decision tree-based ensemble methods in the prediction of bioactivity

Cited by: 39
Authors
Cao, Dong-Sheng [1]
Xu, Qing-Song [2]
Liang, Yi-Zeng [1]
Chen, Xian [1]
Li, Hong-Dong [1]
Affiliations
[1] Cent South Univ, Res Ctr Modernizat Tradit Chinese Med, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Math Sci & Comp Technol, Changsha 410083, Peoples R China
Keywords
Feature selection; Bagging; Boosting; Random Forest (RF); Classification and Regression Tree (CART); Ensemble learning; QSAR MODELS; COMPOUND CLASSIFICATION; RANDOM FOREST; REGRESSION; INHIBITORS; QSPR; TOOL
DOI
10.1016/j.chemolab.2010.06.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In structure-activity relationship (SAR) studies, a learning algorithm usually faces the problem of selecting a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new molecular descriptor selection method that couples three commonly used decision tree (DT)-based ensemble methods (bagging, boosting and random forest) with a backward elimination strategy (BES). The proposed method automatically eliminates redundant descriptors and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets related to different categorical bioactivities of compounds are used to evaluate the method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET (absorption, distribution, metabolism, excretion and toxicity). (C) 2010 Elsevier B.V. All rights reserved.
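The abstract does not state the exact ranking or stopping rules used by BES. The block below is therefore only a minimal pure-Python sketch of one common variant of backward elimination around a tree ensemble. It assumes that features are ranked by how often the ensemble splits on them and that each candidate subset is scored by out-of-bag (OOB) accuracy. To stay dependency-free, it uses a bagging ensemble of decision stumps (depth-1 trees) in place of full CART trees, boosting or random forests. All function names (`fit_stump`, `bagged_stumps`, `backward_elimination`), parameters (`n_trees`, `seed`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    # Most frequent label; ties go to the smallest label so results are deterministic.
    counts = Counter(labels)
    return max(sorted(counts), key=lambda lab: counts[lab])


def fit_stump(X, y, features):
    # Depth-1 tree (a simplification of CART): pick the (feature, threshold)
    # pair with the fewest training errors.
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            left_lab, right_lab = majority(left), majority(right)
            err = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, thr, left_lab, right_lab)
    if best is None:  # no feature has two distinct values: predict a constant
        lab = majority(y)
        return (None, 0.0, lab, lab)
    return best[1:]  # (feature, threshold, left label, right label)


def predict_stump(stump, row):
    f, thr, left_lab, right_lab = stump
    if f is None:
        return left_lab
    return left_lab if row[f] <= thr else right_lab


def bagged_stumps(X, y, features, n_trees=15, seed=0):
    # Bagging: each stump is fit on a bootstrap sample; returns (OOB accuracy, split counts).
    rng = random.Random(seed)  # same seed on every call -> same bootstrap samples
    n = len(X)
    votes = [[] for _ in range(n)]
    split_counts = Counter()  # simplified importance: number of stumps using each feature
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], features)
        if stump[0] is not None:
            split_counts[stump[0]] += 1
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # only out-of-bag stumps vote on sample i
                votes[i].append(predict_stump(stump, X[i]))
    scored = [(majority(v), y[i]) for i, v in enumerate(votes) if v]
    oob_acc = sum(p == t for p, t in scored) / len(scored) if scored else 0.0
    return oob_acc, split_counts


def backward_elimination(X, y, n_trees=15, seed=0):
    # Each round: score the current subset, then drop its least-used feature.
    subset = list(range(len(X[0])))
    history = []
    while subset:
        acc, counts = bagged_stumps(X, y, subset, n_trees, seed)
        history.append((acc, list(subset)))
        worst = min(subset, key=lambda f: (counts[f], -f))  # fewest splits; ties -> larger index
        subset.remove(worst)
    # Best OOB accuracy; among ties prefer the smaller subset.
    _, best_subset = max(history, key=lambda h: (h[0], -len(h[1])))
    return best_subset, history


# Hypothetical toy data: feature 0 determines the class, features 1-4 are noise.
rng = random.Random(1)
n = 40
X = [[i / n] + [rng.random() for _ in range(4)] for i in range(n)]
y = [1 if row[0] > 0.5 else 0 for row in X]
selected, history = backward_elimination(X, y)
print(selected)  # → [0]
```

Scoring each subset by OOB accuracy avoids holding out a separate validation set. A closer reproduction would replace the stumps with full CART-based bagging, boosting or random forest models and use the paper's own importance measure and stopping rule.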
Pages: 129-136
Page count: 8