Feature selection based on artificial bee colony and gradient boosting decision tree

Cited by: 387
Authors
Rao, Haidi [1, 2]
Shi, Xianzhang [1, 2]
Rodrigue, Ahoussou Kouassi [1, 2]
Feng, Juanjuan [1]
Xia, Yingchun [1]
Elhoseny, Mohamed [4]
Yuan, Xiaohui [3]
Gu, Lichuan [1, 2]
Affiliations
[1] Anhui Agr Univ, Coll Comp & Informat, Hefei 230036, Anhui, Peoples R China
[2] Minist Agr, Key Lab Agr Elect Commerce, Hefei 230036, Anhui, Peoples R China
[3] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76203 USA
[4] Mansoura Univ, Mansoura 35516, Egypt
Funding
National Natural Science Foundation of China;
Keywords
Bee colony algorithm; Decision tree; Feature selection; Dimensionality reduction; CANCER-DIAGNOSIS; FRAMEWORK; OPTIMIZATION; DEPENDENCY; ALGORITHM;
DOI
10.1016/j.asoc.2018.10.036
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Data from many real-world applications can be high dimensional, and the features of such data are usually highly redundant. Identifying informative features has become an important step in data mining, not only to circumvent the curse of dimensionality but also to reduce the amount of data to process. In this paper, we propose a novel feature selection method based on the artificial bee colony algorithm and gradient boosting decision trees, aimed at improving the efficiency of selection and the informative quality of the selected features. The method uses the bee colony algorithm to globally optimize the inputs of the decision tree and thereby identify the informative features. It first initializes the feature space spanned by the dataset; the artificial bee colony algorithm then suppresses less relevant features according to the information they contribute to decision making. Experiments are conducted on two breast cancer datasets and six datasets from the public data repository. The results show that the proposed method effectively reduces the dimensionality of the datasets and achieves superior classification accuracy with the selected features. (C) 2018 Elsevier B.V. All rights reserved.
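The search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified artificial bee colony over binary feature masks, written under stated assumptions. The function name `abc_feature_selection`, its parameters, and `toy_fitness` are hypothetical. In the paper the score of a feature subset would come from a gradient boosting decision tree; here a toy scoring function stands in so that the example runs with the Python standard library only.

```python
import random

def abc_feature_selection(fitness, n_features, n_sources=10, limit=5,
                          n_iter=30, seed=0):
    """Simplified artificial bee colony search over binary feature masks.

    `fitness` maps a mask (list of bool) to a score to maximize. All names
    and defaults here are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)

    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):                      # keep at least one feature
            mask[rng.randrange(n_features)] = True
        return mask

    def neighbour(mask):
        new = list(mask)
        j = rng.randrange(n_features)
        new[j] = not new[j]                    # flip one feature in or out
        if not any(new):
            new[j] = True
        return new

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_sources
    best = {"mask": None, "score": float("-inf")}

    def record(i):
        if scores[i] > best["score"]:
            best["mask"], best["score"] = list(sources[i]), scores[i]

    def try_improve(i):
        cand = neighbour(sources[i])
        s = fitness(cand)
        if s > scores[i]:                      # greedy replacement
            sources[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1
        record(i)

    for i in range(n_sources):
        record(i)

    for _ in range(n_iter):
        for i in range(n_sources):             # employed bees
            try_improve(i)
        lo = min(scores)                       # onlookers favour good sources
        weights = [s - lo + 1e-9 for s in scores]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)
        for i in range(n_sources):             # scouts replace stale sources
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                record(i)

    return best["mask"], best["score"]

# Stand-in fitness (assumption): features 0, 3 and 5 are "informative",
# every other selected feature costs a small penalty.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    extras = sum(mask) - hits
    return hits - 0.1 * extras

mask, score = abc_feature_selection(toy_fitness, n_features=8)
print(mask, score)
```

To move closer to the setting in the abstract, `toy_fitness` could be replaced by a function that trains a gradient boosting classifier on the selected columns and returns its validation accuracy; the search loop itself would not need to change.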
Pages: 634-642
Number of pages: 9