Tree-based ensemble methods and their applications in analytical chemistry

Cited by: 23
Authors
Cao, Dong-Sheng [1]
Xu, Qing-Song [2]
Zhang, Liang-Xiao [3]
Huang, Jian-Hua [1]
Liang, Yi-Zeng [1]
Affiliations
[1] Cent South Univ, Res Ctr Modernizat Tradit Chinese Med, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Math & Stat, Changsha 410083, Peoples R China
[3] Chinese Acad Sci, Dalian Inst Chem Phys, Key Lab Separat Sci Analyt Chem, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chemometrics; Classification and regression tree (CART); Cluster analysis; Complex data; Ensemble algorithm; Kernel method; Outlier detection; Pattern analysis; Tree-based ensemble; Variable selection; MULTIVARIATE REGRESSION TREES; RANDOM FOREST; OUTLIER DETECTION; FEATURE-SELECTION; COMPOUND CLASSIFICATION; DECISION TREES; PREDICTION; ALGORITHM; TOOL; ELIMINATION;
DOI
10.1016/j.trac.2012.07.012
Chinese Library Classification number
O65 [Analytical chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Data from high-throughput analytical instruments have become increasingly large and complex, posing a number of challenges for statistical modeling. To understand such complex data, new statistically efficient approaches are urgently needed to: (1) select salient features from the data; (2) discard uninformative data; (3) detect outlying samples; (4) visualize existing patterns in the data; (5) improve prediction accuracy; and, finally, (6) feed back to the analyst understandable summaries of the information in the data. We review current developments in tree-based ensemble methods for effectively mining the knowledge hidden in chemical and biological data. We report on applications of these algorithms to variable selection, outlier detection, supervised pattern analysis, cluster analysis, and tree-based kernel and ensemble learning. Through this report, we hope to inspire chemists to take greater interest in decision trees and to benefit more from tree-based ensemble techniques. © 2012 Elsevier Ltd. All rights reserved.
Pages: 158-167
Page count: 10