On the behaviour of permutation-based variable importance measures in random forest clustering

Cited by: 6
Author
Nembrini, Stefano [1]
Affiliation
[1] Univ Florida, Coll Med, Emerging Pathogens Inst, Dept Pathol, Gainesville, FL 32610 USA
Keywords
random forest clustering; variable importance measures; variable selection
DOI
10.1002/cem.3135
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Unsupervised random forest (RF) is a popular clustering method that can be implemented by artificially creating a two-class problem. Variable importance measures (VIMs) can be used to determine which variables are relevant for defining the RF dissimilarity, but they have received less attention in this setting than in the supervised case. Here, I show that the sampling schemes used to generate the artificial data, including the original one, can influence the behaviour of the permutation importance in a way that can affect conclusions on variable relevance, and I propose a solution. Generating the artificial data using a Bayesian bootstrap keeps the desirable properties of the permutation VIM.
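The artificial two-class construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names (`marginal_sample`, `bayesian_bootstrap_sample`, `two_class_problem`) are hypothetical, and drawing a separate set of Dirichlet(1, ..., 1) weights for each variable is an assumption about how a Bayesian bootstrap might be applied here. The sketch only builds the labelled data; fitting a forest and computing the permutation VIM are left out.

```python
import random

def marginal_sample(data, rng):
    """Synthetic rows: resample each column independently with replacement
    (uniform weights), which breaks the dependence between variables."""
    n = len(data)
    columns = list(zip(*data))
    synth_cols = [rng.choices(col, k=n) for col in columns]
    return [list(row) for row in zip(*synth_cols)]

def bayesian_bootstrap_sample(data, rng):
    """Same idea, but each column is resampled with Dirichlet(1, ..., 1)
    weights. Per-column weights are an assumption of this sketch."""
    n = len(data)
    columns = list(zip(*data))
    synth_cols = []
    for col in columns:
        # Normalised Exp(1) draws are distributed as Dirichlet(1, ..., 1).
        gammas = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(gammas)
        weights = [g / total for g in gammas]
        synth_cols.append(rng.choices(col, weights=weights, k=n))
    return [list(row) for row in zip(*synth_cols)]

def two_class_problem(data, sampler, seed=0):
    """Stack observed rows (label 1) on top of synthetic rows (label 0)."""
    rng = random.Random(seed)
    synthetic = sampler(data, rng)
    X = [list(row) for row in data] + synthetic
    y = [1] * len(data) + [0] * len(synthetic)
    return X, y
```

A classifier trained to separate label 1 from label 0 on `X, y` would then supply the proximities and importance scores; swapping `marginal_sample` for `bayesian_bootstrap_sample` changes only how the synthetic class is drawn.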
Pages: 5