Feature ranking for semi-supervised learning

Cited by: 6
Authors
Petkovic, Matej [1 ,2 ]
Dzeroski, Saso [1 ,2 ]
Kocev, Dragi [1 ,2 ]
Affiliations
[1] Jozef Stefan Inst, Jamova 39, Ljubljana 1000, Slovenia
[2] Jozef Stefan Int Postgrad Sch, Jamova 39, Ljubljana 1000, Slovenia
Keywords
Feature ranking; Semi-supervised learning; Tree ensembles; Relief; Structured output prediction; Multi-target prediction; FEATURE-SELECTION; CLASSIFICATION; TREES;
DOI
10.1007/s10994-022-06181-0
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The data used for analysis are becoming increasingly complex along several directions: high dimensionality, number of examples, and availability of labels for the examples. This poses a variety of challenges for existing machine learning methods when analyzing datasets with a large number of examples that are described in a high-dimensional space, where not all examples have labels provided. For example, when investigating the toxicity of chemical compounds, many compounds are available that can be described with information-rich, high-dimensional representations, but not all of the compounds have information on their toxicity. To address these challenges, we propose methods for semi-supervised learning (SSL) of feature rankings. The feature rankings are learned in the context of classification and regression, as well as in the context of structured output prediction (multi-label classification, MLC; hierarchical multi-label classification, HMLC; and multi-target regression, MTR). This is the first work that treats the task of feature ranking uniformly across various tasks of semi-supervised structured output prediction. To the best of our knowledge, it is also the first work on SSL of feature rankings for the tasks of HMLC and MTR. More specifically, we propose two approaches, based on predictive clustering tree ensembles and the Relief family of algorithms, and evaluate their performance across 38 benchmark datasets. The extensive evaluation reveals that rankings based on Random Forest ensembles perform best for classification tasks (including MLC and HMLC) and are the fastest for all tasks, while ensembles based on extremely randomized trees work best for regression tasks. Semi-supervised feature rankings outperform their supervised counterparts across the majority of datasets for all of the different tasks, showing the benefit of using unlabeled data in addition to labeled data.
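To make the idea concrete, below is a minimal sketch of ensemble-based feature ranking in a semi-supervised setting. Note the assumptions: this is not the paper's actual method (the authors use predictive clustering tree ensembles and Relief-based scores); it substitutes a plain scikit-learn Random Forest with a single round of self-training pseudo-labeling, which only illustrates the general recipe of exploiting unlabeled examples before ranking features by importance.

```python
# Hedged sketch: semi-supervised feature ranking via self-training
# plus Random Forest impurity importances (a simplification of the
# ensemble-based rankings described in the abstract).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Hide 80% of the labels to simulate a semi-supervised dataset.
labeled = rng.rand(len(y)) < 0.2
X_lab, y_lab = X[labeled], y[labeled]
X_unl = X[~labeled]

# One round of self-training: fit on the labeled part,
# pseudo-label the unlabeled part, then refit on everything.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_lab, y_lab)
y_pseudo = rf.predict(X_unl)
rf.fit(np.vstack([X_lab, X_unl]),
       np.concatenate([y_lab, y_pseudo]))

# The feature ranking: feature indices sorted by importance,
# most important first.
ranking = np.argsort(rf.feature_importances_)[::-1].tolist()
print(ranking[:3])  # indices of the top-ranked features
```

A fully supervised baseline for comparison would simply fit the forest on `X_lab, y_lab` alone and rank by the same importances; the paper's evaluation compares exactly this kind of supervised/semi-supervised pair across 38 datasets.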
Pages: 4379-4408
Page count: 30