HMC-ReliefF: Feature Ranking for Hierarchical Multi-label Classification

Cited by: 13
Authors
Slavkov, Ivica [1, 2]
Karcheska, Jana [3]
Kocev, Dragi [4, 5]
Dzeroski, Saso [4, 5]
Affiliations
[1] CRG, Barcelona, Spain
[2] UPF, Barcelona, Spain
[3] Univ Ss Cyril & Methodius, Skopje, Macedonia
[4] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
[5] Jozef Stefan Int Postgrad Sch, Ljubljana, Slovenia
Keywords
feature selection; feature ranking; structured data; hierarchical multi-label classification; ReliefF; ensembles
DOI
10.2298/CSIS170115043S
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In machine learning, the growing complexity of available data makes its analysis increasingly challenging. The data are becoming both higher-dimensional and more intricately structured, which calls for machine learning algorithms that can handle both the high dimensionality and the complex structure. This paper develops and analyses the HMC-ReliefF algorithm, a feature relevance (ranking) algorithm for the task of Hierarchical Multi-label Classification (HMC). The algorithm is based on the RReliefF algorithm for regression, adapted to hierarchical multi-label target variables. We perform an extensive experimental investigation of HMC-ReliefF on several datasets from the domains of image annotation and functional genomics. We analyse the algorithm's performance in terms of accuracy in a filter-like setting and in terms of ranking stability for various parameter values. The results show that HMC-ReliefF can successfully detect relevant features, which can then be used to construct accurate predictive models. Additionally, the stability analysis helps to determine the parameter values that yield output that is not only accurate but also stable.
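The abstract does not spell out how RReliefF is adapted, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the standard RReliefF weight update and, as an assumption, replaces the numeric target difference with a depth-weighted Euclidean distance between binary label vectors. The function names, the `depths` argument, and the weighting base `w0` are hypothetical choices made for illustration.

```python
import numpy as np

def hierarchical_distance(y1, y2, depth_weights):
    """Depth-weighted Euclidean distance between label vectors, scaled to [0, 1].

    Assumption: labels deeper in the hierarchy get smaller weights, so a
    disagreement near the root counts more than one near the leaves.
    """
    sq = np.sum(depth_weights * (y1 - y2) ** 2)
    return np.sqrt(sq / np.sum(depth_weights))

def hmc_relieff_sketch(X, Y, depths, k=3, m=None, w0=0.75, seed=0):
    """RReliefF-style feature weights with a hierarchical target distance.

    X: (n, p) numeric features; Y: (n, q) binary label matrix;
    depths: depth of each of the q labels in the hierarchy (root level = 0).
    k must be at most n - 1. With m=None every instance is visited once.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    weights = w0 ** np.asarray(depths, dtype=float)
    if m is None:
        picks = np.arange(n)
    else:
        picks = np.random.default_rng(seed).integers(n, size=m)
    m = len(picks)
    n_dc = 0.0             # accumulated target difference
    n_da = np.zeros(p)     # accumulated feature difference
    n_dcda = np.zeros(p)   # accumulated joint (target and feature) difference
    for i in picks:
        dists = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance in feature space
        dists[i] = np.inf                       # never pick the instance itself
        for j in np.argsort(dists, kind="stable")[:k]:
            d_target = hierarchical_distance(Y[i], Y[j], weights)
            d_feat = np.abs(Xs[i] - Xs[j])
            n_dc += d_target / k
            n_da += d_feat / k
            n_dcda += d_target * d_feat / k
    if n_dc <= 0 or n_dc >= m:
        raise ValueError("degenerate data: neighbours never (or always) differ in target")
    # Standard RReliefF estimate of feature relevance.
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)
```

A feature receives a high weight when it tends to differ between neighbours whose label sets are far apart in the hierarchy and to agree between neighbours with similar label sets. Sorting the returned weights in descending order gives a feature ranking of the kind the abstract describes.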
Pages: 187-209
Page count: 23