Feature selection for high-dimensional data using a multivariate search space reduction strategy based scatter search

Authors
Garcia-Torres, Miguel [1]
Affiliations
[1] Univ Pablo de Olavide, Data Sci & Big Data Lab, Seville 41013, Spain
Keywords
Feature selection; High-dimensional data; Scatter search; Feature grouping; Search space reduction; Multivariate symmetrical uncertainty; GENETIC ALGORITHM; SUBSET-SELECTION; CANCER; CLASSIFICATION; INFORMATION; RELEVANCE; EFFICIENT; MACHINE; NETWORK; TUMOR;
DOI
10.1007/s10732-025-09550-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In feature selection, increasing dimensionality and the complexity of feature interactions make the problem challenging. Furthermore, searching for an optimal subset of features in a high-dimensional feature space is known to be an $\mathcal{NP}$-hard problem. To improve the efficiency and effectiveness of the search algorithm, feature grouping has emerged as a way to reduce the search space by clustering features according to a measure. In this work we propose to reduce the search space by applying a greedy algorithm called Multivariate Greedy Predominant Groups Generator (MGPGG). MGPGG extends the idea of the Greedy Predominant Groups Generator (GPGG) algorithm by taking into account interactions among three or more features. For this purpose, MGPGG uses the Multivariate Symmetrical Uncertainty (MSU) to group features that share information about the class label. We also propose a Scatter Search strategy that integrates MGPGG to find small subsets of features with high predictive power. The proposed algorithm, called Multivariate Predominant Group-based Scatter Search (MPGSS), is tested on high-dimensional data from the biomedical and text-mining fields and compared with state-of-the-art feature selection strategies. Results show that MPGSS is competitive: it finds small subsets of features while maintaining classification models with high predictive performance.
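This record does not give the MSU formula, so the following is only a minimal sketch of how a multivariate symmetrical uncertainty can be computed for discrete variables. It assumes the commonly cited definition MSU(X_1, ..., X_n) = n/(n-1) * (1 - H(X_1, ..., X_n) / sum_i H(X_i)), which reduces to the usual symmetrical uncertainty for n = 2. The function names `entropy` and `msu` are illustrative and are not taken from the paper; the grouping and scatter search steps of MGPGG and MPGSS are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(outcomes):
    """Shannon entropy (in bits) of a sequence of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())


def msu(*variables):
    """Multivariate symmetrical uncertainty of two or more discrete variables.

    Assumed definition (not confirmed by this record):
        MSU = n / (n - 1) * (1 - H(joint) / sum of marginal entropies)
    """
    n = len(variables)
    if n < 2:
        raise ValueError("msu needs at least two variables")
    marginal_sum = sum(entropy(v) for v in variables)
    if marginal_sum == 0:
        # All variables are constant, so there is no information to share.
        return 0.0
    joint = entropy(list(zip(*variables)))
    return n / (n - 1) * (1.0 - joint / marginal_sum)


# Toy data: the class label is the XOR of two binary features.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(x, y)]

pairwise = msu(x, label)        # one feature against the class
three_way = msu(x, y, label)    # both features together with the class
```

On this toy XOR data the pairwise value is 0 while the three-way value is about 0.5, which illustrates why a measure over three or more variables can group features that a pairwise measure would treat as irrelevant.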
Pages: 33
Related papers
50 in total
  • [1] Evolutionary feature selection on high dimensional data using a search space reduction approach
    Garcia-Torres, Miguel
    Ruiz, Roberto
    Divina, Federico
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [2] Search space division method for wrapper feature selection on high-dimensional data classification
    Chaudhuri, Abhilasha
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [3] Feature selection based on dynamic crow search algorithm for high-dimensional data classification
    Jiang, He
    Yang, Ye
    Wan, Qiuying
    Dong, Yao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 250
  • [4] Ranking-based Feature Selection with Wrapper PSO Search in High-Dimensional Data Classification
    Saw, Thinzar
    Oo, Win Mar
    IAENG International Journal of Computer Science, 2023, 50 (01)
  • [5] Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
    Mandal, Ashis Kumar
    Nadim, Md.
    Saha, Hasi
    Sultana, Tangina
    Hossain, Md. Delowar
    Huh, Eui-Nam
    IEEE ACCESS, 2024, 12 : 62341 - 62357
  • [6] A Variable Granularity Search-Based Multiobjective Feature Selection Algorithm for High-Dimensional Data Classification
    Cheng, Fan
    Cui, Junjie
    Wang, Qijun
    Zhang, Lei
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2023, 27 (02) : 266 - 280
  • [7] Investigation on particle swarm optimisation for feature selection on high-dimensional data: local search and selection bias
    Tran, Binh
    Xue, Bing
    Zhang, Mengjie
    Nguyen, Su
    CONNECTION SCIENCE, 2016, 28 (03) : 270 - 294
  • [8] High-dimensional hybrid feature selection using interaction information-guided search
    Nakariyakul, Songyot
    KNOWLEDGE-BASED SYSTEMS, 2018, 145 : 59 - 66
  • [9] High-dimensional similarity search using data-sensitive space partitioning
    Kulkarni, Sachin
    Orlandic, Ratko
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, 4080 : 738 - 750
  • [10] Feature selection for high dimensional imbalanced class data using harmony search
    Moayedikia, Alireza
    Ong, Kok-Leong
    Boo, Yee Ling
    Yeoh, William G. S.
    Jensen, Richard
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 57 : 38 - 49