Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data

Cited by: 0
Authors
Villuendas-Rey, Yenny [1]
Yanez-Marquez, Cornelio [2]
Camacho-Nieto, Oscar [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Innovac & Desarrollo Tecnol Computo, Mexico City 07700, Mexico
[2] Inst Politecn Nacl, Ctr Invest Comp, Mexico City 07738, Mexico
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Rough sets; Classification algorithms; Training; Metadata; Metaheuristics; Information systems; Nearest neighbor methods; Ant colony optimization; Algorithm design and theory; Multiclass imbalanced data; feature selection; instance selection; nearest neighbor; EVOLUTIONARY INSTANCE; ALGORITHMS; INFORMATION; SOFTWARE; SETS;
DOI
10.1109/ACCESS.2024.3418669
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper introduces a novel algorithm called Ant-based Feature and Instance Selection. The algorithm addresses the simultaneous selection of instances and features for mixed, incomplete, and imbalanced data in the context of lazy instance-based classifiers. It uses a hybrid selection strategy that combines a metaheuristic procedure, Ant Colony Optimization, with Generic Extended Rough Sets for Mixed and Incomplete Information Systems. It has five stages: reduct computation, metadata computation, intelligent instance preprocessing, submatrices creation, and fusion. To test the performance of the proposed algorithm, we used 25 datasets from the Machine Learning repository of the University of California at Irvine. All these datasets are imbalanced, have multiple classes, and represent real-world classification problems. The number of classes ranges from three to eight. Most of the datasets also have mixed or incomplete descriptions. We used several performance measures and computed the Instance Retention ratio and the Feature Retention ratio. To determine whether the compared algorithms differ significantly in performance, we used non-parametric hypothesis testing. The statistical analysis results confirm the high quality of the proposed algorithm for selecting features and instances in multiclass imbalanced data.
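The abstract names the Instance Retention ratio and the Feature Retention ratio but does not define them. The sketch below assumes the common reading, in which each ratio is the number of items kept after selection divided by the number in the original dataset. The function name `retention_ratio` and the example counts are hypothetical and are not taken from the paper.

```python
def retention_ratio(n_selected: int, n_original: int) -> float:
    """Fraction of items kept after selection.

    Assumed definition: selected count / original count.
    The paper's exact formula is not given in this record.
    """
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    if not 0 <= n_selected <= n_original:
        raise ValueError("n_selected must lie between 0 and n_original")
    return n_selected / n_original


# Hypothetical counts, for illustration only (not results from the paper).
instance_retention = retention_ratio(n_selected=120, n_original=400)
feature_retention = retention_ratio(n_selected=6, n_original=15)

print(f"Instance retention: {instance_retention:.2f}")
print(f"Feature retention: {feature_retention:.2f}")
```

Under this reading, lower values mean a more compact training set for a lazy classifier, so the ratios are usually read alongside the classification performance measures rather than on their own.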
Pages: 133952-133968
Number of pages: 17