Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data

Cited by: 0
Authors
Villuendas-Rey, Yenny [1]
Yanez-Marquez, Cornelio [2]
Camacho-Nieto, Oscar [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Innovac & Desarrollo Tecnol Computo, Mexico City 07700, Mexico
[2] Inst Politecn Nacl, Ctr Invest Comp, Mexico City 07738, Mexico
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Rough sets; Classification algorithms; Training; Metadata; Metaheuristics; Information systems; Nearest neighbor methods; Ant colony optimization; Algorithm design and theory; Multiclass imbalanced data; feature selection; instance selection; nearest neighbor; EVOLUTIONARY INSTANCE; ALGORITHMS; INFORMATION; SOFTWARE; SETS;
DOI
10.1109/ACCESS.2024.3418669
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces a novel algorithm called Ant-based Feature and Instance Selection. The algorithm addresses the simultaneous selection of instances and features for mixed, incomplete, and imbalanced data in the context of lazy instance-based classifiers. It uses a hybrid selection strategy based on metaheuristic procedures and Rough Sets, combining Ant Colony Optimization with Generic Extended Rough Sets for Mixed and Incomplete Information Systems. It has five stages: reduct computation, metadata computation, intelligent instance preprocessing, submatrices creation, and fusion. To test the performance of the proposed algorithm, we used 25 datasets from the Machine Learning repository of the University of California at Irvine. All these datasets are imbalanced, have multiple classes, and represent real-world classification problems. The number of classes ranges from three to eight. Most of the datasets also have mixed or incomplete descriptions. We used several performance measures and computed the Instance Retention ratio and the Feature Retention ratio. To determine whether there are significant differences in the performance of the compared algorithms, we used non-parametric hypothesis testing. The statistical analysis results confirm the high quality of the proposed algorithm for selecting features and instances in multiclass imbalanced data.
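The abstract names the Instance Retention ratio and the Feature Retention ratio but does not define them. The sketch below assumes the conventional reading, in which each ratio is the fraction of instances (or features) kept after selection relative to the original dataset. The function name `retention_ratio` and the example counts are hypothetical and are not taken from the paper.

```python
def retention_ratio(n_selected: int, n_original: int) -> float:
    """Fraction of items kept after selection (assumed definition).

    Works for both instances and features: pass the number of items
    that survive selection and the number in the original dataset.
    """
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    if not 0 <= n_selected <= n_original:
        raise ValueError("n_selected must lie between 0 and n_original")
    return n_selected / n_original


# Hypothetical dataset: 200 instances and 16 features before selection,
# 50 instances and 4 features after selection.
instance_retention = retention_ratio(50, 200)
feature_retention = retention_ratio(4, 16)
print(f"Instance Retention: {instance_retention:.2f}")
print(f"Feature Retention: {feature_retention:.2f}")
```

Under this assumed definition, lower values mean stronger reduction, so the ratios are read alongside the classification performance measures rather than on their own.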
Pages: 133952 - 133968
Number of pages: 17
Related papers
50 in total
  • [31] Default forecasting based on a novel group feature selection method for imbalanced data
    Chi, Guotai
    Xing, Jin
    Pan, Ancheng
    JOURNAL OF CREDIT RISK, 2023, 19 (03): : 51 - 77
  • [32] A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets
    Fernandez, Alberto
    Jose Carmona, Cristobal
    Jose del Jesus, Maria
    Herrera, Francisco
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2017, 27 (06)
  • [33] Ant-based service selection framework for a smart home monitoring environment
    M. Shamim Hossain
    S. K. Alamgir Hossain
    Atif Alamri
    M. Anwar Hossain
    Multimedia Tools and Applications, 2013, 67 : 433 - 453
  • [34] Ant-based service selection framework for a smart home monitoring environment
    Hossain, M. Shamim
    Hossain, S. K. Alamgir
    Alamri, Atif
    Hossain, M. Anwar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2013, 67 (02) : 433 - 453
  • [35] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
    Krawczyk, Bartosz
    Koziarski, Michal
    Wozniak, Michal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
  • [36] Ant-based Data Traffic Splitting for application-based routing
    Schulz, Joerg
    INNOVATIVE INTERNET COMMUNITY SYSTEMS, 2006, 3473 : 49 - 58
  • [37] Resampling approach for imbalanced data classification based on class instance density per feature value intervals
    Wang, Fei
    Zheng, Ming
    Ma, Kai
    Hu, Xiaowen
    INFORMATION SCIENCES, 2025, 692
  • [38] A Voronoi Diagram Based Classifier for Multiclass Imbalanced Data Sets
    Silva, Evandro J. R.
    Zanchettin, Cleber
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 109 - 114
  • [39] Exploring ant-based algorithms for gene expression data analysis
    He, Yulan
    Hui, Siu Cheung
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2009, 47 (02) : 105 - 119
  • [40] Ant-based IP traceback
    Lai, Gu Hsin
    Chen, Chia-Mei
    Jeng, Bing-Chiang
    Chao, Willanis
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (04) : 3071 - 3080