Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data

Cited by: 0

Authors:

Villuendas-Rey, Yenny [1]

Yanez-Marquez, Cornelio [2]

Camacho-Nieto, Oscar [1]

Affiliations:

[1] Inst Politecn Nacl, Ctr Innovac & Desarrollo Tecnol Computo, Mexico City 07700, Mexico

[2] Inst Politecn Nacl, Ctr Invest Comp, Mexico City 07738, Mexico

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Feature extraction; Rough sets; Classification algorithms; Training; Metadata; Metaheuristics; Information systems; Nearest neighbor methods; Ant colony optimization; Algorithm design and theory; Multiclass imbalanced data; feature selection; instance selection; nearest neighbor; EVOLUTIONARY INSTANCE; ALGORITHMS; INFORMATION; SOFTWARE; SETS;

DOI:

10.1109/ACCESS.2024.3418669

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper introduces a novel algorithm called Ant-based Feature and Instance Selection. The algorithm addresses the simultaneous selection of instances and features for mixed, incomplete, and imbalanced data in the context of lazy instance-based classifiers. It uses a hybrid selection strategy based on metaheuristic procedures and Rough Sets, combining Ant Colony Optimization with Generic Extended Rough Sets for Mixed and Incomplete Information Systems. It has five stages: reduct computation, metadata computation, intelligent instance preprocessing, submatrices creation, and fusion. To test the performance of the proposed algorithm, we used 25 datasets from the Machine Learning repository of the University of California at Irvine. All these datasets are imbalanced, have multiple classes, and represent real-world classification problems. The number of classes ranges from three to eight. Most of the datasets also have mixed or incomplete descriptions. We used several performance measures and computed the Instance Retention ratio and the Feature Retention ratio. To determine whether significant differences exist in the performance of the compared algorithms, we used non-parametric hypothesis testing. The statistical analysis results confirm the high quality of the proposed algorithm for selecting features and instances in multiclass imbalanced data.
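
The abstract names the Instance Retention ratio and the Feature Retention ratio without defining them. The sketch below assumes the common reading, namely the fraction of instances (or features) kept after selection relative to the original count; the paper's exact definition may differ. The function name `retention_ratio` and the numbers used are hypothetical and do not come from the paper.

```python
def retention_ratio(kept: int, original: int) -> float:
    """Fraction of items (instances or features) kept after selection.

    Assumed definition: kept / original. Not taken from the paper.
    """
    if original <= 0:
        raise ValueError("original must be positive")
    if not 0 <= kept <= original:
        raise ValueError("kept must be between 0 and original")
    return kept / original


# Hypothetical example: a dataset with 150 instances and 10 features
# is reduced to 60 instances and 4 features.
instance_retention = retention_ratio(60, 150)
feature_retention = retention_ratio(4, 10)
print(f"Instance retention: {instance_retention:.2f}")
print(f"Feature retention: {feature_retention:.2f}")
```

Under this reading, lower values mean a stronger reduction of the training set, so the ratios are usually reported alongside a classification performance measure.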


Pages: 133952 - 133968

Page count: 17

