A fuzzy rough set-based undersampling approach for imbalanced data

Cited by: 1
Authors
Zhang, Xiao [1]
He, Zhaoqian [1]
Yang, Yanyan [2]
Affiliations
[1] Xian Univ Technol, Dept Appl Math, 58 Yanxiang Rd, Xian 710054, Shaanxi, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beixiaguan Rd, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Fuzzy rough sets; Undersampling; Instance selection; CLASSIFIERS; REDUCTION;
DOI
10.1007/s13042-023-02064-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling imbalanced data effectively is an active research topic in machine learning and data mining. Undersampling, a popular technique for dealing with imbalanced data, selects a subset of instances from the majority class so that the dataset becomes balanced. Traditional undersampling approaches, however, may lose information carried by the majority class instances. To address this, the paper builds on the concept of the importance degree of a fuzzy granule and presents a criterion for selecting representative instances from the majority class, which considers the fuzzy relations between the k-nearest neighbors of a majority class instance and the minority class instances. On this basis, an undersampling approach based on fuzzy rough sets (USFRS) is put forward. USFRS is designed to keep the selected majority class instances representative and to minimize the information loss caused by undersampling. USFRS is compared with related undersampling methods, and the differences in the experimental results are analyzed with statistical tests. The experimental results demonstrate that USFRS performs well in classifying imbalanced data.
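The abstract does not give the USFRS selection criterion itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Gaussian similarity as the fuzzy relation, uses the standard fuzzy rough lower approximation of the majority class (the minimum of 1 - R(x, y) over minority instances y), and scores each majority instance by averaging that membership over the instance and its k nearest majority neighbors. The function names, the `sigma` parameter, and the scoring rule are all hypothetical.

```python
import math
import random


def similarity(a, b, sigma=1.0):
    """Assumed fuzzy relation: Gaussian similarity between two points."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def fuzzy_rough_undersample(majority, minority, k=3, sigma=1.0):
    """Return sorted indices of the majority instances to keep.

    Hypothetical scoring rule, not the USFRS criterion from the paper.
    """
    # Lower-approximation membership of each majority instance in its own
    # class: min over minority instances y of 1 - R(x, y).
    lower = [
        min(1.0 - similarity(x, y, sigma) for y in minority)
        for x in majority
    ]
    k = min(k, len(majority) - 1)
    scores = []
    for i, x in enumerate(majority):
        # The k most similar majority instances, excluding x itself.
        others = [j for j in range(len(majority)) if j != i]
        others.sort(key=lambda j: similarity(x, majority[j], sigma), reverse=True)
        granule = [i] + others[:k]
        # Score: mean lower-approximation membership over the neighborhood.
        scores.append(sum(lower[j] for j in granule) / len(granule))
    # Keep as many majority instances as there are minority instances.
    n_keep = min(len(minority), len(majority))
    ranked = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Usage on synthetic data: 40 majority points, 8 minority points.
random.seed(0)
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
minority = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(8)]
kept = fuzzy_rough_undersample(majority, minority)
balanced = [majority[i] for i in kept] + minority
```

This particular score favors majority instances whose neighborhoods lie far from the minority class. A method that aims to preserve boundary information would need a different criterion, which is one reason to treat the sketch as illustrative only.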
Pages: 2799-2810
Number of pages: 12
Related papers
50 in total
  • [1] A Novel Approach to Fuzzy Rough Set-Based Analysis of Information Systems
    Mieszkowicz-Rolka, Alicja
    Rolka, Leszek
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT IV, 2016, 432 : 173 - 183
  • [2] Fuzzy Rough Set-Based Unstructured Text Categorization
    Bharadwaj, Aditya
    Ramanna, Sheela
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 : 335 - 340
  • [3] Radial-Based Undersampling for imbalanced data classification
    Koziarski, Michal
    PATTERN RECOGNITION, 2020, 102
  • [4] Neighbourhood-based undersampling approach for handling imbalanced and overlapped data
    Vuttipittayamongkol, Pattaramon
    Elyan, Eyad
    INFORMATION SCIENCES, 2020, 509 : 47 - 70
  • [5] Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach
    Vluymans, Sarah
    Fernandez, Alberto
    Saeys, Yvan
    Cornelis, Chris
    Herrera, Francisco
    KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 56 (01) : 55 - 84
  • [6] Overlap-Based Undersampling for Improving Imbalanced Data Classification
    Vuttipittayamongkol, Pattaramon
    Elyan, Eyad
    Petrovski, Andrei
    Jayne, Chrisina
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2018, PT I, 2018, 11314 : 689 - 697
  • [7] Undersampling method based on minority class density for imbalanced data
    Sun, Zhongqiang
    Ying, Wenhao
    Zhang, Wenjin
    Gong, Shengrong
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [8] Intuitionistic Fuzzy Rough Set-Based Granular Structures and Attribute Subset Selection
    Tan, Anhui
    Wu, Wei-Zhi
    Qian, Yuhua
    Liang, Jiye
    Chen, Jinkun
    Li, Jinjin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2019, 27 (03) : 527 - 539
  • [9] Clustering Based Undersampling for Effective Learning from Imbalanced Data: An Iterative Approach
    Bhattacharya R.
    De R.
    Chakraborty A.
    Sarkar R.
    SN Computer Science, 5 (4)
  • [10] SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory
    Ramentol, Enislay
    Caballero, Yaile
    Bello, Rafael
    Herrera, Francisco
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 33 (02) : 245 - 265