A fair-multicluster approach to clustering of categorical data

Authors
Carlos Santos-Mangudo
Antonio J. Heras
Affiliation
[1] Complutense University of Madrid, Financial and Actuarial Economics and Statistics Department
Source
Central European Journal of Operations Research | 2023 / Vol. 31
Keywords
Clustering; Fairness; Fair clustering; Categorical data
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical data. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so that an increase in the fairness objective usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
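To make the idea of an output that is "not biased towards or against specific subgroups" concrete, the sketch below scores a clustering by how closely each cluster mirrors the overall mix of a protected attribute. This is an illustration only: the `balance` helper, the toy data, and the ratio-based score are assumptions borrowed from common practice in the fair clustering literature, not the fairness measure or the Multicluster procedure used in the paper.

```python
from collections import Counter

def balance(labels, protected):
    """Hypothetical helper: return a score in [0, 1].

    1.0 means every cluster reproduces the overall proportions of the
    protected attribute; 0.0 means some group is absent from a cluster.
    """
    n = len(labels)
    overall = Counter(protected)

    # Count protected-attribute values inside each cluster.
    clusters = {}
    for c, g in zip(labels, protected):
        clusters.setdefault(c, Counter())[g] += 1

    score = 1.0
    for counts in clusters.values():
        size = sum(counts.values())
        for g, total in overall.items():
            # Share of group g in this cluster relative to its overall share.
            ratio = (counts[g] / size) / (total / n)
            # Penalize both under- and over-representation.
            score = min(score, ratio, 1 / ratio if ratio > 0 else 0.0)
    return score

# Toy data (assumed): six records, protected attribute with two values.
protected = ["a", "a", "a", "b", "b", "b"]
fair_labels = [0, 1, 2, 0, 1, 2]    # each cluster holds one "a" and one "b"
biased_labels = [0, 0, 0, 1, 1, 1]  # clusters coincide with the groups

print(balance(fair_labels, protected))    # 1.0
print(balance(biased_labels, protected))  # 0.0
```

A score like this could be tracked alongside a homogeneity measure to observe the fairness versus efficiency trade-off that the abstract describes.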
Pages: 583-604
Number of pages: 21