A fair-multicluster approach to clustering of categorical data

Authors
Carlos Santos-Mangudo
Antonio J. Heras
Affiliation
[1] Complutense University of Madrid,Financial and Actuarial Economics and Statistics Department
Source
Central European Journal of Operations Research | 2023 / Vol. 31
Keywords
Clustering; Fairness; Fair clustering; Categorical data
DOI
Not available
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so an increase in the fairness objective usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
Pages: 583-604
Page count: 21