A fair-multicluster approach to clustering of categorical data

Cited by: 1
Authors
Santos-Mangudo, Carlos [1]
Heras, Antonio J. [1]
Affiliations
[1] Univ Complutense Madrid, Financial & Actuarial Econ & Stat Dept, Campus Somosaguas S-N, Pozuelo De Alarcon 28223, Spain
Keywords
Clustering; Fairness; Fair clustering; Categorical data; DATA SETS; ALGORITHM; INITIALIZATION;
DOI
10.1007/s10100-022-00824-2
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Discipline classification codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227-246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so an increase in fairness usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
Pages: 583-604
Page count: 22