A fair-multicluster approach to clustering of categorical data

Cited by: 1
Authors
Santos-Mangudo, Carlos [1]
Heras, Antonio J. [1]
Affiliation
[1] Univ Complutense Madrid, Financial & Actuarial Econ & Stat Dept, Campus Somosaguas S-N, Pozuelo De Alarcon 28223, Spain
Keywords
Clustering; Fairness; Fair clustering; Categorical data; Data sets; Algorithm; Initialization
DOI
10.1007/s10100-022-00824-2
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. However, to our knowledge, no previous papers have been devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227-246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency: an increase in the fairness objective usually leads to a loss of classification efficiency. Yet a reasonable compromise between these goals can be reached, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
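The abstract does not spell out the Multicluster algorithm or the exact fairness and efficiency criteria, so the sketch below does not reproduce the authors' method. It only illustrates the fairness/homogeneity trade-off on categorical data under two assumed stand-in measures: the two-group balance of Chierichetti et al. (2017), generalized here to several groups as the ratio of the smallest to the largest protected-group count in each cluster, and the k-modes cost (simple-matching mismatches between each record and its cluster mode) as a homogeneity measure. The function names `cluster_members`, `balance` and `kmodes_cost` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def cluster_members(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Map each cluster label to the indices of the records assigned to it."""
    members: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    return dict(members)


def balance(labels: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Assumed fairness measure: worst cluster's min/max protected-group count.

    For two groups this equals min(r/b, b/r) per cluster; a cluster missing
    any group of the population scores 0. 1.0 means perfectly balanced.
    """
    all_groups = set(groups)
    worst = 1.0
    for idx in cluster_members(labels).values():
        counts = Counter(groups[i] for i in idx)
        sizes = [counts.get(g, 0) for g in all_groups]
        worst = min(worst, min(sizes) / max(sizes))  # max >= 1: cluster is non-empty
    return worst


def kmodes_cost(records: Sequence[Sequence[Hashable]], labels: Sequence[int]) -> int:
    """Assumed homogeneity measure: total attribute mismatches to each cluster mode."""
    cost = 0
    for idx in cluster_members(labels).values():
        n_attrs = len(records[idx[0]])
        # Per-attribute mode; ties go to the value encountered first.
        mode = [Counter(records[i][a] for i in idx).most_common(1)[0][0]
                for a in range(n_attrs)]
        cost += sum(records[i][a] != mode[a] for i in idx for a in range(n_attrs))
    return cost


# Hypothetical toy data: two categorical attributes plus a protected group label.
records = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
groups = ["M", "M", "F", "F"]
homogeneous = [0, 0, 1, 1]  # clusters coincide with the protected groups
mixed = [0, 1, 0, 1]        # each cluster holds one record from each group

print(kmodes_cost(records, homogeneous), balance(homogeneous, groups))  # 0 0.0
print(kmodes_cost(records, mixed), balance(mixed, groups))              # 4 1.0
```

On this toy data the clustering that follows the protected attribute has zero mismatches but zero balance, while mixing the groups reaches perfect balance at a cost of four mismatches: a miniature version of the fairness/efficiency trade-off the abstract describes.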
Pages: 583-604
Number of pages: 22