A fair-multicluster approach to clustering of categorical data

Citations: 1
Authors
Santos-Mangudo, Carlos [1]
Heras, Antonio J. [1]
Affiliations
[1] Univ Complutense Madrid, Financial & Actuarial Econ & Stat Dept, Campus Somosaguas S-N, Pozuelo De Alarcon 28223, Spain
Keywords
Clustering; Fairness; Fair clustering; Categorical data; DATA SETS; ALGORITHM; INITIALIZATION;
DOI
10.1007/s10100-022-00824-2
Chinese Library Classification (CLC) number
C93 [Management]; O22 [Operations Research];
Discipline classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227-246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency: an increase in the fairness objective usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
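The abstract does not say how fairness is measured, so the following is only an illustrative sketch of one simple way to check whether a clustering is "biased towards or against specific subgroups". It is not the measure used by Santos-Mangudo and Heras. The function name `cluster_balance` and the ratio it computes are assumptions made for this example: for each cluster, it compares the share of every protected group inside the cluster with that group's share in the whole data set and reports the smallest ratio.

```python
from collections import Counter


def cluster_balance(labels, protected):
    """Hypothetical fairness check (an assumption, not the paper's measure).

    For each cluster, return the smallest ratio between a protected group's
    share inside the cluster and its share in the whole data set.
    A value of 1.0 means the cluster mirrors the overall proportions;
    values near 0 mean some group is strongly under-represented.
    """
    if len(labels) != len(protected):
        raise ValueError("labels and protected must have the same length")
    if not labels:
        raise ValueError("inputs must not be empty")

    n = len(labels)
    group_totals = Counter(protected)       # group -> count in the data set
    cluster_sizes = Counter(labels)         # cluster -> number of members
    joint = Counter(zip(labels, protected)) # (cluster, group) -> count

    balance = {}
    for cluster, size in cluster_sizes.items():
        ratios = []
        for group, total in group_totals.items():
            share_in_cluster = joint[(cluster, group)] / size
            share_overall = total / n
            ratios.append(share_in_cluster / share_overall)
        balance[cluster] = min(ratios)
    return balance


if __name__ == "__main__":
    # Toy data: cluster assignments and a protected categorical attribute.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    protected = ["f", "f", "m", "m", "f", "m", "m", "m"]
    print(cluster_balance(labels, protected))
```

On this toy input, cluster 0 scores about 0.8 and cluster 1 about 0.67, so cluster 1 departs further from the overall group proportions. A clustering procedure that trades homogeneity for fairness, as the abstract describes, would aim to push such scores closer to 1.0 while keeping clusters reasonably homogeneous.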
Pages: 583 - 604
Number of pages: 22
Related papers
50 in total
  • [1] A fair-multicluster approach to clustering of categorical data
    Carlos Santos-Mangudo
    Antonio J. Heras
    Central European Journal of Operations Research, 2023, 31 : 583 - 604
  • [2] A multicluster approach to selecting initial sets for clustering of categorical data
    Santos-Mangudo C.
    Heras A.J.
    Interdisciplinary Journal of Information, Knowledge, and Management, 2020, 15 : 227 - 246
  • [3] Rough Set Approach for Categorical Data Clustering
    Herawan, Tutut
    Yanto, Iwan Tri Riyadi
    Deris, Mustafa Mat
    DATABASE THEORY AND APPLICATION, 2009, 64 : 179 - 186
  • [4] Clustering Categorical Data: A Cluster Ensemble Approach
    何增友
    High Technology Letters, 2003, (04) : 8 - 12
  • [5] A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
    Iam-On, Natthakan
    Boongoen, Tossapon
    Garrett, Simon
    Price, Chris
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (03) : 413 - 425
  • [6] An Integrated Clustering Approach for High Dimensional Categorical Data
    Kalaivani, K.
    Raghavendra, A. P. V.
    2013 IEEE INTERNATIONAL CONFERENCE ON GREEN HIGH PERFORMANCE COMPUTING (ICGHPC), 2013,
  • [7] Categorical data clustering: A correlation-based approach for unsupervised attribute weighting
    Carbonera, Joel Luis
    Abel, Mara
    2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 259 - 263
  • [8] A hybrid data transformation approach for privacy preserving clustering of categorical data
    Natarajan, A. M.
    Rajalaxmi, R. R.
    Uma, N.
    Kirubhakar, G.
    INNOVATIONS AND ADVANCED TECHNIQUES IN COMPUTER AND INFORMATION SCIENCES AND ENGINEERING, 2007, : 403 - 408
  • [9] Clustering categorical data based on the relational analysis approach and MapReduce
    Lamari Y.
    Slaoui S.C.
    Journal of Big Data, 2017, 4 (01)
  • [10] Generalized Similarity Measure for Categorical Data Clustering
    Sharma, Shruti
    Singh, Manoj
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 765 - 769