A fair-multicluster approach to clustering of categorical data

Cited by: 0

Authors:

Carlos Santos-Mangudo

Antonio J. Heras

Affiliation:

[1] Complutense University of Madrid, Financial and Actuarial Economics and Statistics Department

Source:

Central European Journal of Operations Research | 2023 / Vol. 31

Keywords:

Clustering; Fairness; Fair clustering; Categorical data

DOI:

Not available

Abstract:

In recent years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so an increase in fairness usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
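To make the idea of an output that is "not biased towards or against specific subgroups" concrete, the following is a minimal sketch of one possible fairness check. It assumes a simple balance-style measure (the ratio of the least to the most represented protected group within each cluster); the abstract does not state which fairness criterion the paper uses, so the function name `cluster_balance` and the toy data are hypothetical and purely illustrative.

```python
from collections import Counter


def cluster_balance(labels, protected):
    """Return, per cluster, the min/max ratio of protected-group counts.

    A value of 1.0 means every group is equally represented in the cluster;
    0.0 means at least one group is absent from it. This is an assumed,
    illustrative measure, not the criterion defined in the paper.
    """
    groups = sorted(set(protected))
    per_cluster = {}
    for cluster, group in zip(labels, protected):
        per_cluster.setdefault(cluster, Counter())[group] += 1
    return {
        cluster: min(counts[g] for g in groups) / max(counts[g] for g in groups)
        for cluster, counts in per_cluster.items()
    }


# Hypothetical toy data: cluster assignments and a binary protected attribute.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
protected = ["a", "b", "a", "b", "a", "a", "a", "b"]
balance = cluster_balance(labels, protected)
```

Here cluster 0 contains both groups in equal numbers, while cluster 1 is dominated by group "a". Comparing raw counts implicitly assumes the groups are about equally frequent in the whole population; with unequal group sizes one might instead compare each cluster's proportions against the overall proportions.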

Pages: 583-604

Number of pages: 21
