Complexity analysis and practical resolution of the data classification problem with private characteristics

被引:0
作者
David Pantoja [1 ]
Ismael Rodríguez [2 ]
Fernando Rubio [2 ]
Clara Segura [1 ]
机构
[1] Universidad Complutense de Madrid,Facultad de Informática
[2] Universidad Complutense de Madrid,Instituto de Tecnología del Conocimiento, Facultad de Informática
关键词
Computational complexity; NP-completeness; Data classification; Decision trees; Selection processes; Privacy preservation; 68Q17; 68W25;
D O I
10.1007/s40747-025-01911-y
中图分类号
学科分类号
摘要
In this work we analyze the problem of, given the probability distribution of a population, questioning an unknown individual that is representative of the distribution so that our uncertainty about certain characteristics is significantly reduced—but the uncertainty about others, deemed private or sensitive, is not. Thus, the goal of the problem is extracting information being relevant to a legitimate purpose while preserving the privacy of individuals, which is crucial to enable non-intrusive selection processes in several areas. For instance, it is essential in the design of non-discriminatory personnel selection, promotion, and layoff processes in companies and institutions; in the retrieval of customer information being relevant to the service provided by a company (and no more); in certifications not revealing sensitive industrial information being irrelevant for the certification itself; etc. Interactive questioning processes are constructed for this purpose, which requires generalizing the notion of decision trees to account the amount of desired and undesired information retrieved for each branch of the plan. Our findings about this problem are both theoretical and practical: on the one hand, we prove its NP-completeness by a reduction from the Set Cover problem; and on the other hand, given this intractability, we provide heuristic solutions to find reasonable solutions in affordable time. In particular, a greedy algorithm and two genetic algorithms are presented. Our experiments indicate that the best results are obtained using a genetic algorithm reinforced with a greedy strategy.
引用
收藏
相关论文
empty
未找到相关数据