Constrained clustering by constraint programming

被引:49
作者
Thi-Bich-Hanh Dao [1 ]
Khanh-Chuong Duong [1 ]
Vrain, Christel [1 ]
机构
[1] Univ Orleans, INSA Ctr Val Loire, LIFO, EA 4022, F-45067 Orleans, France
关键词
Constrained clustering; Bi-criterion clustering; Constraint programming; Modeling; Global optimization constraint; Filtering algorithm;
D O I
10.1016/j.artint.2015.05.006
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Constrained Clustering allows to make the clustering task more accurate by integrating user constraints, which can be instance-level or cluster-level constraints. Few works consider the integration of different kinds of constraints, they are usually based on declarative frameworks and they are often exact methods, which either enumerate all the solutions satisfying the user constraints, or find a global optimum when an optimization criterion is specified. In a previous work, we have proposed a model for Constrained Clustering based on a Constraint Programming framework. It is declarative, allowing a user to integrate user constraints and to choose an optimization criterion among several ones. In this article we present a new and substantially improved model for Constrained Clustering, still based on a Constraint Programming framework. It differs from our earlier model in the way partitions are represented by means of variables and constraints. It is also more flexible since the number of clusters does not need to be set beforehand; only a lower and an upper bound on the number of clusters have to be provided. In order to make the model-based approach more efficient, we propose new global optimization constraints with dedicated filtering algorithms. We show that such a framework can easily be embedded in a more general process and we illustrate this on the problem of finding the optimal Pareto front of a bi-criterion constrained clustering task. We compare our approach with existing exact approaches, based either on a branch-and-bound approach or on graph coloring on twelve datasets. Experiments show that the model outperforms exact approaches in most cases. (C) 2015 Elsevier B.V. All rights reserved.
引用
收藏
页码:70 / 94
页数:25
相关论文
共 54 条
[11]   Optimal Correlation Clustering via MaxSAT [J].
Berg, Jeremias ;
Jarvisalo, Matti .
2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, :750-757
[12]  
Bessiere C, 2009, LECT NOTES COMPUT SC, V5732, P173, DOI 10.1007/978-3-642-04244-7_16
[13]   Knowledge Compilation for Itemset Mining [J].
Cambazard, Hadrien ;
Hadzic, Tarik ;
O'Sullivan, Barry .
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 :1109-+
[14]   REVIEW OF CLASSIFICATION [J].
CORMACK, RM .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-GENERAL, 1971, 134 :321-+
[15]  
Davidson I, 2005, LECT NOTES ARTIF INT, V3721, P59
[16]   The complexity of non-hierarchical clustering with instance and cluster level constraints [J].
Davidson, Ian ;
Ravi, S. S. .
DATA MINING AND KNOWLEDGE DISCOVERY, 2007, 14 (01) :25-61
[17]  
Davidson I, 2005, SIAM PROC S, P138
[18]  
De Raedt L, 2010, AAAI CONF ARTIF INTE, P1671
[19]  
De Raedt Luc, 2008, P 14 ACM SIGKDD INT, P204
[20]   BICRITERION CLUSTER-ANALYSIS [J].
DELATTRE, M ;
HANSEN, P .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1980, 2 (04) :277-291