Clustering With Constraints: Feasibility Issues and the k-Means Algorithm

被引:0
作者
Davidson, Ian [1 ]
Ravi, S. S. [1 ]
机构
[1] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
来源
PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2005年
关键词
k-Means clustering; constraints;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recent work has looked at extending the k-Means algorithm to incorporate background information in the form of instance level must-link and cannot-link constraints. We introduce two ways of specifying additional background information in the form of delta and epsilon constraints that operate on all instances but which can be interpreted as conjunctions or disjunctions of instance level constraints and hence are easy to implement. We present complexity results for the feasibility of clustering under each type of constraint individually and several types together. A key finding is that determining whether there is a feasible solution satisfying all constraints is, in general, NP-complete. Thus, an iterative algorithm such as k-Means should not try to find a feasible partitioning at each iteration. This motivates our derivation of a new version of the k-Means algorithm that minimizes the constrained vector quantization error but at each iteration does not attempt to satisfy all constraints. Using standard UCI datasets, we find that using constraints improves accuracy as others have reported, but we also show that our algorithm reduces the number of iterations until convergence. Finally, we illustrate these benefits and our new constraint types on a complex real world object identification problem using the infra-red detector on an Aibo robot.
引用
收藏
页码:138 / 149
页数:12
相关论文
共 19 条
[1]  
[Anonymous], INT C KNOWL DISC DAT, DOI DOI 10.1145/1014052.1014062
[2]  
Basu S., 2002, P 19 INT C MACHINE L, P19, DOI [10.5555/645531.656012, DOI 10.5555/645531.656012]
[3]  
Basu S., P 4 SIAM INT C DAT M
[4]  
Bottou L., 1995, Advances in Neural Information Processing Systems 7, P585
[5]  
Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
[6]  
Campers G., 1987, P OP RES 87 BUEN AIR, P917
[7]  
Cormen T. H., 2001, Introduction to Algorithms, V2nd
[8]  
Davidson I., 2003, P IEEE ICDM 2003 WOR, P16
[9]   PLANAR 3DM IS NP-COMPLETE [J].
DYER, ME ;
FRIEZE, AM .
JOURNAL OF ALGORITHMS, 1986, 7 (02) :174-184
[10]  
Ester M., 1996, 2 INT C KNOWL DISCOV, P226, DOI DOI 10.5555/3001460.3001507