Adapting k-means for supervised clustering

Cited by: 52
Authors
Al-Harbi, SH
Rayward-Smith, VJ
Affiliations
[1] Informat Ctr, Riyadh 11485, Saudi Arabia
[2] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
classification; supervised clustering; weighted metrics; simulated annealing; supervised k-means
DOI
10.1007/s10489-006-8513-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
k-means is traditionally viewed as an algorithm for the unsupervised clustering of a heterogeneous population into a number of more homogeneous groups of objects. However, it is not necessarily guaranteed to group the same types (classes) of objects together. In such cases, some supervision is needed to partition objects which have the same label into one cluster. This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to be used as a classifier algorithm. The output field itself cannot be used in the clustering but it is used in developing a suitable metric defined on other fields. The proposed algorithm combines Simulated Annealing with the modified k-means algorithm. We apply the proposed algorithm to real data sets, and compare the output of the resultant classifier to that of C4.5.
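The idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' published method: the abstract does not give the metric, the objective, or the annealing schedule. The sketch assumes a weighted squared Euclidean distance, scores a set of weights by how many objects fall outside the majority class of their cluster, and uses a basic geometric cooling schedule. The names `weighted_dist2`, `kmeans`, `misclassified`, and `anneal_weights` are hypothetical. Note that the class labels `y` are used only to score the weights, never as a clustering field.

```python
import math
import random
from collections import Counter


def weighted_dist2(a, b, w):
    # Weighted squared Euclidean distance (assumed form of the metric).
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def kmeans(X, k, w, iters=20):
    # Plain k-means under the weighted metric.
    # Assumption: the first k points seed the centroids, for determinism.
    centroids = [list(X[i]) for i in range(k)]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: weighted_dist2(x, centroids[c], w))
            for x in X
        ]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def misclassified(assign, y, k):
    # Count objects whose label differs from their cluster's majority label.
    errors = 0
    for c in range(k):
        labels = [yi for yi, a in zip(y, assign) if a == c]
        if labels:
            errors += len(labels) - Counter(labels).most_common(1)[0][1]
    return errors


def anneal_weights(X, y, k, steps=200, t0=1.0, cooling=0.97, seed=0):
    # Simulated annealing over the feature weights (assumed schedule).
    rng = random.Random(seed)
    w = [1.0] * len(X[0])
    cost = misclassified(kmeans(X, k, w), y, k)
    best_w, best_cost = list(w), cost
    t = t0
    for _ in range(steps):
        cand = list(w)
        j = rng.randrange(len(cand))
        cand[j] = max(0.0, cand[j] + rng.uniform(-0.5, 0.5))
        cand_cost = misclassified(kmeans(X, k, cand), y, k)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            w, cost = cand, cand_cost
            if cost < best_cost:
                best_w, best_cost = list(w), cost
        t *= cooling
    return best_w, best_cost
```

Calling `anneal_weights(X, y, k)` on a list of numeric feature tuples `X` and labels `y` returns the best weights found and their misclassification count. Because the search starts from unit weights and tracks the best state seen, the returned count is never worse than that of unweighted k-means under the same seeding.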
Pages: 219-226
Page count: 8