Generalized k-means algorithm on nominal dataset

Citations: 0
Authors
Al-Harbi, S. H. [1]
Al-Shahri, A. M. [1]
Affiliation
[1] Ctr Informat Technol, Riyadh, Saudi Arabia
Source
DATA MINING IX: DATA MINING, PROTECTION, DETECTION AND OTHER SECURITY TECHNOLOGIES | 2008 / Vol. 40
Keywords
clustering; data mining; Mahalanobis metric; D-CV metric; Hamming metric; k-means
DOI
10.2495/DATA080051
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering has typically been a problem related to continuous fields. However, in data mining, often the data values are nominal and cannot be assigned meaningful continuous substitutes. The largest advantage of the k-means algorithm in data mining applications is its efficiency in clustering large data sets. The k-means algorithm usually uses the simple Euclidean metric which is only suitable for hyperspherical clusters, and its use is limited to numeric data. This paper extends our work on the D-CV metric which was introduced to deal with nominal data, and then demonstrates how the popular k-means clustering algorithm can be profitably modified to deal with the D-CV metric. Having adapted the k-means algorithm, the D-CV metric will be implemented and the results examined. With this development.
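As a rough illustration of the idea in the abstract, the sketch below runs a k-means-style loop on nominal records. It is not the authors' method: this record does not define the D-CV metric, so the sketch assumes the Hamming metric (listed in the keywords) as the distance and a per-attribute mode as the cluster center. The function names `hamming`, `column_modes`, and `nominal_kmeans`, and the toy records, are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count the attribute positions where two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value per attribute; stands in for the mean on nominal data."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def nominal_kmeans(records, centers, max_iter=20):
    """Assign each record to its nearest center (Hamming), then update centers by mode.

    Assumption: this is a stand-in for the paper's D-CV-based variant,
    whose metric is not given in this record.
    """
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center under the Hamming metric.
        labels = [
            min(range(len(centers)), key=lambda j: hamming(r, centers[j]))
            for r in records
        ]
        # Update step: recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_centers.append(column_modes(members) if members else old)
        if new_centers == centers:  # converged: centers stopped changing
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, centers = nominal_kmeans(records, centers=[records[0], records[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Only the distance function and the center update differ from the numeric algorithm, so a different nominal metric could in principle be swapped in for `hamming`, provided a matching center update is defined for it.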
Pages: 43-51
Page count: 9