Generalized k-means algorithm on nominal dataset

Cited by: 0

Authors:

Al-Harbi, S. H. [1]

Al-Shahri, A. M. [1]

Affiliations:

[1] Ctr Informat Technol, Riyadh, Saudi Arabia

Source:

DATA MINING IX: DATA MINING, PROTECTION, DETECTION AND OTHER SECURITY TECHNOLOGIES | 2008 / Vol. 40

Keywords:

clustering; data mining; Mahalanobis metric; D-CV metric; Hamming metric; k-means

DOI:

10.2495/DATA080051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering has typically been a problem related to continuous fields. However, in data mining, often the data values are nominal and cannot be assigned meaningful continuous substitutes. The largest advantage of the k-means algorithm in data mining applications is its efficiency in clustering large data sets. The k-means algorithm usually uses the simple Euclidean metric which is only suitable for hyperspherical clusters, and its use is limited to numeric data. This paper extends our work on the D-CV metric which was introduced to deal with nominal data, and then demonstrates how the popular k-means clustering algorithm can be profitably modified to deal with the D-CV metric. Having adapted the k-means algorithm, the D-CV metric will be implemented and the results examined. With this development.
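The general idea of running a k-means-style loop on nominal records can be illustrated with a minimal sketch. This record does not define the D-CV metric, so the sketch substitutes the plain Hamming metric (listed in the keywords) and uses per-attribute modes in place of means; it is not the authors' method. All function names, the sample records, and the starting centers are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value per attribute: a nominal stand-in for the mean."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def cluster_nominal(records, centers, max_iter=100):
    """k-means-style loop: assign to the nearest center, then update centers.

    Assumption: Hamming distance replaces the D-CV metric, which is not
    specified in this record.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the Hamming metric.
        new_labels = [
            min(range(len(centers)), key=lambda j: hamming(r, centers[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: each center becomes the per-attribute mode of its members.
        for j in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = column_modes(members)
    return labels, centers


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "small", "square"),
]
labels, centers = cluster_nominal(records, centers=[records[0], records[3]])
print(labels)
print(centers)
```

Passing the starting centers explicitly keeps the sketch deterministic; in practice they would usually be drawn at random from the data, and a different metric could be swapped in by replacing `hamming`.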

Pages: 43-51

Page count: 9