A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set

Cited by: 79

Authors:

Ahmad, Amir [1]

Dey, Lipika

Affiliations:

[1] Solid State Phys Lab, MEMS Grp, Delhi 54, India

[2] Indian Inst Technol, Dept Math, Delhi 16, India

Source:

PATTERN RECOGNITION LETTERS | 2007, Vol. 28, No. 1

Keywords:

categorical data; similarity; unsupervised learning; co-occurrences

DOI:

10.1016/j.patrec.2006.06.006

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Computing the similarity between categorical data objects in unsupervised learning is an important data mining problem. We propose a method to compute the distance between two values of the same attribute for unsupervised learning. The approach is based on the observation that the similarity of two attribute values depends on their relationship with the other attributes. The computational cost of the method is linear in the number of data objects in the data set. To assess the effectiveness of the proposed distance measure, we use it with the K-mode clustering algorithm to cluster various categorical data sets. A significant improvement in clustering accuracy is observed compared with the results obtained using the traditional K-mode clustering algorithm. (c) 2006 Elsevier B.V. All rights reserved.
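The abstract does not state the distance formula, so the following is only a minimal sketch of one co-occurrence-based distance that fits the description. It assumes that the distance between values x and y of an attribute, measured against one other attribute, is the sum over that attribute's values u of max(P(u|x), P(u|y)) minus 1, and that the result is averaged over all other attributes. The function name `cooccurrence_distance`, the tuple-of-rows data layout, and the toy data are hypothetical.

```python
from collections import Counter


def cooccurrence_distance(rows, attr, x, y):
    """Distance between values x and y of column `attr` (illustrative sketch).

    Assumed form: for each other column j, compute
    sum_u max(P(u | x), P(u | y)) - 1, then average over j.
    """
    others = [j for j in range(len(rows[0])) if j != attr]
    if not others:
        raise ValueError("need at least one other attribute")

    total = 0.0
    for j in others:
        # One pass over the rows: count values of column j that co-occur
        # with x and with y in column `attr`.
        counts = {x: Counter(), y: Counter()}
        for row in rows:
            if row[attr] in counts:
                counts[row[attr]][row[j]] += 1

        nx = sum(counts[x].values())
        ny = sum(counts[y].values())
        if nx == 0 or ny == 0:
            raise ValueError("both values must occur in the data")

        seen = set(counts[x]) | set(counts[y])
        total += sum(max(counts[x][u] / nx, counts[y][u] / ny) for u in seen) - 1.0

    return total / len(others)


# Hypothetical toy data: (colour, shape)
rows = [
    ("red", "round"),
    ("red", "round"),
    ("green", "round"),
    ("blue", "square"),
    ("blue", "square"),
]
d_red_green = cooccurrence_distance(rows, 0, "red", "green")
d_red_blue = cooccurrence_distance(rows, 0, "red", "blue")
```

In this toy data, "red" and "green" always co-occur with the same shape, so their distance is 0, while "red" and "blue" never share a shape, so their distance is 1. Each pair of values needs one pass over the rows per other attribute, which is in line with the linear cost in the number of data objects claimed in the abstract.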

Pages: 110-118

Page count: 9
