A data labeling method for clustering categorical data

Cited by: 14

Authors:

Cao, Fuyuan

Liang, Jiye [1]

Affiliation:

[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2011 / Vol. 38 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Data labeling; Categorical data; Rough membership function; Similarity measure; K-MEANS ALGORITHM;

DOI:

10.1016/j.eswa.2010.08.026

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As data volumes grow rapidly, clustering a very large data set is inevitably time-consuming. To improve clustering efficiency, sampling is often used to reduce the size of the data set. However, once sampling is applied, allocating the unlabeled objects to the proper clusters becomes a difficult problem. In this paper, a novel similarity measure for clustering categorical data is proposed, based on the frequency of attribute values in a given cluster and the distribution of attribute values across different clusters; it is used to allocate each unlabeled object to the most appropriate cluster. Furthermore, a labeling algorithm for categorical data is presented and its time complexity is analyzed. Experiments on real-world data sets demonstrate the effectiveness of the proposed algorithm. (C) 2010 Elsevier Ltd. All rights reserved.
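
The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the similarity measure defined in the paper, whose exact formula (built on the rough membership function) is not given in this record. As a stand-in, the score below multiplies two assumed quantities for each attribute value: its relative frequency inside a cluster and the share of its occurrences that fall in that cluster. The names `build_profiles`, `similarity`, and `label` are hypothetical.

```python
from collections import Counter

def build_profiles(clusters):
    """Summarize each cluster as (size, per-attribute value counts)."""
    profiles = []
    for objects in clusters:
        n_attrs = len(objects[0])
        counts = [Counter(obj[j] for obj in objects) for j in range(n_attrs)]
        profiles.append((len(objects), counts))
    return profiles

def similarity(obj, profile, all_profiles):
    """Assumed stand-in score, NOT the paper's measure.

    For each attribute value of obj, combine
      freq  = occurrences in this cluster / cluster size
      share = occurrences in this cluster / occurrences in all clusters
    and sum freq * share over the attributes.
    """
    size, counts = profile
    score = 0.0
    for j, value in enumerate(obj):
        in_cluster = counts[j][value]
        total = sum(c[j][value] for _, c in all_profiles)
        if total == 0:
            continue  # value never seen in the sample: contributes nothing
        score += (in_cluster / size) * (in_cluster / total)
    return score

def label(obj, profiles):
    """Return the index of the cluster with the highest score (first on ties)."""
    scores = [similarity(obj, p, profiles) for p in profiles]
    return max(range(len(profiles)), key=scores.__getitem__)

# Toy sample that has already been clustered (made-up data).
clusters = [
    [("red", "small"), ("red", "medium")],
    [("blue", "large"), ("green", "large")],
]
profiles = build_profiles(clusters)
print(label(("red", "small"), profiles))
print(label(("blue", "large"), profiles))
```

With cluster profiles computed once from the sample, each unlabeled object is scored against every cluster in time proportional to the number of clusters times the number of attributes, which is the kind of cost a sampling-then-labeling scheme relies on.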

Pages: 2381 - 2385

Number of pages: 5

50 items in total

[31] The performance of objective functions for clustering categorical data
Xiang, Zhengrong
Islam, Md Zahidul
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8863 : 16 - 28
[32] Generalized Similarity Measure for Categorical Data Clustering
Sharma, Shruti
Singh, Manoj
2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 765 - 769
[33] EnsCat: clustering of categorical data via ensembling
Clarke, Bertrand S.
Amiri, Saeid
Clarke, Jennifer L.
BMC BIOINFORMATICS, 2016, 17
[34] A hierarchical clustering algorithm for categorical sequence data
Oh, SJ
Kim, JY
INFORMATION PROCESSING LETTERS, 2004, 91 (03) : 135 - 140
[35] Squeezer: An efficient algorithm for clustering categorical data
He, ZY
Xu, XF
Deng, SC
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17 (05) : 611 - 624
[36] On clustering massive text and categorical data streams
Aggarwal, Charu C.
Yu, Philip S.
KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 24 (02) : 171 - 196
[37] DHCC: Divisive hierarchical clustering of categorical data
Xiong, Tengke
Wang, Shengrui
Mayers, Andre
Monga, Ernest
DATA MINING AND KNOWLEDGE DISCOVERY, 2012, 24 (01) : 103 - 135
[38] DHCC: Divisive hierarchical clustering of categorical data
Tengke Xiong
Shengrui Wang
André Mayers
Ernest Monga
Data Mining and Knowledge Discovery, 2012, 24 : 103 - 135
[39] On clustering massive text and categorical data streams
Charu C. Aggarwal
Philip S. Yu
Knowledge and Information Systems, 2010, 24 : 171 - 196
[40] Parallel Hierarchical Subspace Clustering of Categorical Data
Pang, Ning
Zhang, Jifu
Zhang, Chaowei
Qin, Xiao
IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (04) : 542 - 555