A data labeling method for clustering categorical data

Cited by: 14
Authors
Cao, Fuyuan
Liang, Jiye [1]
Affiliation
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data labeling; Categorical data; Rough membership function; Similarity measure; K-means algorithm
DOI
10.1016/j.eswa.2010.08.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As data sets grow at a rapid pace, clustering a very large data set inevitably becomes time-consuming. To improve clustering efficiency, sampling is commonly used to reduce the size of the data set. Once sampling is applied, however, allocating the unlabeled objects to the proper clusters becomes a difficult problem. This paper proposes a novel similarity measure for categorical data, based on the frequency of attribute values within a given cluster and the distributions of attribute values across different clusters, that allocates each unlabeled object to its most appropriate cluster. A labeling algorithm for categorical data is also presented, and its time complexity is analyzed. Experiments on real-world data sets demonstrate the effectiveness of the proposed algorithm. (C) 2010 Elsevier Ltd. All rights reserved.
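The labeling idea in the abstract can be illustrated with a minimal sketch. The paper's exact similarity formula is not given in this record, so the score below is an assumption made for illustration only: for each attribute it multiplies the within-cluster frequency of the object's value by the share of that value's occurrences that fall in the cluster, then averages over attributes. The names `build_cluster_stats`, `similarity`, and `label_object` are hypothetical.

```python
from collections import Counter

def build_cluster_stats(clusters):
    """Count attribute values per cluster.

    clusters: dict mapping a cluster label to a non-empty list of
    equal-length tuples of categorical values (the clustered sample).
    Returns a dict: label -> (cluster size, one Counter per attribute).
    """
    stats = {}
    for label, objects in clusters.items():
        n_attrs = len(objects[0])
        counters = [Counter(obj[j] for obj in objects) for j in range(n_attrs)]
        stats[label] = (len(objects), counters)
    return stats

def similarity(obj, label, stats):
    """Illustrative score (an assumption, not the paper's formula)."""
    size, counters = stats[label]
    score = 0.0
    for j, value in enumerate(obj):
        in_cluster = counters[j][value]
        if in_cluster == 0:
            continue  # value never occurs in this cluster
        # Occurrences of this value for attribute j across all clusters.
        total = sum(c[j][value] for _, c in stats.values())
        # Frequency inside the cluster times the cluster's share of the value.
        score += (in_cluster / size) * (in_cluster / total)
    return score / len(obj)

def label_object(obj, stats):
    """Assign an unlabeled object to the cluster with the highest score."""
    return max(stats, key=lambda label: similarity(obj, label, stats))

# Toy sample that has already been clustered.
sample = {
    "A": [("red", "small"), ("red", "medium")],
    "B": [("blue", "large"), ("blue", "small")],
}
stats = build_cluster_stats(sample)
first = label_object(("red", "small"), stats)
second = label_object(("blue", "large"), stats)
```

In this sketch the per-cluster counts are built once from the sample, so labeling each remaining object only needs dictionary lookups rather than a pass over the clustered data.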
Pages: 2381 - 2385
Number of pages: 5
Related papers
50 in total
  • [31] The performance of objective functions for clustering categorical data
    Xiang, Zhengrong
    Islam, Md Zahidul
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8863 : 16 - 28
  • [32] Generalized Similarity Measure for Categorical Data Clustering
    Sharma, Shruti
    Singh, Manoj
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 765 - 769
  • [33] EnsCat: clustering of categorical data via ensembling
    Clarke, Bertrand S.
    Amiri, Saeid
    Clarke, Jennifer L.
    BMC BIOINFORMATICS, 2016, 17
  • [34] A hierarchical clustering algorithm for categorical sequence data
    Oh, SJ
    Kim, JY
    INFORMATION PROCESSING LETTERS, 2004, 91 (03) : 135 - 140
  • [35] Squeezer: An efficient algorithm for clustering categorical data
    He, ZY
    Xu, XF
    Deng, SC
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17 (05) : 611 - 624
  • [36] On clustering massive text and categorical data streams
    Aggarwal, Charu C.
    Yu, Philip S.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 24 (02) : 171 - 196
  • [37] DHCC: Divisive hierarchical clustering of categorical data
    Xiong, Tengke
    Wang, Shengrui
    Mayers, Andre
    Monga, Ernest
    DATA MINING AND KNOWLEDGE DISCOVERY, 2012, 24 (01) : 103 - 135
  • [40] Parallel Hierarchical Subspace Clustering of Categorical Data
    Pang, Ning
    Zhang, Jifu
    Zhang, Chaowei
    Qin, Xiao
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (04) : 542 - 555