A data labeling method for clustering categorical data

Cited by: 14
Authors
Cao, Fuyuan
Liang, Jiye [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data labeling; Categorical data; Rough membership function; Similarity measure; k-means algorithm
DOI
10.1016/j.eswa.2010.08.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As data sets grow at a rapid pace, clustering a very large data set inevitably becomes time-consuming. To improve clustering efficiency, sampling is usually used to scale down the data set. With sampling applied, however, allocating the unlabeled objects to the proper clusters is a very difficult problem. In this paper, a novel similarity measure for clustering categorical data is proposed, based on the frequency of attribute values in a given cluster and the distributions of attribute values across different clusters; it is used to allocate each unlabeled object to the appropriate cluster. Furthermore, a labeling algorithm for categorical data is presented, and its time complexity is analyzed. Experiments on real-world data sets show the effectiveness of the proposed algorithm. (C) 2010 Elsevier Ltd. All rights reserved.
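The abstract names the two ingredients of the similarity measure but does not state its formula. The Python sketch below is one way to combine them, offered only as an illustration and not as the authors' method. It assumes a clustered sample is already available and scores an unlabeled object against each cluster by multiplying, per attribute, the frequency of the object's value inside the cluster by the share of that value's occurrences that fall in the cluster. The function name `label_objects`, the product form of the score, and the toy data are all hypothetical.

```python
from collections import Counter


def label_objects(clusters, unlabeled):
    """Assign each unlabeled object to the most similar sampled cluster.

    clusters:  dict mapping cluster id -> list of tuples of categorical values
    unlabeled: list of tuples with the same number of attributes
    NOTE: the scoring rule is an assumption made for illustration only.
    """
    n_attrs = len(next(iter(clusters.values()))[0])
    # counts[c][j][v]: objects in cluster c that take value v on attribute j
    counts = {
        c: [Counter(obj[j] for obj in objs) for j in range(n_attrs)]
        for c, objs in clusters.items()
    }
    # totals[j][v]: sampled objects in any cluster that take value v on attribute j
    totals = [Counter() for _ in range(n_attrs)]
    for per_attr in counts.values():
        for j, counter in enumerate(per_attr):
            totals[j].update(counter)

    def similarity(obj, c):
        score = 0.0
        for j, v in enumerate(obj):
            in_cluster = counts[c][j][v]
            if in_cluster == 0:
                continue  # value never seen in this cluster: contributes nothing
            frequency = in_cluster / len(clusters[c])   # within-cluster frequency
            concentration = in_cluster / totals[j][v]   # share of value v found in c
            score += frequency * concentration
        return score / n_attrs

    # Ties are broken by dictionary order of the cluster ids.
    return [max(clusters, key=lambda c: similarity(obj, c)) for obj in unlabeled]


# Toy sample with two clusters over the attributes (color, size).
sample = {
    "A": [("red", "small"), ("red", "medium")],
    "B": [("blue", "large"), ("green", "large")],
}
labels = label_objects(sample, [("red", "small"), ("blue", "large")])
```

In this sketch, once the value counts are built from the sample, labeling costs time proportional to the number of unlabeled objects times the number of clusters times the number of attributes.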
Pages: 2381 - 2385
Page count: 5
Related papers
50 in total
  • [21] Clustering categorical data in projected spaces
    Bouguessa, Mohamed
    DATA MINING AND KNOWLEDGE DISCOVERY, 2015, 29 (01) : 3 - 38
  • [22] Clustering Categorical Data Using Rough Membership Function
    Kumar, B. Suresh
    Reddy, H. Venkateswara
    Raju, T. Ankamma
    Vennam, Preethi
    2014 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS, 2014, : 602 - 607
  • [23] Clustering Categorical Data via Ensembling Dissimilarity Matrices
    Amiri, Saeid
    Clarke, Bertrand S.
    Clarke, Jennifer L.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2018, 27 (01) : 195 - 208
  • [24] Fuzzy rough clustering for categorical data
    Xu, Shuliang
    Liu, Shenglan
    Zhou, Jian
    Feng, Lin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3213 - 3223
  • [25] Mining categorical sequences from data using a hybrid clustering method
    De Angelis, Luca
    Dias, Jose G.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2014, 234 (03) : 720 - 730
  • [26] Incremental Clustering for Categorical Data Using Clustering Ensemble
    Li Taoying
    Chen Yan
    Qu Lili
    Mu Xiangwei
    PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, : 2519 - 2524
  • [27] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [28] An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 785 - 795
  • [29] An entropy-based subspace clustering algorithm for categorical data
    Carbonera, Joel Luis
    Abel, Mara
    2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 272 - 277
  • [30] Kernel Subspace Clustering Algorithm for Categorical Data
    Xu K.-P.
    Chen L.-F.
    Sun H.-J.
    Wang B.-Z.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (11) : 3492 - 3505