A data labeling method for clustering categorical data

Cited by: 14
Authors
Cao, Fuyuan
Liang, Jiye [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data labeling; Categorical data; Rough membership function; Similarity measure; K-means algorithm
DOI
10.1016/j.eswa.2010.08.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As data volumes grow at a rapid pace, clustering a very large data set is inevitably time-consuming. To improve clustering efficiency, sampling is usually used to reduce the size of the data set. With sampling, however, allocating the unlabeled objects to the proper clusters becomes a difficult problem. In this paper, a novel similarity measure for categorical data is proposed, based on the frequency of attribute values within a given cluster and the distribution of attribute values across clusters; it is used to allocate each unlabeled object to the most appropriate cluster. A labeling algorithm for categorical data is also presented, and its time complexity is analyzed. Experiments on real-world data sets demonstrate the effectiveness of the proposed algorithm. (C) 2010 Elsevier Ltd. All rights reserved.
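The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's similarity measure, whose exact formula is not given in this record. As an assumption, the score for each attribute is the product of two terms suggested by the abstract: the relative frequency of the object's value within the cluster, and the share of that value's occurrences (across all clusters) that fall in the cluster. The function name `label_objects` and the input layout are hypothetical.

```python
from collections import Counter


def label_objects(clusters, unlabeled):
    """Assign each unlabeled categorical object to its most similar cluster.

    clusters:  dict mapping a cluster label to a list of equal-length tuples
               (the objects clustered from the sample).
    unlabeled: list of tuples with the same attributes.
    NOTE: the similarity below is an illustrative assumption, not the
    measure defined in the paper.
    """
    labels = list(clusters)
    n_attrs = len(next(iter(clusters.values()))[0])

    # freq[c][j][v]: how often value v occurs on attribute j inside cluster c.
    freq = {
        c: [Counter(obj[j] for obj in objs) for j in range(n_attrs)]
        for c, objs in clusters.items()
    }
    # total[j][v]: how often value v occurs on attribute j over all clusters.
    total = [
        sum((freq[c][j] for c in labels), Counter()) for j in range(n_attrs)
    ]

    def similarity(obj, c):
        size = len(clusters[c])
        score = 0.0
        for j, value in enumerate(obj):
            in_cluster = freq[c][j][value]
            overall = total[j][value]
            if overall == 0:
                continue  # value never seen in the sample: contributes nothing
            # within-cluster frequency times the cluster's share of this value
            score += (in_cluster / size) * (in_cluster / overall)
        return score / n_attrs

    # Ties go to the first label in dict order.
    return [max(labels, key=lambda c: similarity(obj, c)) for obj in unlabeled]


sample_clusters = {
    "A": [("red", "small"), ("red", "medium")],
    "B": [("blue", "large"), ("blue", "small")],
}
print(label_objects(sample_clusters, [("red", "large"), ("blue", "small")]))
```

Building the frequency tables takes one pass over the clustered sample. Each unlabeled object is then scored against every cluster in time proportional to the number of attributes, so labeling is linear in the number of unlabeled objects.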
Pages: 2381-2385
Page count: 5