An Incremental Clustering with Attribute Unbalance Considered for Categorical Data

Cited by: 0
Authors
Chen, Jize [2]
Yang, Zhimin [1]
Yin, Jian [2]
Yang, Xiaobo [1]
Huang, Li [1]
Affiliations
[1] Guangzhou Univ Chinese Med, Affiliated Hosp 2, Guangzhou 510120, Peoples R China
[2] Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China
Source
COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS | 2009 / Vol. 51
Funding
National Natural Science Foundation of China
Keywords
incremental clustering; attribute unbalance; categorical data
DOI
10.1007/978-3-642-04962-0_50
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering analysis is an important technique used in many fields, but traditional clustering algorithms generally deal with numeric data. Clustering categorical data has nevertheless long attracted researchers' attention because such data are prevalent in real life. This paper analyses the limitations of previously proposed categorical clustering algorithms. Based on two observations, a new similarity measure is proposed for categorical data that takes the unbalance of attributes into account. As data sets grow larger and more dynamic, incrementality becomes an important quality of a good clustering algorithm. The clustering algorithm presented is incremental and has linear computational complexity. The experimental results indicate that it outperforms the other categorical clustering algorithms referred to in the paper.
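The abstract does not give the paper's similarity measure or its clustering procedure, so the following is only a generic sketch of what a one-pass incremental clustering of categorical records can look like. The per-attribute `weights` are a hypothetical stand-in for the idea that attributes are not equally important, and `threshold`, `Cluster`, and `incremental_cluster` are illustrative names, not the authors' method. Each record is compared once against the summaries of the existing clusters, so the cost grows linearly with the number of records for a bounded number of clusters.

```python
from collections import Counter


class Cluster:
    """Summary of a cluster: its size and per-attribute value counts."""

    def __init__(self, n_attrs):
        self.size = 0
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, record):
        # Update the summary in place; no earlier record is revisited.
        self.size += 1
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def similarity(self, record, weights):
        # Hypothetical measure: weighted average, over attributes, of the
        # relative frequency of the record's value inside this cluster.
        score = sum(
            w * counter[value] / self.size
            for w, counter, value in zip(weights, self.counts, record)
        )
        return score / sum(weights)


def incremental_cluster(records, weights, threshold):
    """Assign each record to its most similar cluster, or open a new one."""
    clusters = []
    labels = []
    for record in records:
        best_idx, best_sim = -1, -1.0
        for idx, cluster in enumerate(clusters):
            sim = cluster.similarity(record, weights)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # Open a new cluster when nothing is similar enough (assumed rule).
        if best_idx == -1 or best_sim < threshold:
            clusters.append(Cluster(len(record)))
            best_idx = len(clusters) - 1
        clusters[best_idx].add(record)
        labels.append(best_idx)
    return labels, clusters
```

In this sketch, giving an attribute a larger weight makes agreement on that attribute count for more, which is one simple way to model unequal attribute importance; the measure proposed in the paper may differ substantially.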
Pages: 433 / +
Page count: 2