On clustering tree structured data with categorical nature

Cited by: 13
Authors
Boutsinas, B. [1 ,2 ]
Papastergiou, T. [2 ]
Affiliations
[1] Univ Patras, Dept Business Adm, GR-26500 Rion, Greece
[2] Univ Patras, Artificial Intelligence Res Ctr, GR-26500 Rion, Greece
Keywords
clustering; (dis)similarity measures; data mining;
DOI
10.1016/j.patcog.2008.05.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering consists in partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, nowadays commercial or scientific databases usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure which is capable of dealing with tree structured categorical data. Thus, it can be used for extending the various versions of the very popular k-means clustering algorithm to deal with such data. We discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate, compared to other well-known (dis)similarity measures for categorical data. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 3613-3623
Number of pages: 11
Related papers
50 in total
  • [1] Clustering of Tree-structured Data
    Lu, Na
    Wu, Yidan
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 1210 - 1215
  • [2] Clustering Tree-Structured Data on Manifold
    Lu, Na
    Miao, Hongyu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (10) : 1956 - 1968
  • [3] Clustering categorical data streams
    He, Zengyou
    Xu, Xiaofei
    Deng, Shengchun
    Huang, Joshua Zhexue
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2011, 11 (04) : 185 - 192
  • [4] Clustering Categorical Data Based on Representatives
    Aranganayagi, S.
    Thangavel, K.
    THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, : 599 - +
  • [5] Incremental Clustering for Categorical Data Using Clustering Ensemble
    Li Taoying
    Chen Yan
    Qu Lili
    Mu Xiangwei
    PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, : 2519 - 2524
  • [6] Squeezer: An efficient algorithm for clustering categorical data
    He, ZY
    Xu, XF
    Deng, SC
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17 (05) : 611 - 624
  • [7] Squeezer: An efficient algorithm for clustering categorical data
    Zengyou He
    Xiaofei Xu
    Shengchun Deng
    Journal of Computer Science and Technology, 2002, 17 : 611 - 624
  • [8] Clustering Categorical Data: A Cluster Ensemble Approach
    何增友
    High Technology Letters, 2003, (04) : 8 - 12
  • [9] On data labeling for clustering categorical data
    Chen, Hung-Leng
    Chuang, Kun-Ta
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1458 - 1471
  • [10] Coercion: A Distributed Clustering Algorithm for Categorical Data
    Wang, Bin
    Zhou, Yang
    Hei, Xinhong
    2013 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2013, : 683 - 687