Data Labeling method based on Cluster Purity using Relative Rough Entropy for Categorical Data Clustering

Times cited: 0
Authors
Reddy, H. Venkateswara [1]
Raju, S. Viswanadha [2]
Agrawal, Pratibha [3]
Affiliations
[1] Vardhaman Coll Engn, Dept Comp Sci & Engn, Hyderabad, Andhra Pradesh, India
[2] JNTUH Coll Engn, Dept Comp Sci & Engn, Nachupally, India
[3] Univ Delhi, Dept Comp Sci & Engn, New Delhi, India
Source
2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI) | 2013
Keywords
Categorical Data; Clustering; Data Labeling; Outlier; Entropy; Rough Set; Cluster Purity
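DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract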
Clustering is an important technique in data mining, but clustering a large data set is difficult and time-consuming. An approach called data labeling has been suggested for clustering large databases, in which sampling is used to improve clustering efficiency. A sample is selected at random for initial clustering, and each data point that was not sampled, and is therefore unclustered, is then either given a cluster label or marked as an outlier by a data labeling technique. Data labeling is easy in the numerical domain because it can be based on the distance between a cluster and an unlabeled data point. In the categorical domain, however, distance is not well defined either between data points or between a data point and a cluster, so data labeling is difficult for categorical data. In this paper, we propose a data labeling method that uses Relative Rough Entropy for clustering categorical data. The concept of entropy, introduced by Shannon in the context of information theory, is a powerful mechanism for measuring uncertainty in information. In this method, data labeling is performed by integrating entropy with rough sets, and cluster purity is used for outlier detection. The experimental results show that the efficiency and clustering quality of this algorithm are better than those of previous algorithms.
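The labeling workflow summarized in the abstract (cluster a sample, then label or reject each remaining point) can be illustrated with a minimal Python sketch. This is not the paper's Relative Rough Entropy measure or its cluster purity criterion, neither of which is defined in this record. The `resemblance` score (mean within-cluster frequency of a point's attribute values), the `threshold` parameter, and all function names are hypothetical stand-ins chosen for illustration. `shannon_entropy` shows only the standard Shannon measure of uncertainty for one categorical attribute.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy, in bits, of a non-empty list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def resemblance(point, cluster):
    """Hypothetical stand-in score: mean relative frequency, within the
    cluster, of each attribute value carried by the point (0.0 to 1.0)."""
    score = 0.0
    for j, value in enumerate(point):
        column = [row[j] for row in cluster]
        score += column.count(value) / len(column)
    return score / len(point)


def label_point(point, clusters, threshold=0.5):
    """Assign the point to the best-matching cluster, or mark it as an
    outlier when even the best score falls below the assumed threshold."""
    best_label, best_score = None, -1.0
    for label, members in clusters.items():
        score = resemblance(point, members)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else "outlier"


# Clusters obtained from the sampled data (toy values, not from the paper).
clusters = {
    "c1": [("a", "x"), ("a", "y"), ("a", "x")],
    "c2": [("b", "z"), ("b", "z")],
}

# Label three points that were not part of the sample.
labels = [label_point(p, clusters) for p in [("a", "x"), ("b", "z"), ("q", "w")]]
```

In this toy run the first two points share attribute values with one cluster and are assigned to it, while the third shares none and is marked as an outlier. A measure such as the one the paper proposes would take the place of `resemblance`.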
Pages: 500-506
Page count: 7