A new approach to generate diversified clusters for small data sets

Cited by: 2
Authors
Peng, Chun-Cheng [1]
Tsai, Cheng-Jung [2]
Chang, Ting-Yi [3]
Yeh, Jen-Yuan [4]
Hua, Po-Wei [2]
Affiliations
[1] Chaoyang Univ Technol, Dept Informat & Commun Engn, Taichung, Taiwan
[2] Natl Changhua Univ Educ, Grad Inst Stat & Informat Sci, Changhua, Taiwan
[3] Natl Changhua Univ Educ, Dept Ind Educ & Technol, Changhua, Taiwan
[4] Natl Museum Nat Sci, Dept Operat Visitor Serv Collect & Informat Manag, Taichung, Taiwan
Keywords
Data mining; Clustering; Homogeneous; Heterogeneous; Diversified clusters; BIG DATA; SYSTEM;
DOI
10.1016/j.asoc.2020.106564
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a common data mining technique whose main principle is that the samples within a cluster are similar to one another and dissimilar to those in other clusters. In other words, samples in the same cluster possess high homogeneity, while different clusters possess high heterogeneity. However, a user may require diversified clustering instead. In contrast to traditional clustering methods, the aim of diversified clustering is to make samples of the same cluster possess high heterogeneity and different clusters possess high homogeneity. Diversified clustering can be applied in practice to aspects of daily life such as normal class grouping, student grouping in learning, cluster sampling, balanced diets and assignment of jobs. Nevertheless, our survey of related papers in the field of data mining found that no research on diversified clustering has been proposed. In this paper, we formally define the problem of diversified clustering and propose a new method to solve it. Experimental results showed that our method can generate good diversified clusters. However, our method is currently appropriate only for small data sets, since its execution time increases quickly as the number of diversified clusters increases. We also hope this paper will garner interest in further research on effective methods to generate diversified clusters for use in data mining. (C) 2020 Elsevier B.V. All rights reserved.
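To make the contrast in the abstract concrete, the following is a minimal Python sketch of what "high heterogeneity within a cluster" could mean for numeric samples. It is not the method proposed in the paper, which this record does not describe. The function names `within_group_diversity` and `greedy_diversified_groups`, the use of Euclidean distance, and the greedy balanced assignment are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def within_group_diversity(groups):
    """Mean pairwise Euclidean distance between samples that share a group.

    Hypothetical score: higher values mean more heterogeneous groups.
    """
    pair_distances = [
        dist(a, b)
        for group in groups
        for a, b in combinations(group, 2)
    ]
    if not pair_distances:
        return 0.0
    return sum(pair_distances) / len(pair_distances)


def mean_distance_to_group(sample, group):
    """Average distance from a sample to a group's members (inf if empty)."""
    if not group:
        return float("inf")
    return sum(dist(sample, member) for member in group) / len(group)


def greedy_diversified_groups(samples, k):
    """Illustrative heuristic, not the paper's algorithm.

    Each sample goes to one of the currently smallest groups (keeping
    sizes balanced), choosing the one whose members are farthest from it.
    """
    groups = [[] for _ in range(k)]
    for sample in samples:
        smallest = min(len(group) for group in groups)
        candidates = [group for group in groups if len(group) == smallest]
        best = max(candidates, key=lambda g: mean_distance_to_group(sample, g))
        best.append(sample)
    return groups


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (10, 10), (10, 11)]
    diversified = greedy_diversified_groups(samples, k=2)
    traditional = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
    print("diversified:", diversified, within_group_diversity(diversified))
    print("traditional:", traditional, within_group_diversity(traditional))
```

On this toy input, a traditional clustering would keep the two nearby pairs together, whereas the sketch mixes one sample from each pair into every group, so the within-group score is higher. A greedy pass like this gives no optimality guarantee; it only illustrates the objective.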
Pages: 11