Semisupervised clustering algorithm combining SUBCLU and constrained clustering for detecting groups in high dimensional datasets

Cited by: 0

Authors:

Alexander Calvo-Valverde, Luis [1,2]

Vallejos-Pena, Alonso [1]

Affiliations:

[1] Inst Tecnol Costa Rica, San Carlos, Costa Rica

[2] Programa Multidisciplinar & Sci, San Carlos, Costa Rica

Source:

TECNOLOGIA EN MARCHA | 2018 / Vol. 31 / No. 03

Keywords:

Data mining; subspaces; SUBCLU; clustering; clustering by constraint

DOI:

10.18845/tm.v31i3.3904

Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

High-dimensional data poses a challenge to traditional clustering algorithms: similarity measures lose their meaning in such spaces, which degrades the quality of the resulting groups. Subspace clustering algorithms have been proposed as an alternative, aiming to find all groups in all subspaces of the dataset [1]. Because groups are detected in lower-dimensional spaces, each group may belong to a different subspace of the original dataset [2]. As a consequence, attributes that the user considers of interest may be excluded from some or all groups, which reduces the value of the result for data analysts. This project proposes a new algorithm that combines SUBCLU [3] with constraint-based clustering [4]. It lets users mark variables as attributes of interest based on prior domain knowledge, directing group detection toward subspaces that include those attributes and thereby producing more meaningful groups.
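The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simplified DBSCAN-style density clustering and relies on the monotonicity property used by SUBCLU (a subspace without density-based clusters cannot have a superspace with clusters). The search starts from the user's attributes of interest and grows subspaces one attribute at a time. The names `dbscan`, `constrained_subspace_search`, `eps`, `min_pts`, and `interest` are hypothetical and chosen for illustration.

```python
from math import dist


def dbscan(points, dims, eps, min_pts):
    """Simplified DBSCAN on the projection of `points` onto `dims`.

    Returns a list of clusters, each a set of point indices.
    Noise points are left out of every cluster.
    """
    proj = [tuple(p[d] for d in dims) for p in points]
    n = len(proj)
    # A point counts as its own neighbor (distance 0).
    neighbors = [[j for j in range(n) if dist(proj[i], proj[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    label = [None] * n
    clusters = []
    for i in range(n):
        if not core[i] or label[i] is not None:
            continue
        cid = len(clusters)
        members = set()
        stack = [i]
        while stack:
            q = stack.pop()
            if label[q] is not None:
                continue
            label[q] = cid
            members.add(q)
            if core[q]:  # only core points expand the cluster
                stack.extend(j for j in neighbors[q] if label[j] is None)
        clusters.append(members)
    return clusters


def constrained_subspace_search(points, eps, min_pts, interest):
    """Find clusters only in subspaces that contain all `interest` attributes.

    Assumption for this sketch: the search starts at the subspace formed
    by the attributes of interest and adds one attribute per level,
    pruning any subspace in which no cluster is found.
    """
    if not interest:
        raise ValueError("at least one attribute of interest is required")
    n_dims = len(points[0])
    start = tuple(sorted(interest))
    results = {}
    frontier, seen = [start], {start}
    while frontier:
        next_frontier = []
        for dims in frontier:
            clusters = dbscan(points, dims, eps, min_pts)
            if not clusters:
                continue  # prune: no superspace can contain a cluster
            results[dims] = clusters
            for d in range(n_dims):
                if d not in dims:
                    cand = tuple(sorted(dims + (d,)))
                    if cand not in seen:
                        seen.add(cand)
                        next_frontier.append(cand)
        frontier = next_frontier
    return results


# Toy data: two tight groups in attributes 0 and 1; attribute 2 is spread out.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 5.0), (0.0, 0.1, 10.0), (0.1, 0.1, 15.0),
    (5.0, 5.0, 2.5), (5.1, 5.0, 7.5), (5.0, 5.1, 12.5), (5.1, 5.1, 17.5),
]
results = constrained_subspace_search(points, eps=0.5, min_pts=3, interest={0})
for dims in sorted(results):
    print(dims, results[dims])
```

On this toy data, every reported subspace contains attribute 0, and subspaces that include the spread-out attribute 2 are pruned because no dense region survives there. The quadratic neighbor search keeps the sketch short; it is not meant for large datasets.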

Citation

Pages: 74-85

Page count: 12

References (15 in total):

[1] Chen LF, 2008, 2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4, P225, DOI 10.1109/ICCT.2008.4716209
[2] DeSarbo WS, Carroll JD, Clark LA, Green PE. Synthesized clustering: a method for amalgamating alternative clustering bases with differential weighting of variables [J]. PSYCHOMETRIKA, 1984, 49 (01): 57-78
[3] Guanhua Chen, 2009, Proceedings of the 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009), P490, DOI 10.1109/FSKD.2009.463
[4] Huang JZX, Ng MK, Rong HQ, Li ZC. Automated variable weighting in k-means type clustering [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (05): 657-668
[5] Kailing K, 2003, LECT NOTES ARTIF INT, V2838, P241
[6] Kriegel HP, Kroeger P, Zimek A. Clustering High-Dimensional Data: A Survey on Subspace Clustering, Pattern-Based Clustering, and Correlation Clustering [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)
[7] Kroger P., 2004, P 4 SIAM INT C DAT M, DOI 10.1137/1.9781611972740.23
[8] Muller E, 2009, PROC VLDB ENDOW, V2
[9] Parsons L., 2004, ACM SIGKDD EXPLORATI, V6, P90, DOI 10.1145/1007730.1007731
[10] Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1987, 20: 53-65
