TW-k-Means: Automated Two-Level Variable Weighting Clustering Algorithm for Multiview Data

Cited by: 147

Authors:

Chen, Xiaojun [1,2]

Xu, Xiaofei [3]

Huang, Joshua Zhexue [2,4]

Ye, Yunming [1]

Affiliations:

[1] Harbin Inst Technol, Shenzhen Grad Sch, C202,HIT Campus Xili Univ Town, Shenzhen 518055, Peoples R China

[2] Shenzhen Univ, Coll Comp Sci & Software, Shenzhen 518060, Peoples R China

[3] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China

[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen Key Lab High Performance Data Min, Shenzhen 518055, Peoples R China

Source:

IEEE Transactions on Knowledge and Data Engineering | 2013, Vol. 25, No. 4

Keywords:

Data mining; clustering; multiview learning; k-means; variable weighting; SELECTION; OBJECTS;

DOI:

10.1109/TKDE.2011.262

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes TW-k-means, an automated two-level variable weighting clustering algorithm for multiview data, which can simultaneously compute weights for views and individual variables. In this algorithm, a view weight is assigned to each view to identify the compactness of the view and a variable weight is also assigned to each variable in the view to identify the importance of the variable. Both view weights and variable weights are used in the distance function to determine the clusters of objects. In the new algorithm, two additional steps are added to the iterative k-means clustering process to automatically compute the view weights and the variable weights. We used two real-life data sets to investigate the properties of two types of weights in TW-k-means and investigated the difference between the weights of TW-k-means and the weights of the individual variable weighting method. The experiments have revealed the convergence property of the view weights in TW-k-means. We compared TW-k-means with five clustering algorithms on three real-life data sets and the results have shown that the TW-k-means algorithm significantly outperformed the other five clustering algorithms in four evaluation indices.
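The iterative process described in the abstract (the usual k-means assignment and center updates, plus two extra steps that recompute variable weights and view weights) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the update rules, so the softmax-style weight updates below are an assumption, and the names `tw_kmeans_sketch`, `softmax_neg`, `lam`, and `eta` are hypothetical.

```python
import math
import random


def weighted_sq_dist(x, z, col_weight):
    """Squared distance with one combined weight per variable."""
    return sum(cw * (a - b) ** 2 for cw, a, b in zip(col_weight, x, z))


def softmax_neg(costs, temperature):
    """Normalize exp(-cost / temperature) to sum to 1 (shifted for stability)."""
    lo = min(costs)
    exps = [math.exp(-(c - lo) / temperature) for c in costs]
    total = sum(exps)
    return [e / total for e in exps]


def tw_kmeans_sketch(X, views, k, lam=1.0, eta=1.0, n_iter=20, seed=0):
    """X: list of rows; views: list of column-index lists that partition the columns.

    lam and eta are assumed smoothing parameters for the view-weight and
    variable-weight updates (hypothetical names, not taken from the abstract).
    """
    rng = random.Random(seed)
    n_vars = len(X[0])
    view_of = [0] * n_vars
    for t, cols in enumerate(views):
        for j in cols:
            view_of[j] = t
    w = [1.0 / len(views)] * len(views)  # view weights, sum to 1
    v = [1.0 / len(views[view_of[j]]) for j in range(n_vars)]  # sum to 1 per view
    centers = [list(x) for x in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Both weight levels enter the distance as the product w[view] * v[variable].
        cw = [w[view_of[j]] * v[j] for j in range(n_vars)]
        # Standard k-means step: assign each object to its nearest center.
        labels = [min(range(k), key=lambda l: weighted_sq_dist(x, centers[l], cw))
                  for x in X]
        # Standard k-means step: recompute each non-empty cluster's center.
        for l in range(k):
            members = [x for x, lab in zip(X, labels) if lab == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]
        # Within-cluster dispersion of every variable.
        disp = [sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(X, labels))
                for j in range(n_vars)]
        # Additional step (assumed form): variable weights, normalized inside each view.
        for t, cols in enumerate(views):
            new_v = softmax_neg([w[t] * disp[j] for j in cols], eta)
            for j, val in zip(cols, new_v):
                v[j] = val
        # Additional step (assumed form): view weights from weighted view dispersion.
        w = softmax_neg([sum(v[j] * disp[j] for j in cols) for cols in views], lam)
    return labels, centers, w, v
```

Under these assumptions, a view whose variables are tightly grouped within clusters receives a larger view weight, which matches the abstract's description of view weights reflecting compactness. For example, `tw_kmeans_sketch(X, views=[[0, 1], [2, 3]], k=2)` treats columns 0 and 1 as one view and columns 2 and 3 as another.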


Pages: 932-944

Page count: 13
