A hierarchical co-clustering algorithm for high-order heterogeneous data

Cited by: 0
Authors
Yang, Xinxin [1]
Huang, Shaobin [1]
Affiliation
[1] College of Computer Science and Technology, Harbin Engineering University, Harbin
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2015, Vol. 52, No. 1
Keywords
Co-clustering; Hierarchical clustering; High-order heterogeneous data; Measure of association; Multiple feature space
DOI
10.7544/issn1000-1239.2015.20130493
Abstract
High-order heterogeneous data, in which objects are described by multiple feature types drawn from heterogeneous domains, are becoming increasingly common in real-world applications. Because high-order co-clustering algorithms can fuse information from multiple feature spaces to improve clustering quality, they have recently become an active research topic. Most existing high-order co-clustering algorithms, however, are non-hierarchical, even though high-order heterogeneous data often contain hidden hierarchical cluster structures. To mine these hidden patterns more effectively, we develop a high-order hierarchical co-clustering algorithm (HHCC). HHCC measures the association between objects and features with Goodman-Kruskal τ, an index of association between categorical variables. Strongly associated objects are partitioned into the same object cluster and, simultaneously, strongly associated features are partitioned into the same feature cluster. At every level of the hierarchy, HHCC uses Goodman-Kruskal τ to quantify the quality of both the object clustering and the feature clustering. By optimizing Goodman-Kruskal τ with a local search approach, HHCC determines the number of clusters automatically and obtains the clustering result at each level. The algorithm adopts a top-down strategy and finally forms a tree-like cluster structure. Experimental results show that HHCC outperforms four classical homogeneous hierarchical clustering algorithms and five existing high-order co-clustering algorithms. © 2015 Science Press. All rights reserved.
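The abstract relies on Goodman-Kruskal τ both to measure object-feature association and to score the co-clustering at each level. The sketch below shows how such a score can be computed. It assumes the block-sum contingency table commonly used in τ-based co-clustering, where data values are summed over each (object cluster, feature cluster) pair. The abstract does not say exactly how HHCC builds this table, how it combines τ across multiple feature spaces, or how its local search moves objects and features, so none of that is reproduced here. The names `goodman_kruskal_tau`, `cocluster_table`, and `transpose` are hypothetical helpers, and the toy matrix is illustrative only.

```python
from typing import List, Sequence

Table = List[List[float]]


def goodman_kruskal_tau(table: Sequence[Sequence[float]]) -> float:
    """Goodman-Kruskal tau of the column variable given the row variable.

    tau = (sum_ij p_ij^2 / p_i. - sum_j p_.j^2) / (1 - sum_j p_.j^2),
    i.e. the proportional reduction in the error of predicting the column
    category once the row category is known.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        return 0.0
    p = [[v / n for v in row] for row in table]
    row_marg = [sum(row) for row in p]
    col_marg = [sum(col) for col in zip(*p)]
    col_sq = sum(q * q for q in col_marg)
    base_error = 1.0 - col_sq  # prediction error without knowing the row category
    if base_error == 0.0:
        return 0.0  # only one column category carries mass: nothing to predict
    cond = sum(
        pij * pij / pi
        for row, pi in zip(p, row_marg)
        if pi > 0
        for pij in row
    )
    return (cond - col_sq) / base_error


def cocluster_table(data, row_labels, col_labels, k, l) -> Table:
    """Hypothetical helper: sum an object-by-feature matrix into a k x l
    table indexed by (object cluster, feature cluster)."""
    table = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            table[row_labels[i]][col_labels[j]] += value
    return table


def transpose(table: Table) -> Table:
    return [list(col) for col in zip(*table)]


# Toy object-by-feature counts with two obvious blocks (illustrative data).
data = [[3, 3, 0, 0],
        [2, 4, 0, 0],
        [0, 0, 5, 1],
        [0, 0, 2, 4]]

good = cocluster_table(data, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = cocluster_table(data, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

# tau of feature clusters given object clusters, and the reverse direction
good_tau_fr = goodman_kruskal_tau(good)
good_tau_rf = goodman_kruskal_tau(transpose(good))
bad_tau_fr = goodman_kruskal_tau(bad)
print(good_tau_fr, good_tau_rf, bad_tau_fr)
```

In this toy example, the block-aligned partition yields a diagonal table and τ = 1 in both directions. Mixing the blocks gives the table [[8, 4], [4, 8]] and drops τ to 1/9. τ is asymmetric, which is why the sketch evaluates both the table and its transpose: one direction scores how well object clusters predict feature clusters, and the other scores the reverse.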
Pages: 200-210
Number of pages: 10