Towards information-theoretic K-means clustering for image indexing

被引:30
作者
Cao, Jie [1 ]
Wu, Zhiang [1 ]
Wu, Junjie [2 ]
Liu, Wenjie [3 ]
机构
[1] Nanjing Univ Finance & Econ, Jiangsu Prov Key Lab E Business, Nanjing, Jiangsu, Peoples R China
[2] Beihang Univ, Sch Econ & Management, Dept Informat Syst, Beijing, Peoples R China
[3] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210008, Jiangsu, Peoples R China
基金
中国国家自然科学基金;
关键词
Information-theoretic clustering; K-means; KL-divergence; Variable Neighborhood Search (VNS); PATTERN DISCOVERY;
D O I
10.1016/j.sigpro.2012.07.030
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Information-theoretic K-means (Info-Kmeans) aims to cluster high-dimensional data, such as images featured by the bag-of-features (BOF) model, using K-means algorithm with KL-divergence as the distance. While research efforts along this line have shown promising results, a remaining challenge is to deal with the high sparsity of image data. Indeed, the centroids may contain many zero-value features that create a dilemma in assigning objects to centroids during the iterative process of Info-Kmeans. To meet this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering in this paper. Specifically, SAIL can avoid the zero-feature dilemma by replacing the computation of KL-divergence between instances and centroids, by the computation of centroid entropies only. To further improve the clustering quality, we also introduce the Variable Neighborhood Search (VNS) meta-heuristic and propose the V-SAIL algorithm. Experimental results on various benchmark data sets clearly demonstrate the effectiveness of SAIL and V-SAIL. In particular, they help to successfully recognize nine out of 11 landmarks from extremely high-dimensional and sparse image vectors, with the presence of severe noise. (C) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:2026 / 2037
页数:12
相关论文
共 27 条
[1]  
[Anonymous], 2006, Elements of Information Theory
[2]  
[Anonymous], P IEEE INT C COMP VI
[3]  
[Anonymous], 2007, CVPR
[4]  
[Anonymous], 2003, Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining
[5]  
[Anonymous], ACM T MULTIMEDIA COM
[6]  
Brand Louis., 2006, Advanced calculus: an introduction to classical analysis
[7]  
Dhillon I. S., 2003, Journal of Machine Learning Research, V3, P1265, DOI 10.1162/153244303322753661
[8]   Graph based k-means clustering [J].
Galluccio, Laurent ;
Michel, Olivier ;
Comon, Pierre ;
Hero, Alfred O., III .
SIGNAL PROCESSING, 2012, 92 (09) :1970-1984
[9]   NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization [J].
Guan, Naiyang ;
Tao, Dacheng ;
Luo, Zhigang ;
Yuan, Bo .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2012, 60 (06) :2882-2898
[10]   Non-negative Patch Alignment Framework [J].
Guan, Naiyang ;
Tao, Dacheng ;
Luo, Zhigang ;
Yuan, Bo .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (08) :1218-1230