Probability based document clustering and image clustering using content-based image retrieval

Cited by: 26
Authors
Karthikeyan, M. [1]
Aruna, P. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Chidambaram, Tamil Nadu, India
Keywords
Document clustering; Word frequency; Content-based image retrieval; Major colour set; Global colour signature; Distribution block signature; Hue saturation value; Region of Interest; RGB histogram-based image retrieval; CLASSIFICATION;
DOI
10.1016/j.asoc.2012.09.013
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering related or similar objects has long been regarded as potentially useful for helping users navigate an information space such as a document collection. Many clustering algorithms and techniques have been developed and implemented, but as document collections have grown, these techniques have not scaled to large collections because of their computational overhead. To address this problem, the proposed system concentrates on an interactive text clustering methodology: probability-based, topic-oriented, semi-supervised document clustering. Because web pages and other documents now contain both text and large numbers of images, the proposed system also uses content-based image retrieval (CBIR) for image clustering to complement the document clustering approach. It proposes two kinds of indexing keys, major colour sets (MCS) and distribution block signatures (DBS), to prune away images irrelevant to a given query image. Major colour sets relate to colour information, while distribution block signatures relate to spatial information. After these filters are applied successively to a large database, only a small number of high-potential candidates that are somewhat similar to the query image remain. The system then uses the quad modelling (QM) method to set the initial weights of the two-dimensional cells in the query image according to each major colour, and retrieves more similar images through a similarity association function based on those weights. System efficiency is evaluated by implementing and testing the clustering results with the DBSCAN and K-means clustering algorithms. Experiments show that the proposed document clustering algorithm performs with an average efficiency of 94.4% across various document categories. (C) 2012 Elsevier B.V. All rights reserved.
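The two-stage pruning described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions of MCS or DBS, so everything below is an assumption made for illustration: images are small grids of pre-quantized colour labels, the major colour set is taken to be the `top_k` most frequent labels, and the distribution block signature is a per-block presence flag for one colour. The function names, `top_k`, `blocks`, and `min_overlap` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def major_colour_set(image, top_k=2):
    """Assumed MCS: the set of the top_k most frequent colour labels."""
    counts = Counter(colour for row in image for colour in row)
    return {colour for colour, _ in counts.most_common(top_k)}


def distribution_block_signature(image, colour, blocks=2):
    """Assumed DBS: one 0/1 flag per block, set when `colour` occurs there."""
    height, width = len(image), len(image[0])
    signature = []
    for by in range(blocks):
        rows = image[by * height // blocks:(by + 1) * height // blocks]
        for bx in range(blocks):
            lo, hi = bx * width // blocks, (bx + 1) * width // blocks
            present = any(colour in row[lo:hi] for row in rows)
            signature.append(1 if present else 0)
    return tuple(signature)


def prune(query, database, top_k=2, blocks=2, min_overlap=1):
    """Apply the colour filter, then the spatial filter, and return survivors."""
    query_mcs = major_colour_set(query, top_k)
    survivors = []
    for name, image in database.items():
        # Filter 1 (colour): drop images sharing no major colour with the query.
        shared = query_mcs & major_colour_set(image, top_k)
        if not shared:
            continue
        # Filter 2 (spatial): keep the image if, for some shared colour,
        # at least min_overlap blocks contain that colour in both images.
        for colour in shared:
            q_sig = distribution_block_signature(query, colour, blocks)
            c_sig = distribution_block_signature(image, colour, blocks)
            if sum(a & b for a, b in zip(q_sig, c_sig)) >= min_overlap:
                survivors.append(name)
                break
    return survivors


# Toy data: colour labels stand in for quantized HSV or RGB bins.
query = [
    ["R", "R", "B", "B"],
    ["R", "R", "B", "B"],
    ["R", "R", "G", "B"],
    ["R", "R", "B", "B"],
]
database = {
    "similar": [["R", "R", "B", "B"] for _ in range(4)],
    "different_colours": [["Y"] * 4, ["Y"] * 4, ["G"] * 4, ["G"] * 4],
    "different_layout": [["B", "B", "R", "R"] for _ in range(4)],
}

print(prune(query, database))
```

In this toy setup the colour filter removes the image with no shared major colour, and the spatial filter removes the image whose colours sit in the opposite blocks. The weighting and similarity association steps that follow in the paper are not sketched here.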
Pages: 959-966
Number of pages: 8