Clustering and searching WWW images using link and page layout analysis

Cited by: 31
Authors
He, Xiaofei
Cai, Deng
Wen, Ji-Rong
Ma, Wei-Ying
Zhang, Hong-Jiang
Affiliations
[1] Yahoo Res Labs, Burbank, CA 91504 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
algorithms; management; performance; experimentation; web mining; image search; image clustering; link analysis
DOI
10.1145/1230812.1230816
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid growth in the number of digital images on the Web, there is an increasing demand for effective and efficient methods of organizing and retrieving them. This article describes iFind, a system for clustering and searching WWW images. A vision-based page segmentation algorithm partitions each Web page into blocks, so that the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image indexing. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods such as PageRank, HITS, and PicASHOW, and hence the image graph better reflects the semantic relationships between images. Treating the image graph as a Markov chain, we compute the limiting probability distribution over the images, the ImageRanks, which characterize the importance of the images. The ImageRanks are combined with the relevance scores to produce the final ranking for image search. With the graph models, we can also use techniques from spectral graph theory for image clustering and for embedding, that is, 2-D visualization. Experimental results on 11.6 million images downloaded from the Web are provided in the article.
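The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the image graph is already available as a nonnegative weight matrix; how the article derives those weights from the page-to-block, block-to-image, and block-to-page relationships is not reproduced here. The names `image_rank` and `combine_scores`, the damping factor, and the linear mixing weight `alpha` are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def image_rank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Limiting distribution of a random walk on a weighted image graph.

    `weights[i][j]` is an assumed nonnegative link weight from image i to j.
    The damping factor is a PageRank-style assumption, not from the article.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-normalize to transition probabilities; rows without any
    # outgoing weight jump uniformly to every image.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    p = np.where(row_sums > 0, w / safe_sums, 1.0 / n)
    # Mix in a uniform jump so the chain has a unique limiting distribution.
    p = damping * p + (1.0 - damping) / n
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = rank @ p  # one step of the Markov chain
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def combine_scores(relevance, rank, alpha=0.5):
    """Hypothetical linear mix of text relevance and graph importance."""
    relevance = np.asarray(relevance, dtype=float)
    rank = np.asarray(rank, dtype=float)
    # Rescale ranks to [0, 1] so both terms are on a comparable scale.
    return alpha * relevance + (1.0 - alpha) * rank / rank.max()


if __name__ == "__main__":
    # Toy graph: image 0 is linked from images 1 and 2.
    graph = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
    ranks = image_rank(graph)
    print(ranks)
    print(combine_scores([0.2, 0.9, 0.4], ranks))
```

Power iteration is used here because it only needs matrix-vector products, which matters for a graph on the scale of millions of images; a sparse matrix type would be the natural substitute for the dense array in that setting.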
Pages: 25