Clustering and searching WWW images using link and page layout analysis

Cited by: 31
Authors
He, Xiaofei
Cai, Deng
Wen, Ji-Rong
Ma, Wei-Ying
Zhang, Hong-Jiang
Affiliations
[1] Yahoo Research Labs, Burbank, CA 91504, USA
[2] University of Illinois, Department of Computer Science, Urbana, IL 61801, USA
[3] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
algorithms; management; performance; experimentation; web mining; image search; image clustering; link analysis
DOI
10.1145/1230812.1230816
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Due to the rapid growth in the number of digital images on the Web, there is an increasing demand for effective and efficient methods of organizing and retrieving them. This article describes iFind, a system for clustering and searching WWW images. Using a vision-based page segmentation algorithm, a Web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image indexing. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods such as PageRank, HITS, and PicASHOW, and hence the image graph better reflects the semantic relationships between images. Treating the image graph as a Markov chain, we compute the limiting probability distribution over the images, the ImageRanks, which characterize the importance of the images. The ImageRanks are combined with relevance scores to produce the final ranking for image search. With the graph models, we can also apply techniques from spectral graph theory to image clustering and to embedding (2-D visualization). Experimental results on 11.6 million images downloaded from the Web are provided in the article.
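The ImageRank idea in the abstract (a limiting distribution of a Markov chain over an image graph, later combined with relevance scores) can be illustrated with a small sketch. This is not the article's implementation: the toy weight matrix, the damping factor with uniform teleportation, the function names, and the linear blend with weight `alpha` are all assumptions made here for illustration.

```python
def stationary_distribution(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a row-normalized image graph.

    `weights[i][j]` is the (assumed) link weight from image i to image j.
    The damping factor and uniform teleport are assumptions borrowed from
    PageRank-style random walks, not taken from the article.
    """
    n = len(weights)
    # Row-normalize; an image with no outgoing weight jumps uniformly.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(rank[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def combine_scores(relevance, image_rank, alpha=0.5):
    """Hypothetical linear blend of text relevance and graph importance."""
    top = max(image_rank)
    return [
        alpha * r + (1.0 - alpha) * (p / top)
        for r, p in zip(relevance, image_rank)
    ]


# Toy image graph with four images (made-up weights).
weights = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
image_rank = stationary_distribution(weights)

# Made-up relevance scores for a single query.
relevance = [0.9, 0.2, 0.6, 0.9]
final = combine_scores(relevance, image_rank)
order = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
```

In this toy graph, images 0 and 3 have the same relevance score, but image 0 ranks first because it receives link weight from image 2, while image 3 has no incoming links and keeps only the teleport probability.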
Pages: 25