Learning Web Page Block Functions using Roles of Images

Cited by: 1
Authors
Yang, Xin [1]
Shi, Yuanchun [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
2008 3RD INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND APPLICATIONS, VOLS 1 AND 2 | 2008
Keywords
Web Page Block; Block Function; Role of Image; Machine Learning;
DOI
10.1109/ICPCA.2008.4783565
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Making use of block information in Web IR and data mining tasks requires a good understanding of the function of each block. Existing work on classifying block functions and judging block importance has not made full use of images, considering only simple image features. We regard images as strong indicators of the functions of Web page blocks and propose to learn block functions using roles of images as part of the block features. Blocks are generated by Web page segmentation, and roles of images are determined automatically by image classification. In experiments on 140 Web pages, we demonstrate that using roles of images significantly improves classification quality when learning Web page block functions. We also measure the usefulness of different roles of images and evaluate the effect of two page segmentation methods.
Pages: 151-156
Page count: 6