Effective page segmentation combining pattern analysis and visual separators for browsing on small screens

Cited by: 4
Authors
Xiang, Peifeng [1]
Yang, Xin [1]
Shi, Yuanchun [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS) | 2006
DOI
10.1109/WI.2006.67
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Page segmentation plays a key role in browsing on small screens. It breaks a large page into smaller segments according to their semantic relationships. Various approaches, such as single-column adaptation and thumbnail views with zooming links, can then be implemented on top of these page segments. However, for today's flexibly structured web pages, segmentation remains a challenging task. This paper proposes an effective automatic segmentation method that combines pattern analysis and visual separators. The basic idea is that a page's semantic structure is largely reflected by repeated continuous patterns and visual separators, which is consistent with human visual perception. The proposed method works in three steps: generating a refined tag tree from the DOM tree, recognizing and merging inexact patterns recursively, and segmenting the remaining nodes by visual separators. Our experimental results show that the proposed method outperforms existing methods, especially for pages automatically generated from templates.
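The last two of the three steps can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the refined tag tree is already available as nested `Node` objects, it detects only exact repeats of a single sibling (the paper handles inexact patterns and recurses), and it treats `hr` tags as a stand-in for visual separators. The names `Node`, `SEPARATOR_TAGS`, and `segment` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a simplified tag tree (hypothetical structure)."""
    tag: str
    children: List["Node"] = field(default_factory=list)

    def signature(self) -> str:
        """Structural signature: the tag plus the signatures of its children."""
        inner = ",".join(child.signature() for child in self.children)
        return f"{self.tag}({inner})"


# Assumption: <hr> stands in for the visual separators named in the abstract.
SEPARATOR_TAGS = {"hr"}


def segment(children: List[Node], min_repeat: int = 2) -> List[List[Node]]:
    """Group sibling nodes into segments.

    Runs of consecutive siblings with identical signatures are merged into
    one segment; the remaining siblings are split at separator tags.
    """
    segments: List[List[Node]] = []
    current: List[Node] = []  # non-pattern nodes collected since the last cut

    def flush() -> None:
        nonlocal current
        if current:
            segments.append(current)
            current = []

    i = 0
    while i < len(children):
        node = children[i]
        if node.tag in SEPARATOR_TAGS:
            flush()  # a separator closes the segment collected so far
            i += 1
            continue
        # Measure the run of consecutive siblings that share this signature.
        j = i + 1
        while j < len(children) and children[j].signature() == node.signature():
            j += 1
        if j - i >= min_repeat:
            flush()
            segments.append(children[i:j])  # the repeated pattern is one segment
            i = j
        else:
            current.append(node)
            i += 1
    flush()
    return segments


# Usage: a heading and paragraph, a separator, a repeated list pattern,
# then two paragraphs divided by another separator.
page = [
    Node("h1"), Node("p"), Node("hr"),
    Node("li", [Node("a")]), Node("li", [Node("a")]), Node("li", [Node("a")]),
    Node("p"), Node("hr"), Node("p"),
]
result = [[n.tag for n in seg] for seg in segment(page)]
print(result)
```

Checking separators before patterns keeps a run of adjacent separators from being mistaken for a repeated pattern. A fuller version would compare signatures with a similarity measure rather than equality and would apply the same procedure recursively to each node's children.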
Pages: 831 / +
Page count: 2