Learning Visual Features from Snapshots for Web Search

Cited by: 7
Authors
Fan, Yixing [1, 2]
Guo, Jiafeng [2]
Lan, Yanyan [2]
Xu, Jun [2]
Pang, Liang [1, 2]
Cheng, Xueqi [2]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing, China
[2] CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Source
CIKM '17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Web Search; Visual Feature; Snapshot; Retrieval; Layout; Color; Model; Text
DOI
10.1145/3132847.3132943
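Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812 (Computer Science and Technology)
Abstract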
When applying learning-to-rank algorithms to Web search, a large number of features are usually designed to capture relevance signals. Most of these features are computed from extracted textual elements, link analysis, and user logs. However, Web pages are not merely linked text; they have a structured layout that organizes a wide variety of elements in different styles. This layout itself can convey useful visual information that indicates the relevance of a Web page. For example, the query-independent layout (i.e., the raw page layout) can help identify page quality, while the query-dependent layout (i.e., the page rendered with the matched query words) can further reveal rich structural information about the matching signals (e.g., their size, position, and proximity). However, such visual layout information has seldom been utilized in Web search. In this work, we propose to learn rich visual features automatically from the layout of Web pages (i.e., Web page snapshots) for relevance ranking. Both query-independent and query-dependent snapshots are considered as new inputs. We then propose a novel visual perception model, inspired by humans' visual search behavior when viewing pages, to extract the visual features. This model can be learned end-to-end together with traditional hand-crafted features. We also show that such visual features can be acquired efficiently in the online setting with an extended inverted indexing scheme. Experiments on benchmark collections demonstrate that learning visual features from Web page snapshots can significantly improve relevance ranking performance in ad-hoc Web retrieval tasks.
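The abstract names an "extended inverted indexing scheme" for obtaining query-dependent visual information online, but this record gives no details of it. As a rough illustration of why an index can make query-dependent snapshots cheap at query time, here is a minimal sketch, assuming the index stores, for every term occurrence, the bounding box where that word was rendered on the page snapshot (an assumption, not the authors' design). `Box`, `add_document`, and `match_map` are hypothetical names, and the binary "match map" grid is only a crude stand-in for a query-dependent snapshot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical rendered bounding box of one word, in snapshot pixel coordinates."""
    x: int
    y: int
    w: int
    h: int


# Hypothetical extended inverted index: term -> doc_id -> boxes where the term is rendered.
index = defaultdict(lambda: defaultdict(list))


def add_document(doc_id, rendered_words):
    """Index (word, Box) pairs that a page renderer is ASSUMED to produce offline."""
    for word, box in rendered_words:
        index[word.lower()][doc_id].append(box)


def match_map(query, doc_id, width, height, cell=10):
    """Build a coarse binary grid at query time: a cell is 1 if any matched
    query word's box covers it, else 0. No page re-rendering is needed."""
    rows, cols = height // cell, width // cell
    grid = [[0] * cols for _ in range(rows)]
    for term in query.lower().split():
        # .get() avoids creating empty entries in the defaultdicts.
        for box in index.get(term, {}).get(doc_id, []):
            last_row = min(rows, (box.y + box.h - 1) // cell + 1)
            last_col = min(cols, (box.x + box.w - 1) // cell + 1)
            for r in range(box.y // cell, last_row):
                for c in range(box.x // cell, last_col):
                    grid[r][c] = 1
    return grid
```

For example, after indexing a page whose word "Visual" occupies the top-left 20x10 pixels, `match_map("visual features", "d1", 50, 40)` marks the two top-left cells of a 4x5 grid. The design choice being illustrated is that positional layout data is computed once at indexing time, so only a cheap lookup and rasterization remain for each query.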
Pages: 247-256
Number of pages: 10