Research of Information Retrieval Based on Web Page Segmentation

Times cited: 0
Author
Yu, Yangxin [1]
Affiliation
[1] Huaiyin Inst Technol, Fac Comp Engn, Huaian 223003, Peoples R China
Source
PROGRESS IN INDUSTRIAL AND CIVIL ENGINEERING, PTS. 1-5 | 2012 / Vol. 204-208
Keywords
Page Segment; Information Retrieval; HTML Tag; Similarity
DOI
10.4028/www.scientific.net/AMM.204-208.4928
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper presents a Web information retrieval algorithm based on Web page segmentation. Because Web pages are semi-structured, the key idea is to segment each page into topic areas (segments) according to its HTML tags and content. The algorithm first builds an HTML tag tree and then merges nodes in the tree according to content similarity and visual similarity. During retrieval and ranking, it uses the segmentation information to order the relevant pages. The experimental results show that the method significantly improves search precision, and it offers a useful reference for the design of future search engines.
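The abstract does not give the algorithm's details, so the following is only a minimal sketch of the first two steps under stated assumptions: it builds a tag tree with Python's standard `html.parser`, then merges adjacent top-level blocks whose text is similar. Cosine similarity over term counts stands in for the paper's content-similarity rule, which the abstract does not specify; visual similarity is not modeled at all. The names `Node`, `TreeBuilder`, `segment`, and the `threshold` value are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the HTML tag tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes well-formed HTML (no error recovery)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def subtree_text(node):
    """Concatenates all text below a node, in document order."""
    parts = [c if isinstance(c, str) else subtree_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def term_vector(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(html, threshold=0.3):
    """Returns segment texts; adjacent similar blocks are merged (assumed rule)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Descend through single-child wrappers (html, body, ...) to the block level.
    node = builder.root
    while len(node.children) == 1 and isinstance(node.children[0], Node):
        node = node.children[0]
    segments = []  # list of (text, term vector)
    for child in node.children:
        text = child if isinstance(child, str) else subtree_text(child)
        if not text:
            continue
        vec = term_vector(text)
        if segments and cosine(segments[-1][1], vec) >= threshold:
            prev_text, prev_vec = segments[-1]
            segments[-1] = (prev_text + " " + text, prev_vec + vec)
        else:
            segments.append((text, vec))
    return [text for text, _ in segments]
```

Calling `segment(page_html)` on a page whose body holds several `<div>` blocks returns one string per merged segment. A retrieval step could then score a query against each segment rather than against the whole page, which is the kind of use the abstract describes for ranking.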
Pages: 4928-4931
Page count: 4