XML-Based Web Data Pattern Discovery and Extraction

Cited by: 0

Authors:

Jia, Rui [1]

Xu, Shicheng [1]

Peng, Chengbao [1]

Affiliations:

[1] Neusoft Corp, Shenyang, People's Republic of China

Source:

INFORMATION COMPUTING AND APPLICATIONS, PT 1 | 2012 / Vol. 307

Keywords:

Web data extraction; XML Clustering; Pattern Discovery;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

This paper presents an XML-based web data extraction method. The method translates a web page into an XML document, analyzes the document using XPath/XSLT, discovers data patterns and similarities in the page using an XML clustering algorithm, and constructs XPath-based data extraction rule templates. This approach improves the robustness and versatility of the web data extraction system. Experimental results show that the method achieves high precision and adapts to web pages from different sites and with different structures.
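The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the clustering method, so grouping elements by their root-to-node tag path stands in here for XML clustering, and the sample page, function names, and `min_support` threshold are all hypothetical. The page is assumed to have already been converted to well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample page, assumed already converted to well-formed XML.
PAGE = """
<html><body>
  <div class="list">
    <div class="item"><h2>Alpha</h2><span>10</span></div>
    <div class="item"><h2>Beta</h2><span>20</span></div>
    <div class="item"><h2>Gamma</h2><span>30</span></div>
  </div>
  <p>footer</p>
</body></html>
"""


def tag_paths(root):
    """Yield (path, element) for every descendant of root.

    The path is written in the XPath subset that ElementTree's
    findall() understands, e.g. "./body/div/div/h2".
    """
    stack = [(".", root)]
    while stack:
        path, node = stack.pop()
        for child in node:
            child_path = f"{path}/{child.tag}"
            yield child_path, child
            stack.append((child_path, child))


def discover_patterns(root, min_support=2):
    """Group text-bearing elements by tag path (a simple stand-in for
    XML clustering) and keep paths that repeat at least min_support times."""
    groups = defaultdict(list)
    for path, element in tag_paths(root):
        if element.text and element.text.strip():
            groups[path].append(element)
    return sorted(p for p, els in groups.items() if len(els) >= min_support)


def extract(root, templates):
    """Apply each XPath template and collect the non-empty text it matches."""
    result = {}
    for template in templates:
        texts = [(el.text or "").strip() for el in root.findall(template)]
        result[template] = [t for t in texts if t]
    return result


root = ET.fromstring(PAGE.strip())
templates = discover_patterns(root)
records = extract(root, templates)

for template, values in records.items():
    print(template, values)
```

In this sketch the repeated `h2` and `span` paths are kept as extraction templates, while the single `p` element falls below the support threshold and is ignored. A real system would need an HTML-to-XML cleaning step and a similarity measure that tolerates structural variation between pages, neither of which is shown here.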

Citation

Pages: 708-715

Page count: 8