XML-Based Web Data Pattern Discovery and Extraction

Cited by: 0
Authors
Jia, Rui [1]
Xu, Shicheng [1]
Peng, Chengbao [1]
Affiliation
[1] Neusoft Corp, Shenyang, People's Republic of China
Keywords
Web data extraction; XML clustering; Pattern discovery
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents an XML-based web data extraction method. The method translates a web page into an XML document, analyzes the XML document using XPath/XSLT, discovers data patterns and similarities across web pages using an XML clustering algorithm, and constructs XPath-based data extraction rule templates. This approach improves the robustness and versatility of the web data extraction system. Experimental results show that the method achieves high precision and adapts to web pages from different sites and with different structures.
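The pipeline named in the abstract (page to XML, pattern discovery, XPath rule template, extraction) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record gives no details of their clustering method, so the sketch assumes a much simpler stand-in that groups elements sharing the same tag path and the same sequence of child tags, and treats the largest group as the repeated data record. The sample page and the names `signature`, `discover_pattern`, and `extract` are hypothetical, and the page is assumed to have already been converted to well-formed XML. Only the limited XPath subset of Python's standard library is used; XSLT is not covered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical page, assumed to be already converted to well-formed XML.
PAGE = """
<html><body>
  <div class="nav"><a>Home</a></div>
  <ul>
    <li><span>Widget</span><b>9.99</b></li>
    <li><span>Gadget</span><b>19.50</b></li>
    <li><span>Doohickey</span><b>4.25</b></li>
  </ul>
</body></html>
"""


def signature(elem):
    """Structural signature: the element's tag plus its children's tags, in order."""
    return (elem.tag, tuple(child.tag for child in elem))


def discover_pattern(root):
    """Group elements by (tag path, signature) and return the largest group.

    Exact-match grouping is an assumed stand-in for XML clustering.
    Returns the XPath of the repeated record and its child field tags.
    """
    groups = defaultdict(list)

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        if len(elem):  # only elements with children can be data records
            groups[(here, signature(elem))].append(elem)
        for child in elem:
            walk(child, here)

    for child in root:
        walk(child, ".")
    (path, sig), _members = max(groups.items(), key=lambda kv: len(kv[1]))
    return path, sig[1]


def extract(root, record_path, fields):
    """Apply the rule template: one row per record, one column per field tag."""
    return [
        [(rec.findtext(field) or "").strip() for field in fields]
        for rec in root.findall(record_path)
    ]


root = ET.fromstring(PAGE.strip())
record_path, fields = discover_pattern(root)
rows = extract(root, record_path, fields)
print(record_path, fields)
for row in rows:
    print(row)
```

Because the rule template is derived from the page structure instead of being written by hand, the same functions can be pointed at a differently structured page without editing the XPath, which is the kind of adaptability the abstract claims. A real system would need tolerant similarity measures in place of exact signature matching.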
Pages: 708-715
Number of pages: 8