Design and Implement of Information Extraction System Based on XML

Cited by: 0
Authors
Xuan, Yanyan [1 ]
Hu, Yan [1 ]
Affiliations
[1] Wuhan Univ Technol, Dept Comp Sci & Technol, Wuhan 430070, Peoples R China
Keywords
Information Extraction; XML; XPath; XSLT; Extraction Rule;
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
By studying the structure of HTML documents, this paper addresses web information extraction with standard XML technology and proposes an XML-based extraction method. The method first parses an HTML page into an HTML DOM tree, which is used to clean the page and generate an XHTML document. It then analyzes the XHTML file with the DOM methods of Xerces-J and constructs an XPath generation algorithm. Finally, it uses the strengths of XSLT and XPath in locating and transforming data to learn and generate extraction rules automatically, and extracts the web information according to the generated XPath.
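The pipeline in the abstract (walk a DOM tree, derive an XPath for a sample value, reuse that XPath as an extraction rule) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the page has already been cleaned into well-formed XHTML, swaps Xerces-J for Python's standard `xml.etree.ElementTree`, and omits XSLT entirely. The sample documents and the helper names `learn_rule`, `xpath_for`, and `extract` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-cleaned XHTML pages that share one template.
TRAINING_PAGE = """<html><body>
<div><p>Header</p></div>
<div><p>Price:</p><p>42 USD</p></div>
</body></html>"""

OTHER_PAGE = """<html><body>
<div><p>Header</p></div>
<div><p>Price:</p><p>17 USD</p></div>
</body></html>"""


def xpath_for(node, root, parents):
    """Build a positional path from root to node, e.g. ./body[1]/div[2]."""
    steps = []
    while node is not root:
        parent = parents[node]
        # Position among siblings with the same tag (XPath is 1-based).
        same_tag = [child for child in parent if child.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


def learn_rule(root, sample_text):
    """Return a path to the first element whose text equals sample_text."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for node in root.iter():
        if (node.text or "").strip() == sample_text:
            return xpath_for(node, root, parents)
    return None  # the sample value does not occur in the page


def extract(root, rule):
    """Apply a learned path and return the text of every matching element."""
    return [(node.text or "").strip() for node in root.findall(rule)]


rule = learn_rule(ET.fromstring(TRAINING_PAGE), "42 USD")
values = extract(ET.fromstring(OTHER_PAGE), rule)
```

Here the rule is learned from one labeled value on the training page and then applied unchanged to a second page built from the same template. Purely positional paths are brittle when the layout shifts, which is one reason a real system would need the cleaning and rule-generation steps the abstract describes.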
Pages: 1400-1404
Page count: 5
Related papers
50 in total
  • [1] XML-based Web Information Extraction System Design and Implementation
    Jun, Ma
    Li Tihong
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (ICCSIT 2010), VOL 8, 2010, : 551 - 554
  • [2] The Design and Implement of Tourism Information System Based on GIS
    Fu Chunchang
    Zhang Nan
    INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT A, 2012, 24 : 528 - 533
  • [3] The Design and Implement of Tourism Information System Based on GIS
    Fu Chunchang
    Zhang Nan
    2010 INTERNATIONAL CONFERENCE ON COMMUNICATION AND VEHICULAR TECHNOLOGY (ICCVT 2010), VOL II, 2010, : 21 - 24
  • [4] Design and Implementation of XML Schema Based Information System
    Cheng, Zheng
    Wu, Jiaju
    Chen, Quangeng
    Ma, Yongqi
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 4652 - 4655
  • [5] Design and implement of customer information retrieval system based on semantic web
    Gu, Mi Sug
    Hwang, Jeong Hee
    Ryu, Keun Ho
    COMPUTATIONAL INTELLIGENCE, PT 2, PROCEEDINGS, 2006, 4114 : 367 - 378
  • [6] Research on Web Information Extraction Based on XML
    Hu, Yan
    Xuan, Yanyan
    SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 201 - 204
  • [7] The Design and Implement of Image Information Identify System
    Guo, Fengying
    INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS, PTS 1 AND 2, 2010, : 753 - 755
  • [8] Design of Marine Information Metadata and Directory Service System Based On XML
    Jiang, Yongguo
    Lu, Lianying
    Guo, Zhongwen
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 574 - 577
  • [9] Design of an XML-based extensible multimedia information retrieval system
    Milosavljevic, B
    Konjovic, Z
    FOURTH INTERNATIONAL SYMPOSIUM ON MULTIMEDIA SOFTWARE ENGINEERING, PROCEEDINGS, 2002, : 114 - 121
  • [10] Design and Implementation of Inertial Measurement Information Acquisition System Based on XML
    Yu, Pei
    Wang, Ting
    Zhang, Jinyun
    Li, Jing
    3RD INTERNATIONAL CONFERENCE ON AUTOMATION, CONTROL AND ROBOTICS ENGINEERING (CACRE 2018), 2018, 428