Study on Text Information Extraction Model and Algorithm of HTML Documents

Cited by: 0
Authors
Li Chunyan
Jiang Haiyang
Affiliation
Source
PROCEEDINGS OF 2010 CROSS-STRAIT CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY | 2010
Keywords
Data Extraction; Equivalent Class; Tagging; Document Object Model
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article improves an HTML-based method for automatically extracting data from Web information. The method extracts structured data from unstructured information on the Web. The article proceeds in three steps. First, the method used by the EXALG system is analyzed and its problems are identified. Second, an improved EXALG system is presented. Third, the advantage of the new system in precision and completeness is examined using the data sources and experimental results of the EXALG authors.
Pages: 399-403
Page count: 5