Integration of HTML Tables in Web Pages

Cited by: 0
Authors
Akbar, Memen [1 ]
Azizah, Fazat Nur [2 ]
Saptawati, G. A. Putri [2 ]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Informat Study Program, Bandung, Indonesia
[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Data & Software Engn Res Grp, Bandung, Indonesia
Source
2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE) | 2015
Keywords
data integration; HTML table; ontology; table integration; web page
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
The growing number of web pages on the internet creates a need to combine information from HTML tables on different web pages into a single web page, especially when the tables contain similar information from the same domain of interest. This paper presents an approach to HTML table integration that combines several existing methods, each of which has been shown to solve a different issue in the integration process. HTML table integration consists of three phases: (1) extraction of the structure of the tables; (2) integration of the tables' schemas; (3) integration of the data values. A domain ontology is used to resolve semantic and naming conflicts between the table schemas. To improve the quality of data value integration, the vector space model is used to check for duplicate data values. The result of the integration is a single HTML table. The approach is implemented in an engine built using Python. Experimental results show that the engine can successfully integrate two HTML tables into a single table.
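The duplicate check in phase (3) can be illustrated with a minimal sketch. It is not the authors' engine: the function names (`row_vector`, `cosine_similarity`, `merge_rows`), the term-frequency weighting, and the 0.8 threshold are all assumptions chosen for illustration. Each row is treated as a bag of terms, and a row from the second table is appended only if its cosine similarity to every row already kept stays below the threshold.

```python
import math
import re
from collections import Counter


def row_vector(row):
    """Turn a table row (list of cell strings) into a term-frequency vector."""
    tokens = re.findall(r"[a-z0-9]+", " ".join(row).lower())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_rows(rows_a, rows_b, threshold=0.8):
    """Append rows of rows_b to rows_a, skipping likely duplicates.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    merged = list(rows_a)
    vectors = [row_vector(r) for r in merged]
    for row in rows_b:
        vec = row_vector(row)
        # Keep the row only if it is dissimilar to every row kept so far.
        if all(cosine_similarity(vec, v) < threshold for v in vectors):
            merged.append(row)
            vectors.append(vec)
    return merged


# Made-up sample rows, assumed to share an already integrated schema.
table_a = [["Bandung", "West Java", "2500000"]]
table_b = [
    ["bandung", "West Java", "2500000"],   # same terms as the row in table_a
    ["Surabaya", "East Java", "2900000"],  # shares only the term "java"
]
merged = merge_rows(table_a, table_b)
```

The sketch assumes the schemas have already been aligned in phase (2), so that rows from both tables list their cells in the same column order. Plain term frequency keeps the example short; a weighting such as TF-IDF could be substituted without changing the structure.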
Pages: 132 - 137
Page count: 6