Integration of HTML Tables in Web Pages

Cited by: 0
Authors
Akbar, Memen [1]
Azizah, Fazat Nur [2]
Saptawati, G. A. Putri [2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Informat Study Program, Bandung, Indonesia
[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Data & Software Engn Res Grp, Bandung, Indonesia
Source
2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE) | 2015
Keywords
data integration; HTML table; ontology; table integration; web page
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The growing number of web pages on the internet creates a need to combine and integrate information from HTML tables on different web pages that contain similar information into a single web page, especially information from the same domain of interest. This paper presents an approach to HTML table integration that combines several existing methods that have been shown to solve different issues in the integration process. The integration of HTML tables consists of three phases: (1) extraction of the structure of the tables; (2) integration of the tables' schemas; (3) integration of the data values. To resolve semantic and naming conflicts in the table schemas, a domain ontology is used. To improve the quality of the integration of data values, the vector space model is used to check for duplicate data values. The result of the integration is a single HTML table. The approach is implemented in an engine built using Python. Experimental results show that the engine can successfully integrate two HTML tables into a single table.
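The duplicate check in phase (3) can be illustrated with a minimal sketch. This record does not give the paper's weighting scheme or similarity threshold, so the sketch assumes plain term-frequency vectors, cosine similarity, and a threshold of 0.9; the function names `row_vector`, `cosine_similarity`, and `merge_rows` are hypothetical and are not taken from the authors' engine.

```python
import math
import re
from collections import Counter


def row_vector(row):
    """Turn a table row (list of cell strings) into a term-frequency vector."""
    tokens = re.findall(r"\w+", " ".join(row).lower())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_rows(rows_a, rows_b, threshold=0.9):
    """Append rows of rows_b to rows_a, skipping near-duplicates.

    The threshold is an assumed value, not one reported in the paper.
    """
    merged = list(rows_a)
    for row in rows_b:
        vec = row_vector(row)
        if all(cosine_similarity(vec, row_vector(kept)) < threshold
               for kept in merged):
            merged.append(row)
    return merged


# Illustrative data only; these rows are not from the paper's experiments.
rows_a = [["Bandung", "West Java", "2500000"]]
rows_b = [["bandung", "West Java", "2500000"],
          ["Surabaya", "East Java", "2900000"]]
merged = merge_rows(rows_a, rows_b)
```

Here the first row of `rows_b` differs from the existing row only in letter case, so it is treated as a duplicate and dropped, while the second row is kept. A real engine would probably compare rows only after the schemas have been aligned in phase (2), so that cells are matched column by column.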
Pages: 132-137
Number of pages: 6