Integration of HTML Tables in Web Pages

Authors
Akbar, Memen [1]
Azizah, Fazat Nur [2]
Saptawati, G. A. Putri [2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Informat Study Program, Bandung, Indonesia
[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Data & Software Engn Res Grp, Bandung, Indonesia
Source
2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE) | 2015
Keywords
data integration; HTML table; ontology; table integration; web page;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
The growing number of web pages on the internet creates a need to combine information from HTML tables on different web pages that contain similar information, especially information from the same domain of interest, into a single web page. This paper presents an approach to HTML table integration that combines several existing methods, each shown to address a different issue in the integration process. The integration of HTML tables consists of three phases: (1) extraction of the tables' structure; (2) integration of the tables' schemas; (3) integration of the data values. A domain ontology is used to resolve semantic and naming conflicts in the table schemas. To improve the quality of data-value integration, the vector space model is used to detect duplicate data values. The result of the integration is a single HTML table. The approach is implemented in an engine built using Python. Experimental results show that the engine can successfully integrate two HTML tables into a single table.
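The schema and data-integration phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' engine; it rests on several stated assumptions. Phase 1 (structure extraction) is taken as already done, so each table arrives as a header list plus row lists. The `ONTOLOGY` dictionary is a hypothetical stand-in for the domain ontology, mapping header synonyms to one concept. Duplicate detection uses a plain vector space model (raw term-frequency vectors and cosine similarity) over the concepts two rows share, with a hypothetical threshold of 0.9. When a duplicate is found, the sketch fills in concepts the kept row lacks, which is an assumed merge policy.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for the domain ontology: maps header synonyms to one
# canonical concept. The paper's actual ontology is not given in the abstract.
ONTOLOGY = {
    "title": "name", "name": "name",
    "year": "year", "release year": "year",
    "director": "director", "directed by": "director",
}


def canonical(header):
    """Map a raw header to its ontology concept (falls back to the lowercased header)."""
    key = header.strip().lower()
    return ONTOLOGY.get(key, key)


def term_vector(record, keys):
    """Term-frequency vector built from the cell values under the given concepts."""
    text = " ".join(str(record[k]) for k in sorted(keys)).lower()
    return Counter(re.findall(r"\w+", text))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_duplicate(candidate, existing, threshold):
    """Compare two records only on the concepts they have in common."""
    shared = candidate.keys() & existing.keys()
    if not shared:
        return False
    sim = cosine(term_vector(candidate, shared), term_vector(existing, shared))
    return sim >= threshold


def integrate(tables, threshold=0.9):  # threshold value is an assumption
    """Merge (headers, rows) tables into one (columns, rows) table."""
    columns, records = [], []
    for headers, rows in tables:
        concepts = [canonical(h) for h in headers]
        # Schema integration: union of ontology concepts in first-seen order.
        for c in concepts:
            if c not in columns:
                columns.append(c)
        # Data integration: keep a row unless a kept row is similar enough.
        for row in rows:
            record = dict(zip(concepts, row))
            match = next((r for r in records if is_duplicate(record, r, threshold)), None)
            if match is None:
                records.append(record)
            else:
                # Assumed merge policy: fill in concepts the kept row lacks.
                for k, v in record.items():
                    match.setdefault(k, v)
    return columns, [[r.get(c, "") for c in columns] for r in records]
```

As a usage example, integrating a table headed `Title, Year` with one headed `Name, Release Year, Directed By` yields a single table with columns `name, year, director`; a row such as `inception, 2010, Nolan` in the second table is recognized as a duplicate of `Inception, 2010` in the first and only contributes its director value. Comparing rows on shared concepts, rather than on all cells, keeps an extra column in one table from masking a duplicate.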
Pages: 132-137
Number of pages: 6