Integration of HTML Tables in Web Pages

Cited by: 0
Authors
Akbar, Memen [1]
Azizah, Fazat Nur [2]
Saptawati, G. A. Putri [2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Informat Study Program, Bandung, Indonesia
[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Data & Software Engn Res Grp, Bandung, Indonesia
Source
2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE) | 2015
Keywords
data integration; HTML table; ontology; table integration; web page
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The growing number of web pages on the internet creates a need to combine and integrate information from HTML tables on different web pages that contain similar information into a single web page, especially information from the same domain of interest. This paper presents an approach to HTML table integration that combines several existing methods, each shown to solve a different issue in the integration process. The integration of HTML tables consists of three phases: (1) extraction of the structure of the tables; (2) integration of the tables' schemas; (3) integration of the data values. To resolve semantic and naming conflicts in the tables' schemas, a domain ontology is used. To improve the quality of the integration of data values in the tables, the vector space model is used to check for duplicate data values. As the integration result, a single HTML table is obtained. The approach is implemented in an engine built using Python. The results of the experiment show that the engine can successfully integrate two HTML tables into a single table.
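As a rough illustration of the third phase, the duplicate check with a vector space model could be sketched as follows. This is not the authors' engine: the function names, the whitespace tokenization, the raw term-frequency weighting, and the 0.9 similarity threshold are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter


def row_vector(row):
    """Build a term-frequency vector from one table row (a list of cell strings).

    Assumption: cells are lower-cased and split on whitespace.
    """
    tokens = " ".join(row).lower().split()
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty row is treated as similar to nothing
    return dot / (norm_a * norm_b)


def merge_rows(rows_a, rows_b, threshold=0.9):
    """Append rows of rows_b to rows_a, skipping likely duplicates.

    A row counts as a duplicate when its similarity to any row already
    kept reaches the (hypothetical) threshold.
    """
    merged = list(rows_a)
    vectors = [row_vector(r) for r in merged]
    for row in rows_b:
        vec = row_vector(row)
        if all(cosine_similarity(vec, v) < threshold for v in vectors):
            merged.append(row)
            vectors.append(vec)
    return merged
```

With these assumptions, merging `[["Bandung", "Indonesia"]]` with `[["bandung", "indonesia"], ["Jakarta", "Indonesia"]]` keeps the Jakarta row and drops the case-variant Bandung row. A real engine would likely need weighting such as TF-IDF and a threshold tuned to the data.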
Pages: 132-137
Page count: 6
Related papers
50 items in total
  • [41] Hierarchical Classification of Web Pages Using Support Vector Machine
    Wang, Yi
    Gong, Zhiguo
    [J]. DIGITAL LIBRARIES: UNIVERSAL AND UBIQUITOUS ACCESS TO INFORMATION, PROCEEDINGS, 2008, 5362 : 12 - 21
  • [42] A novel content and style based measurement of web pages distance
    Zhang, QP
    Liang, M
    Lai, LL
    [J]. PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 429 - 435
  • [43] Mining key information of web pages: A method and its application
    Wang, Chao
    Lu, Jie
    Zhang, Guangquan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (02) : 425 - 433
  • [44] PCI: Plants Classification & Identification Classification of Web pages for Constructing Plants Web-Directory
    Khalilian, Madjid
    Abolhassani, Hassan
    Alijamaat, Ali
    Boroujeni, Farsad Zamani
    [J]. PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 1373 - +
  • [45] EXTRACTING THE SEMANTIC CONTENT OF WEB PAGES VIA REPEATED STRUCTURES
    He, Zheng
    Luo, Hangzai
    Fan, Jianping
    Liu, Xiao
    [J]. ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2013,
  • [46] Automatic classification of Web pages based on the concept of domain ontology
    Song, MH
    Lim, SY
    Kang, DJ
    Lee, SJ
    [J]. 12th Asia-Pacific Software Engineering Conference, Proceedings, 2005, : 645 - 651
  • [47] Optimal and efficient integration of heterogeneous summary tables in a distributed database
    Scotney, B
    McClean, S
    Rodgers, M
    [J]. DATA & KNOWLEDGE ENGINEERING, 1999, 29 (03) : 337 - 350
  • [48] Knowledge Extraction from Web Pages with an Auto-Adaptive System
    Havas, Camille
    Larue, Othalia
    Camus, Mickael
    [J]. COMPUTATIONAL ENGINEERING IN SYSTEMS APPLICATIONS, 2008, : 126 - 131
  • [49] Determining the titles of Web pages using anchor text and link analysis
    Jeong, Ok-Ran
    Oh, Jehwan
    Kim, Dong-Jin
    Lyu, Heetae
    Kim, Won
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (09) : 4322 - 4329
  • [50] A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS
    Clarizia, Fabio
    Greco, Luca
    Napoletano, Paolo
    [J]. ICEIS 2010: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 3: INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION, 2010, : 70 - 79