Web Data Extraction from Retailers' Site using Semantic Density and Case Based Reasoning

Citations: 0
Authors
Umamageswari, B. [1]
Kalpana, R. [2]
Affiliations
[1] New Prince Shri Bhavani Coll Engn & Technol, Dept IT, Chennai, Tamil Nadu, India
[2] Pondicherry Engn Coll, Dept CSE, Pillaichavadi, Puducherry, India
Source
Proceedings of the International Conference on Informatics and Analytics (ICIA '16) | 2016
Keywords
Information retrieval; wrappers; semantic density; case based reasoning
DOI
10.1145/2980258.2980265
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The deep web, or hidden web, is a major source of information for many data analytics and data mining applications such as product intelligence, competitive intelligence, and online market intelligence. Web pages containing deep web data are dynamic pages generated from server-side templates in response to a query submitted through a search form. They are not indexed by search engines and therefore cannot be retrieved using traditional keyword search. These pages are designed to improve the user experience, which makes automated processing an unwieldy task. As a result, web data extraction (WDE) remains an ongoing research area facing many challenges. Over the past decades, many solutions have been proposed for WDE, ranging from hand-crafted rules to automatic template deduction and extraction. This paper explores a new framework for WDE that uses semantic density to detect data-rich regions and case based reasoning to help the system learn new templates and carry out extraction even on unseen, newly structured web pages.
Pages: 5