A class of neural-network-based transducers for web information extraction

Cited by: 13
Authors
Sleiman, Hassan A. [1]
Corchuelo, Rafael [1]
Affiliation
[1] Univ Seville, ETSI Informat, E-41012 Seville, Spain
Keywords
web wrappers; web information extraction; neural networks; finite automata; machine learning; supervised method; wrapper induction
DOI
10.1016/j.neucom.2013.05.057
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Web is a huge and still-growing information repository that has attracted the attention of many companies. Many of them rely on information extractors to feed information buried in semi-structured web documents into automated business processes. Many information extractors build on extraction rules, which can be handcrafted or learned using supervised or unsupervised techniques. The literature provides a variety of techniques for learning information extraction rules that build on ad hoc machine learning techniques. In this paper, we propose a hybrid approach that explores the use of standard machine-learning techniques to extract web information. Specifically, we have explored using neural networks; our results show that our proposal outperforms three state-of-the-art techniques from the literature, which opens up a new approach to information extraction. © 2013 Elsevier B.V. All rights reserved.
Pages: 61-68
Number of pages: 8