Unsupervised learning of tree alignment models for information extraction

Authors
Zigoris, Philip [1]
Eads, Damian [1]
Zhang, Yi [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, 1156 High St, Santa Cruz, CA 95064 USA
Source
ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS | 2006
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level data mining and information exploitation. Our algorithm effectively combines tree and string alignment algorithms, as well as domain-specific feature extraction, to match semantically related data across search results. The applications of our approach are vast and include hidden web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm makes use of a variant of Support Vector Machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce common structural and textual elements of a set of HTML parse trees. We report some preliminary results of our system's performance on data from websites with a variety of different layouts.
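To make the notion of tree alignment in the abstract concrete, the following is a minimal sketch of the classic simple tree matching recurrence, which counts the largest set of node pairs that can be matched between two ordered trees while preserving order and ancestry. It is not the paper's model: every matched pair here has a fixed weight of 1, whereas the abstract describes learning the alignment parameters with an SVM variant. The `(tag, children)` tuple representation and the function name `match_size` are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical tree representation: (tag, list of child trees).
Tree = Tuple[str, List["Tree"]]


def match_size(a: Tree, b: Tree) -> int:
    """Number of node pairs matched by a top-down ordered alignment.

    Unit weights are an assumption of this sketch; a learned model
    would replace the constant 1 and the tag-equality test.
    """
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return 0  # roots with different tags cannot be aligned

    m, n = len(kids_a), len(kids_b)
    # best[i][j]: best score aligning the first i children of a
    # with the first j children of b (a string-alignment style table).
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = max(
                best[i - 1][j],  # leave child i of a unmatched
                best[i][j - 1],  # leave child j of b unmatched
                best[i - 1][j - 1] + match_size(kids_a[i - 1], kids_b[j - 1]),
            )
    return best[m][n] + 1  # +1 for the matched roots


# Two search-result rows with slightly different layouts.
row_a: Tree = ("tr", [("td", []), ("td", []), ("a", [])])
row_b: Tree = ("tr", [("td", []), ("a", [])])
score = match_size(row_a, row_b)
```

In this example the roots match, one `td` aligns with the single `td` of the second row, and the `a` nodes align, so `score` is 3. The inner table is the same dynamic program used for string alignment, applied to sequences of child subtrees.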
Pages: 45 / +
Page count: 2