Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543-3544
Page count: 2