Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications, including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g., text, visual, and XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543–3544
Page count: 2
Related papers
  • [41] Information Extraction System for Transforming Unstructured Text Data in Fire Reports into Structured Forms: A Polish Case Study
    Mirończuk, Marcin Michał
    Fire Technology, 2020, 56 : 545 - 581
  • [42] Map as a Service: A Framework for Visualising and Maximising Information Return from Multi-Modal Wireless Sensor Networks
    Hammoudeh, Mohammad
    Newman, Robert
    Dennett, Christopher
    Mount, Sarah
    Aldabbas, Omar
    SENSORS, 2015, 15 (09) : 22970 - 23003
  • [43] Associative Feature Information Extraction Using Text Mining from Health Big Data
    Kim, Joo-Chang
    Chung, Kyungyong
    WIRELESS PERSONAL COMMUNICATIONS, 2019, 105 (02) : 691 - 707
  • [44] Information Extraction from Web Sources Based on Multi-aspect Content Analysis
    Milicka, Martin
    Burget, Radek
    SEMANTIC WEB EVALUATION CHALLENGES, 2015, 548 : 81 - 92
  • [46] Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web
    Michelson, Matthew
    Knoblock, Craig A.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2007, 10 (3-4) : 211 - 226
  • [48] Intelligent agent for hurricane emergency identification and text information extraction from streaming social media big data
    Huang, Jingwei
    Khallouli, Wael
    Rabadi, Ghaith
    Seck, Mamadou
    INTERNATIONAL JOURNAL OF CRITICAL INFRASTRUCTURES, 2023, 19 (02) : 124 - 139
  • [49] STAVIES: A system for information extraction from unknown Web data sources through automatic Web wrapper generation using clustering techniques
    Papadakis, NK
    Skoutas, D
    Raftopoulos, K
    Varvarigou, TA
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (12) : 1638 - 1652
  • [50] CASIA-KB: A Multi-source Chinese Semantic Knowledge Base Built from Structured and Unstructured Web Data
    Zeng, Yi
    Wang, Dongsheng
    Zhang, Tielin
    Wang, Hao
    Hao, Hongwei
    Xu, Bo
    SEMANTIC TECHNOLOGY, 2014, 8388 : 75 - 88