Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining;
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543-3544
Number of pages: 2