Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10

Authors:

Dong, Xin Luna ^{[1]}

Hajishirzi, Hannaneh ^{[2]}

Lockard, Colin ^{[3]}

Shiralkar, Prashant ^{[1]}

Affiliations:

[1] Amazon, Seattle, WA 98109 USA

[2] Univ Washington, Allen Inst AI, Seattle, WA USA

[3] Univ Washington, Amazon, Seattle, WA USA

Source:

KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020

Keywords:

Information extraction; Web extraction; Semi-structured data; Web mining

DOI:

10.1145/3394486.3406468

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.


Pages: 3543-3544

Number of pages: 2
