Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543-3544
Page count: 2
Related papers
50 in total
  • [21] Consideration of the Word's Neighborhood in GATs for Information Extraction in Semi-structured Documents
    Belhadj, Djedjiga
    Belaid, Yolande
    Belaid, Abdel
    DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II, 2021, 12822 : 854 - 869
  • [22] A survey on semi-structured web data manipulations by non-expert users
    Tekli, Gilbert
    COMPUTER SCIENCE REVIEW, 2021, 40
  • [23] Automating Data Mart Construction from Semi-structured Data Sources
    Scriney, Michael
    McCarthy, Suzanne
    McCarren, Andrew
    Cappellari, Paolo
    Roantree, Mark
    COMPUTER JOURNAL, 2019, 62 (03) : 394 - 413
  • [24] Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach
    Pajic, Vesna
    Lazetic, Gordana Pavlovic
    Pajic, Milos
    IMPLEMENTATION AND APPLICATION OF AUTOMATA, 2011, 6807 : 282 - +
  • [25] Research on the Application of Web Information Extraction Based On Semi Structured XML
    Yang, Guo-Jun
    2016 INTERNATIONAL CONFERENCE ON SERVICE SCIENCE, TECHNOLOGY AND ENGINEERING (SSTE 2016), 2016, : 317 - 323
  • [26] Research on Semi-Structured and Unstructured Data Storage and Management Model for Multi-Tenant
    Hu, Xin
    Xu, Yabin
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2019, 12 (01) : 49 - 62
  • [27] A method of semi-automated ontology population from multiple semi-structured data sources
    Leshcheva, Irina
    Begler, Alena
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (02) : 223 - 236
  • [28] A Rule-based Information Extraction System for Human-readable Semi-structured Scientific Documents
    Chen, Gang
    An, Baoran
    Zeng, Sifeng
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 75 - 84
  • [29] Constructing social networks from semi-structured chat-log data
    Tavassoli, Sude
    Moessner, Markus
    Zweig, Katharina Anna
    2014 PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2014), 2014, : 146 - 149
  • [30] Recognition of Data Records in Semi-structured Web-Pages Using Ontology and χ2 Statistical Distribution
    Keshavarzi, Amin
    Rahmani, Amir Masoud
    Mohsenzadeh, Mehran
    Keshavarzi, Reza
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139 : 675 - +