An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text

Authors
Yeh, Eric [1]
Niekrasz, John [1]
Freitag, Dayne [1]
Rohwer, Richard [1]
Affiliation
[1] SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
Source
LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016
Keywords
table recognition; semistructured data; information extraction; INFORMATION;
DOI
Not available
Chinese Library Classification number
H [Language, Writing];
Discipline classification code
05;
Abstract
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of "naturally" occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.
Pages: 2063-2070
Page count: 8