An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text

Cited by: 0
Authors
Yeh, Eric [1]
Niekrasz, John [1]
Freitag, Dayne [1]
Rohwer, Richard [1]
Affiliations
[1] SRI Int, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
Source
LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016
Keywords
table recognition; semistructured data; information extraction; INFORMATION;
DOI
Not available
Chinese Library Classification (CLC) number
H [Language and Linguistics]
Discipline classification code
05
Abstract
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of "naturally" occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.
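To make the phrase "skip bigrams over spatial and token category cues" concrete, the following is a minimal sketch of how such features might be computed for a single line of text. It is not the authors' implementation: the category inventory (`CAP`, `WORD`, `NUM`, `PUNCT`, `SP`, `GAP`), the function names, and the `max_skip` parameter are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import re
from collections import Counter

def token_category(token: str) -> str:
    """Map a token to a coarse category (hypothetical inventory)."""
    if token.isspace():
        # Runs of two or more whitespace characters act as a spatial cue.
        return "GAP" if len(token) > 1 else "SP"
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "WORD"
    return "PUNCT"

def categorize(line: str) -> list[str]:
    """Split a line into whitespace runs, words, numbers, and single symbols."""
    tokens = re.findall(r"\s+|[A-Za-z]+|\d+|[^\sA-Za-z\d]", line)
    return [token_category(t) for t in tokens]

def skip_bigrams(cats: list[str], max_skip: int = 1) -> Counter:
    """Count ordered category pairs separated by at most max_skip items."""
    feats = Counter()
    for i, left in enumerate(cats):
        for j in range(i + 1, min(i + max_skip + 2, len(cats))):
            feats[(left, cats[j])] += 1
    return feats

if __name__ == "__main__":
    for line in ["Name:  Alice", "City:  Paris", "We describe a method."]:
        print(line, dict(skip_bigrams(categorize(line))))
```

Under these assumptions, two key-value lines such as `Name:  Alice` and `City:  Paris` yield identical feature counts even though they share no words, which is the kind of signal a segmentation model could use to group lines that follow a common schema.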
Pages: 2063-2070
Page count: 8