Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

Cited by: 0
Authors
Maeda, Kazuaki [1]
Lee, Haejoong [1]
Medero, Shawn [1]
Medero, Julie [1]
Parker, Robert [1]
Strassel, Stephanie [1]
Affiliation
[1] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources (data, annotations, tools, standards and best practices) for many sponsored projects. The programming staff at LDC has created tools and technical infrastructure to support all aspects of data creation for these projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. This paper introduces a number of samples of the LDC programming staff's work, with particular focus on recent additions and updates to the suite of software tools developed by LDC. Tools introduced include the GScout Web Data Scouting Tool, LDC Data Selection Toolkit, ACK - Annotation Collection Kit, XTrans Transcription and Speech Annotation Tool, GALE Distillation Toolkit, and the GALE MT Post Editing Work flow Management System.
Pages: 3052-3056
Page count: 5