Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

Cited by: 0

Authors:

Maeda, Kazuaki [1]

Lee, Haejoong [1]

Medero, Shawn [1]

Medero, Julie [1]

Parker, Robert [1]

Strassel, Stephanie [1]

Affiliation:

[1] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA

Source:

SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 | 2008

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

The Linguistic Data Consortium (LDC) creates a variety of linguistic resources (data, annotations, tools, standards and best practices) for many sponsored projects. The programming staff at LDC has created the tools and technical infrastructure that support all aspects of these data creation projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. This paper introduces a sample of the LDC programming staff's work, with particular focus on recent additions and updates to the suite of software tools developed by LDC. The tools introduced include the GScout Web Data Scouting Tool, the LDC Data Selection Toolkit, ACK (Annotation Collection Kit), the XTrans Transcription and Speech Annotation Tool, the GALE Distillation Toolkit, and the GALE MT Post Editing Workflow Management System.

Pages: 3052-3056

Page count: 5
