Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

Authors
Maeda, Kazuaki [1]
Lee, Haejoong [1]
Medero, Shawn [1]
Medero, Julie [1]
Parker, Robert [1]
Strassel, Stephanie [1]
Affiliation
[1] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA
Chinese Library Classification
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources (data, annotations, tools, standards and best practices) for many sponsored projects. The programming staff at LDC has created the tools and technical infrastructure to support all aspects of data creation for these projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. This paper introduces a number of samples of the LDC programming staff's work, with particular focus on recent additions and updates to the suite of software tools developed by LDC. Tools introduced include the GScout Web Data Scouting Tool, the LDC Data Selection Toolkit, ACK (Annotation Collection Kit), the XTrans Transcription and Speech Annotation Tool, the GALE Distillation Toolkit, and the GALE MT Post Editing Workflow Management System.
Pages: 3052-3056
Page count: 5