Listening to speakers: Crowdsourcing the language resources for non-standardized languages

Cited by: 0
Authors
Millour, Alice [1]
Fort, Karen [1]
Affiliations
[1] Sorbonne Univ, STIH EA 4509, 28 Rue Serpente, F-75006 Paris, France
Source
TRAITEMENT AUTOMATIQUE DES LANGUES | 2018 / Vol. 59 / No. 03
Keywords
non-standardized languages; crowdsourcing; part-of-speech annotation
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification code
030303; 0501; 050102
Abstract
Citizen science, in particular voluntary crowdsourcing, remains a rarely tested way of producing language resources for less-resourced languages that have enough connected speakers. We present experiments we conducted on part-of-speech annotation for non-standardized languages, namely Alsatian and Guadeloupean Creole. We detail the methodology we used, show that it can be adapted to other languages, and present the results we obtained. An analysis of the limits of this platform led us to develop a new one that supports the creation of raw corpora and part-of-speech annotations, as well as the construction of a multi-variant lexicon. The platforms, language resources, and tagging models we created are all freely available.
Pages: 41-65
Page count: 25