Listening to speakers: Crowdsourcing the language resources for non-standardized languages

Cited by: 0
Authors
Millour, Alice [1]
Fort, Karen [1]
Affiliations
[1] Sorbonne Univ, STIH EA 4509, 28 Rue Serpente, F-75006 Paris, France
Source
TRAITEMENT AUTOMATIQUE DES LANGUES | 2018 / Vol. 59 / No. 3
Keywords
non-standardized languages; crowdsourcing; part-of-speech annotation
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Citizen science, in particular voluntary crowdsourcing, is still a little-tested solution for producing language resources for less-resourced languages with enough connected speakers. We present here experiments we conducted on part-of-speech annotation for non-standardized languages, namely Alsatian and Guadeloupean Creole. We detail the methodology we used and show that it is adaptable to other languages, then present the results we obtained. An analysis of the limits of this platform led us to develop a new one that allows the creation of raw corpora and part-of-speech annotations, and the construction of a multivariant lexicon. The created platforms, language resources and tagging models are all freely available.
Pages: 41-65
Page count: 25