Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard

Times cited: 0
Authors
Bernhard, Delphine [1]
Ligozat, Anne-Laure [2]
Martin, Fanny [3]
Bras, Myriam [4]
Magistry, Pierre [5]
Vergez-Couret, Marianne [6]
Steible, Lucie [1]
Erhart, Pascale [1]
Hathout, Nabil [4]
Huck, Dominique [1]
Rey, Christophe [7]
Reynes, Philippe [3]
Rosset, Sophie [5]
Sibille, Jean [4]
Lavergne, Thomas [8]
Affiliations
[1] Univ Strasbourg, LiLPa, Strasbourg, France
[2] Univ Paris Saclay, ENSIIE, CNRS, LIMSI, F-91405 Orsay, France
[3] Univ Picardie Jules Verne, Lab Habiter Monde HM EA 4278, Amiens, France
[4] Univ Toulouse, UT2J, CNRS, CLLE, Toulouse, France
[5] Univ Paris Saclay, CNRS, LIMSI, F-91405 Orsay, France
[6] Queens Univ, Belfast, Antrim, Northern Ireland
[7] Univ Cergy Pontoise, IUF, LT2D Lex Textes Discours Dictionnaires, Cergy, France
[8] Univ Paris Saclay, Univ Paris Sud, CNRS, LIMSI, F-91405 Orsay, France
Source
PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018) | 2018
Keywords
corpus; annotation; part-of-speech; Alsatian; Occitan; Picard
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This article describes the creation of corpora with part-of-speech annotations for three regional languages of France: Alsatian, Occitan and Picard. These manual annotations were performed in the context of the RESTAURE project, whose goal is to develop resources and tools for these under-resourced French regional languages. The article presents the tagsets used in the annotation process as well as the resulting annotated corpora.
Pages: 3917-3924
Number of pages: 8
References (33 in total)
[1] Abeille A., 2003, TECHNICAL REPORT, P7
[2] [Anonymous], 1999, Guidelines für das Tagging deutscher Textcorpora mit STTS (kleines und großes Tagset)
[3] Armentano I., 2008, 9 C INT ASS INT ET O
[4] Bernhard D., 2018, PART OF SPEECH ANNOT, DOI 10.5281/zenodo.1171925
[5] Bernhard D., 2017, DILITAL 2017, P14
[6] Bernhard D., 2014, LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
[7] Bernhard D., 2013, Non-Standard Data Sources in Corpus-Based Research, P85
[8] Bras M., 2018, PART OF SPEECH ANNOT, DOI 10.5281/zenodo.1173113
[9] Bras M., 2014, 11 C ASS INT ET OCC
[10] Bras M., 2016, LANGUAGE DOCUMENTATI, P133