The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis

Times cited: 0
Authors
Megyesi, Beáta [1]
Näsman, Jesper [1]
Palmér, Anne [2]
Affiliations
[1] Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
[2] Department of Scandinavian Languages, Uppsala University, Uppsala, Sweden
Source
LREC 2016: Tenth International Conference on Language Resources and Evaluation, 2016
Funding
Swedish Research Council
Keywords
student writings; digital humanities; educational applications
DOI
Not available
Chinese Library Classification number
H [Language and writing]
Subject classification code
05
Abstract
The Uppsala Corpus of Student Writings consists of Swedish texts written for national tests by students aged from nine (year three of primary school) to nineteen (the final year of upper secondary school) who study either Swedish or Swedish as a second language. National tests have been collected since 1996. The corpus currently contains 2,500 texts with over 1.5 million tokens. Part of the corpus has been annotated at several linguistic levels using existing state-of-the-art natural language processing tools. To make the corpus easy to interpret for scholars in the humanities, we chose the CoNLL format rather than an XML-based representation. Since spelling and grammatical errors are common in student writing, the texts are automatically corrected while the original tokens are kept in the corpus. Each token is annotated with its part of speech and morphological features as well as syntactic structure. The main purpose of the corpus is to support systematic, quantitative empirical study of the writing of different student groups, defined by gender, geographic area, age, awarded grade, or a combination of these, synchronically or diachronically. The corpus is intended to be a monitor corpus and is currently under development.
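The abstract describes a CoNLL-style tabular format in which each automatically corrected token is stored together with the student's original token and annotated with part of speech, morphological features, and syntactic structure. The Python sketch below shows how such a file might be read. It is only an illustration under stated assumptions: it takes the ten standard CoNLL-X columns and adds a hypothetical eleventh column holding the original token; the corpus's actual column layout is not given in this record, and the sample rows are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Token:
    """One token row: CoNLL-X fields plus a HYPOTHETICAL 'original' column."""
    id: int
    form: str       # normalized (automatically corrected) word form
    lemma: str
    pos: str        # coarse part-of-speech tag (CoNLL-X CPOSTAG column)
    feats: str      # morphological features, "_" if none
    head: int       # index of the syntactic head, 0 for the root
    deprel: str     # dependency relation to the head
    original: str   # ASSUMPTION: the student's original spelling of the token


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of Tokens; sentences are separated by blank lines."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # skip comment lines (a CoNLL-U convention)
        cols = line.split("\t")
        # Columns 0-9 are the ten CoNLL-X fields; column 10 is the assumed
        # original token. Fall back to the corrected form if it is absent.
        sentence.append(Token(
            id=int(cols[0]), form=cols[1], lemma=cols[2], pos=cols[3],
            feats=cols[5], head=int(cols[6]), deprel=cols[7],
            original=cols[10] if len(cols) > 10 else cols[1],
        ))
    if sentence:
        yield sentence  # last sentence when the file lacks a trailing blank line


def corrected_tokens(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (original, corrected) pairs where automatic correction changed the token."""
    return [(t.original, t.form) for t in sentence if t.original != t.form]


if __name__ == "__main__":
    # Invented sample: the student wrote "komer", corrected to "kommer".
    sample = (
        "1\tHon\thon\tPN\tPN\tUTR|SIN|DEF|SUB\t2\tSS\t_\t_\tHon\n"
        "2\tkommer\tkomma\tVB\tVB\tPRS|AKT\t0\tROOT\t_\t_\tkomer\n"
        "3\t.\t.\tMAD\tMAD\t_\t2\tIP\t_\t_\t.\n"
    )
    for sent in read_sentences(sample.splitlines(keepends=True)):
        print(corrected_tokens(sent))  # prints [('komer', 'kommer')]
```

Keeping the original token as a separate column, rather than overwriting it, is what lets a study compare uncorrected and corrected forms, for example to count spelling corrections per student group.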
Pages: 3192-3199
Number of pages: 8