Corpus REDEWIEDERGABE

Cited by: 0
Authors
Brunner, Annelen [1]
Engelberg, Stefan [1]
Jannidis, Fotis [2]
Tu, Ngoc Duyen Tanja [1]
Weimer, Lukas [2]
Affiliations
[1] Leibniz Inst Deutsch Sprache Mannheim, R5 6-13, D-68161 Mannheim, Germany
[2] Julius Maximilian Univ Würzburg, D-97074 Würzburg, Germany
Source
Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020) | 2020
Keywords
corpus; annotation; speech, thought and writing representation; machine learning
DOI
Not available
Chinese Library Classification (CLC)
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This article presents the corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and serve as training material for machine learning. This paper describes the composition of the corpus and the annotation structure, discusses some methodological decisions and gives basic statistics about the forms of ST&WR found in this corpus.
Pages: 803-812
Page count: 10