Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary

Cited by: 1
Authors
Bick, Eckhard [1]
Zampieri, Marcos [2, 3]
Affiliations
[1] Univ Southern Denmark, Odense, Denmark
[2] Univ Saarland, Saarbrucken, Germany
[3] German Res Ctr Artificial Intelligence DFKI, Saarbrucken, Germany
Source
TEXT, SPEECH, AND DIALOGUE | 2016 / Vol. 9924
Keywords
Historical corpus; Corpus annotation; Dictionary
DOI
10.1007/978-3-319-45510-5_1
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our method makes it possible to create tailor-made standardization dictionaries for historical Portuguese, optionally with per-period or per-author frequencies.
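The normalize-then-align idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four rewrite rules, the function names `normalize` and `build_dictionary`, and the period labels are hypothetical and are not the rule set or code used by the authors, and the PALAVRAS parser is not invoked. The sketch only shows how pairing each original form with its rule-normalized form could yield dictionary entries with per-period frequencies.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rewrite rules (pattern, replacement), applied in order.
# They are illustrative assumptions, not the rules used in the paper.
RULES = [
    (re.compile(r"ph"), "f"),                         # philosophia -> filosofia
    (re.compile(r"th"), "t"),                         # theatro -> teatro
    (re.compile(r"(?<=[aeiou])ll(?=[aeiou])"), "l"),  # ella -> ela
    (re.compile(r"^he$"), "é"),                       # he -> é
]


def normalize(token):
    """Lowercase a token and apply each rewrite rule in turn."""
    form = token.lower()
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def build_dictionary(texts):
    """Map (original, standardized) pairs to per-period frequency counts.

    texts is an iterable of (period_label, list_of_tokens) pairs.
    Tokens that the rules leave unchanged are not recorded.
    """
    entries = defaultdict(Counter)
    for period, tokens in texts:
        for token in tokens:
            original = token.lower()
            standard = normalize(token)
            if standard != original:
                entries[(original, standard)][period] += 1
    return entries


# Toy input with hypothetical period labels.
sample = [
    ("16th c.", ["Ella", "he", "casa"]),
    ("18th c.", ["ella", "theatro"]),
]
dictionary = build_dictionary(sample)
for (original, standard), counts in sorted(dictionary.items()):
    print(original, "->", standard, dict(counts))
```

Keying the counts by period label is one possible design; the same structure would work with author names as labels, which corresponds to the optional author frequencies mentioned in the abstract.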
Pages: 3-11
Page count: 9