Text Processing Procedures for Analysing a Corpus with Medieval Marian Miracle Tales in Old Swedish

Times cited: 0
Author
Dahlqvist, Bengt [1]
Affiliation
[1] Uppsala Univ, Dept Linguist & Philol, POB 635, S-75126 Uppsala, Sweden
Source
ICAART: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1 | 2020
Keywords
Text Mining; Medieval Texts; Miracle Stories; Old Swedish; Stop Words; Word Similarity; Spelling Variations; Key Words
DOI
10.5220/0009372204520458
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A text corpus of one hundred and one Marian miracle stories in Old Swedish, written between c. 1272 and 1430, has been digitally compiled from three transcribed sources from the 19th century. Interpreting these texts requires highly specialized knowledge, since the medieval variant of Swedish differs significantly from the modern language: its vocabulary, spelling and grammar all vary substantially from modern Swedish. Automated text processing tools are therefore needed to advance the understanding of these texts. This paper presents a preliminary investigation of several strategies, such as frequency list analysis and methods for identifying spelling variations, for producing stop word lists and exposing the key words of the texts. Such methods can help readers understand the texts, identify different forms of the same word, ease lexicon lookup, and serve as a starting point for lemmatisation.
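The abstract names frequency lists (for stop word candidates) and spelling-variation detection as building blocks. Below is a minimal Python sketch of both, under assumptions made for illustration rather than taken from the paper: tokens are lowercased `\w+` runs, the top-N most frequent tokens are treated as stop word candidates, and spelling variants are grouped greedily with Jaro-Winkler similarity at a hypothetical threshold of 0.85. The paper's actual measures and cut-offs may differ, and the sample strings are invented Old Swedish-like forms, not quotations from the corpus.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into Unicode word runs (keeps letters such as æ, ð, þ)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(texts: list[str]) -> Counter:
    """Count every token across all texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def stop_word_candidates(freq: Counter, top_n: int) -> list[str]:
    """Assumption: the top_n most frequent tokens are stop word candidates."""
    return [word for word, _ in freq.most_common(top_n)]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 when no characters match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(0, max(len1, len2) // 2 - 1)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s2.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix of up to max_prefix chars."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


def group_variants(freq: Counter, threshold: float = 0.85) -> dict[str, list[str]]:
    """Greedy grouping (hypothetical threshold): words are visited from most to least
    frequent; each joins the first group head it resembles, else starts a new group."""
    groups: dict[str, list[str]] = {}
    for word in sorted(freq, key=lambda w: (-freq[w], w)):
        for head in groups:
            if jaro_winkler(word, head) >= threshold:
                groups[head].append(word)
                break
        else:
            groups[word] = [word]
    return groups


# Invented Old Swedish-like snippets for illustration only (not from the corpus).
SAMPLE = [
    "oc iomfru maria hulpe honom oc han takkadhe gudhi",
    "ok jomfru maria kom til han",
    "oc klerkin bad til iomfrw mariam ok iomfru",
]

if __name__ == "__main__":
    freq = frequency_list(SAMPLE)
    print(stop_word_candidates(freq, 1))
    for head, forms in group_variants(freq).items():
        if len(forms) > 1:
            print(head, forms)
```

With this sample, iomfru, iomfrw and jomfru end up in one group and maria and mariam in another, while oc and ok (both spellings of "and") score only 0.7 and stay apart. A fixed similarity threshold therefore tends to miss variants of very short words, which a frequency-ranked stop word list may catch instead.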
Pages: 452-458
Number of pages: 7