Automating Transliteration of Cuneiform from Parallel Lines with Sparse Data

Cited by: 5
Authors
Bogacz, Bartosz [1]
Klingmann, Maximilian [1]
Mara, Hubert [1]
Affiliations
[1] Heidelberg Univ, Interdisciplinary Ctr Sci Comp IWR, FCGL, Heidelberg, Germany
Source
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1 | 2017
DOI
10.1109/ICDAR.2017.106
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cuneiform tablets are among the oldest textual artifacts and are comparable in extent to texts written in Latin or ancient Greek. The Cuneiform Commentaries Project (CCP) from Yale University provides tracings of cuneiform tablets with annotated transliterations and translations. As part of our work analyzing cuneiform script computationally with 3D acquisition and word spotting, we present a first approach for automated learning of transliterations of cuneiform tablets based on a corpus of parallel lines. These consist of manually drawn cuneiform characters and their transliteration into an alphanumeric code. Since the cuneiform script is only available as raster data, we segment lines with a projection profile, extract Histogram of Oriented Gradients (HOG) features, detect outliers caused by tablet damage, and align those features with the transliteration. We apply methods from part-of-speech tagging to learn a correspondence between features and transliteration tokens. We evaluate point-wise classification with K-Nearest Neighbors (KNN) and a Support Vector Machine (SVM), and sequence classification with a Hidden Markov Model (HMM) and a Structured Support Vector Machine (SVM-HMM). Analyzing our findings, we conclude that the sparsity of data, inconsistent labeling, and the variety of tracing styles currently do not allow for fully automated transliteration with the presented approach. However, the pursuit of automated learning of transliterations is of great relevance, as manual annotation in larger quantities is not viable given the few experts capable of transcribing cuneiform tablets.
Pages: 615-620
Page count: 6
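
The abstract mentions segmenting lines with a projection profile. The following is a minimal sketch of that general technique, not the authors' implementation: it sums the ink pixels in each row of a binary image and treats runs of non-empty rows as text lines. The function name `segment_lines`, the `min_ink` threshold, and the toy image are assumptions made for illustration.

```python
def segment_lines(image, min_ink=1):
    """Return (start, end) row ranges (end exclusive) of text lines.

    image: list of rows, each a list of 0/1 ints where 1 marks an ink pixel.
    min_ink: hypothetical threshold; rows with fewer ink pixels count as gaps.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                    # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))     # a gap row closes the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the bottom edge
    return lines


# Toy binary image (assumed data): two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

On real tracings the profile is usually noisy, so a higher `min_ink` or some smoothing of the profile would likely be needed before the row ranges are passed on to feature extraction.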