Automatic disambiguation of Latin abbreviations in early modern texts for humanities digital libraries

Times cited: 3
Author
Rydberg-Cox, JA [1]
Affiliation
[1] Univ Missouri, Dept English, Kansas City, MO 64110 USA
Source
2003 JOINT CONFERENCE ON DIGITAL LIBRARIES, PROCEEDINGS | 2003
Keywords
digitization; tagging early modern texts; history of science
DOI
10.1109/JCDL.2003.1204892
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Early modern books written in Latin contain many abbreviations of common words, derived from earlier manuscript practice. While a reader well versed in Latin can usually decipher these abbreviations easily, they pose technical problems for full-text digitization: they are difficult to OCR or to have typed, and, if they are not expanded correctly, they limit the effectiveness of information retrieval and reading support tools in the digital library. In this paper, I describe a method for the automatic expansion and disambiguation of these abbreviations.
Pages: 372-373
Number of pages: 2