CuneiML: A Cuneiform Dataset for Machine Learning

Authors
Chen, Danlu [1]
Agarwal, Aditi [1]
Berg-Kirkpatrick, Taylor [1]
Myerston, Jacobo [2]
Affiliations
[1] Univ Calif San Diego, Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Literature Dept, La Jolla, CA USA
Keywords
cuneiform; machine learning; computational paleography; image processing
DOI
10.5334/johd.151
Chinese Library Classification
C [Social Sciences: General]
Discipline Classification Code
03; 0303
Abstract
The cuneiform writing system holds a vast reservoir of ancient literature, encompassing over 3000 years of history. Originating around the mid-fourth millennium BCE and enduring until the late first millennium BCE, cuneiform writing spans various genres, including administrative, legal, medical, and scientific documents. This article introduces a curated dataset, CuneiML, featuring 38,947 high-resolution 2D photos of Sumerian and Akkadian cuneiform tablets, accompanied by their cuneiform Unicode transcriptions, transliterations, lineart, and metadata. The dataset aims to support the development of machine learning tools for processing and analyzing Sumerian and Akkadian cuneiform artifacts, for example by automatically classifying genre, provenance, or period from unannotated tablet images. CuneiML is therefore designed with consistency of format as a primary concern. Specifically, CuneiML is the result of meticulously preprocessing, segmenting, filtering, and re-transliterating data that is available online in the Cuneiform Digital Library Initiative (CDLI) collection.
Pages: 9