HistRED: A Historical Document-Level Relation Extraction Dataset

被引:0
作者
Yang, Soyoung [1 ]
Choi, Minseok [1 ]
Cho, Youngwoo [1 ]
Choo, Jaegul [1 ]
机构
[1] KAIST AI, Daejeon, South Korea
来源
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1 | 2023年
关键词
RECORDS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Despite the extensive applications of relation extraction (RE) tasks in various domains, little has been explored in the historical context, which contains promising data across hundreds and thousands of years. To promote the historical RE research, we present HistRED constructed from Yeonhaengnok. Yeonhaengnok is a collection of records originally written in Hanja, the classical Chinese writing, which has later been translated into Korean. HistRED provides bilingual annotations such that RE can be performed on Korean and Hanja texts. In addition, HistRED supports various self-contained subtexts with different lengths, from a sentence level to a document level, supporting diverse context settings for researchers to evaluate the robustness of their RE models. To demonstrate the usefulness of our dataset, we propose a bilingual RE model that leverages both Korean and Hanja contexts to predict relations between entities. Our model outperforms monolingual baselines on HistRED, showing that employing multiple language contexts supplements the RE predictions. The dataset is publicly available at: https://huggingface.co/ datasets/Soyoung/HistRED under CC BYNC-ND 4.0 license.
引用
收藏
页码:3207 / 3224
页数:18
相关论文
共 34 条
[1]  
Alt C, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P1558
[2]   Restoring and attributing ancient texts using deep neural networks [J].
Assael, Yannis ;
Sommerschield, Thea ;
Shillingford, Brendan ;
Bordbar, Mahyar ;
Pavlopoulos, John ;
Chatzipanagiotou, Marita ;
Androutsopoulos, Ion ;
Prag, Jonathan ;
de Freitas, Nando .
NATURE, 2022, 603 (7900) :280-+
[3]  
Assael Y, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P6368
[4]  
Bak J. Y., 2015, Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), P10
[5]  
Bak J, 2018, 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), P956
[6]  
Cabot PLH, 2021, FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, P2370
[7]  
Cai R, 2016, PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, P756
[8]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]  
Doddington George., 2004, P LREC 04
[10]  
Han X, 2018, 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), P4803