Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 34
Authors
Barlas, Georgios [1]
Stamatatos, Efstathios [1]
Affiliation
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020 / Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models
DOI
10.1007/978-3-030-49161-1_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a controlled corpus covering several text genres, where topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Page count: 12