Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1]
Stamatatos, Efstathios [1]
Affiliations
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020 / Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
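The abstract hinges on two ideas: scoring a disputed text against each candidate author with a pre-trained language model, and normalizing those scores against a separate normalization corpus so they remain comparable when topic or genre shifts. The following is a minimal illustrative sketch of that general scheme, not the paper's multi-headed architecture; the choice of GPT-2 via Hugging Face transformers, the likelihood-based similarity proxy, the z-score style normalization, and the names avg_log_likelihood, normalized_score, and norm_corpus are all assumptions made for the example.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_log_likelihood(text: str) -> float:
    # Average per-token log-likelihood of `text` under the pre-trained LM.
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()

def normalized_score(disputed: str, author_texts: list[str], norm_corpus: list[str]) -> float:
    # Crude similarity proxy (an assumption of this sketch): likelihood of the
    # disputed text appended to a known text of the candidate author, averaged
    # over the author's known texts.
    raw = sum(avg_log_likelihood(t + "\n" + disputed) for t in author_texts) / len(author_texts)
    # Normalization corpus: score the same disputed text against unrelated texts
    # and standardize, so scores stay comparable across topics and genres.
    ref = [avg_log_likelihood(t + "\n" + disputed) for t in norm_corpus]
    mu = sum(ref) / len(ref)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in ref) / len(ref)) or 1.0
    return (raw - mu) / sigma

# Attribution: pick the candidate with the highest normalized score, e.g.
# candidates = {"author_A": [...], "author_B": [...]}
# best = max(candidates, key=lambda a: normalized_score(disputed_text, candidates[a], norm_corpus))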
Pages: 255-266
Page count: 12
Related Papers
50 records in total
  • [1] A transfer learning approach to cross-domain authorship attribution
    Barlas, Georgios
    Stamatatos, Efstathios
    EVOLVING SYSTEMS, 2021, 12 (03) : 625 - 643
  • [2] Pre-trained Language Models in Biomedical Domain: A Systematic Survey
    Wang, Benyou
    Xie, Qianqian
    Pei, Jiahuan
    Chen, Zhihong
    Tiwari, Prayag
    Li, Zhao
    Fu, Jie
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [3] Pre-trained language models with domain knowledge for biomedical extractive summarization
    Xie, Q.
    Bishop, J. A.
    Tiwari, P.
    Ananiadou, S.
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [4] Identifying Styles of Cross-Language Classics with Pre-Trained Models
    Zhang, Y.
    Deng, S.
    Hu, H.
    Wang, D.
    DATA ANALYSIS AND KNOWLEDGE DISCOVERY, 2023, 7 (10) : 50 - 62
  • [5] Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
    Paul, Shounak
    Mandal, Arpan
    Goyal, Pawan
    Ghosh, Saptarshi
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023, : 187 - 196
  • [6] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [7] Leveraging Pre-trained Language Models for Gender Debiasing
    Jain, Nishtha
    Popovic, Maja
    Groves, Declan
    Specia, Lucia
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2188 - 2195
  • [8] InA: Inhibition Adaption on pre-trained language models
    Kang, Cheng
    Prokop, Jindrich
    Tong, Lei
    Zhou, Huiyu
    Hu, Yong
    Novak, Daniel
    NEURAL NETWORKS, 2024, 178
  • [9] A Survey of Knowledge Enhanced Pre-Trained Language Models
    Hu, Linmei
    Liu, Zeyi
    Zhao, Ziwang
    Hou, Lei
    Nie, Liqiang
    Li, Juanzi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (04) : 1413 - 1430