Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliation
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020, Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where the texts of known authorship (training set) differ from the texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
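Below is a minimal, illustrative sketch of the core idea the abstract describes: a shared language-model backbone with one output head per candidate author, where a disputed text is scored by each head's per-token negative log-likelihood and that score is normalized against an author-independent normalization corpus. This is not the authors' implementation; a small GRU stands in for the pre-trained language model, and all names, sizes, and the ratio-based normalization are assumptions for illustration.

```python
# Sketch of a multi-headed language model for authorship attribution:
# one softmax head per candidate author over a shared backbone, with
# scores normalized against a normalization corpus. A toy GRU stands
# in for the pre-trained LM; everything here is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size (assumption)

class MultiHeadLM(nn.Module):
    def __init__(self, n_authors, d=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        self.backbone = nn.GRU(d, d, batch_first=True)  # stand-in for a pre-trained LM
        self.heads = nn.ModuleList(nn.Linear(d, VOCAB) for _ in range(n_authors))

    def nll(self, tokens, author):
        """Mean per-token negative log-likelihood of `tokens` under one author's head."""
        x = self.embed(tokens[:, :-1])          # predict token t+1 from tokens <= t
        h, _ = self.backbone(x)
        logits = self.heads[author](h)
        return F.cross_entropy(logits.reshape(-1, VOCAB),
                               tokens[:, 1:].reshape(-1))

def attribute(model, text, norm_corpus, n_authors):
    """Score each author on `text`, normalizing by that author's average
    NLL on an author-independent normalization corpus; lower is better."""
    scores = []
    with torch.no_grad():
        for a in range(n_authors):
            raw = model.nll(text, a)
            norm = torch.stack([model.nll(doc, a) for doc in norm_corpus]).mean()
            scores.append((raw / norm).item())
    return min(range(n_authors), key=lambda a: scores[a])

# Toy usage: random token sequences in place of real documents.
model = MultiHeadLM(n_authors=3)
disputed = torch.randint(0, VOCAB, (1, 64))
norm_docs = [torch.randint(0, VOCAB, (1, 64)) for _ in range(5)]
print("predicted author:", attribute(model, disputed, norm_docs, 3))
```

Dividing each author's raw score by that author's fit to neutral material is one way to make the heads comparable; per the abstract, the choice of this normalization corpus has a crucial effect in cross-domain settings.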
Pages: 255-266
Number of pages: 12
Related Papers
50 items in total
  • [21] Exploring Pre-trained Language Models for Vocabulary Alignment in the UMLS
    Hao, Xubing
    Abeysinghe, Rashmie
    Shi, Jay
    Cui, Licong
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PT I, AIME 2024, 2024, 14844 : 273 - 278
  • [22] Rethinking Textual Adversarial Defense for Pre-Trained Language Models
    Wang, Jiayi
    Bao, Rongzhou
    Zhang, Zhuosheng
    Zhao, Hai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2526 - 2540
  • [23] Cross-Language Authorship Attribution
    Bogdanova, Dasha
    Lazaridou, Angeliki
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2015 - 2020
  • [24] Language models and fusion for authorship attribution
    Fourkioti, Olga
    Symeonidis, Symeon
    Arampatzis, Avi
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (06)
  • [25] Improving Braille-Chinese translation with jointly trained and pre-trained language models
    Huang, Tianyuan
    Su, Wei
    Liu, Lei
    Cai, Chuan
    Yu, Hailong
    Yuan, Yongna
    DISPLAYS, 2024, 82
  • [26] Natural language generation from Universal Dependencies using data augmentation and pre-trained language models
    Nguyen, D. T.
    Tran, T.
    International Journal of Intelligent Information and Database Systems, 2023, 16 (01) : 89 - 105
  • [27] Refining Pre-trained Language Models for Domain Adaptation with Entity-Aware Discriminative and Contrastive Learning
    Yang, Jian
    Hu, Xinyu
    Shen, Yulong
    Xiao, Gang
    PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 409 - 417
  • [28] Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models
    Wang, Zhiyi
    Mao, Shaoguang
    Wu, Wenshan
    Xia, Yan
    Deng, Yan
    Tien, Jonathan
    INTERSPEECH 2023, 2023, : 4194 - 4198
  • [29] Gauging, enriching and applying geography knowledge in Pre-trained Language Models
    Ramrakhiyani, Nitin
    Varma, Vasudeva
    Palshikar, Girish Keshav
    Pawar, Sachin
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [30] Intelligent Completion of Ancient Texts Based on Pre-trained Language Models
    Li, J.
    Ming, C.
    Guo, Z.
    Qian, T.
    Peng, Z.
    Wang, X.
    Li, X.
    Li, J.
    Data Analysis and Knowledge Discovery, 2024, 8 (05) : 59 - 67