Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliations
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020, Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models;
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Number of pages: 12
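
The abstract above describes a verification-style scheme in which each candidate author is modeled by a language-model scorer and raw scores are compared against a separate normalization corpus. The Python sketch below only illustrates that normalization idea under stated assumptions; it is not the authors' implementation, and all names (author_scorers, normalization_corpus, score_fn) are hypothetical. It assumes lower scores (e.g., cross-entropy under an author-specific head of a pre-trained language model) mean a text looks more like that author.

# Minimal, illustrative sketch of score normalization for attribution.
# Assumptions (not taken from the paper's text): each candidate author has a
# scoring function returning a cross-entropy-like score for a text (lower =
# more author-like), and a normalization corpus of other texts is available.

from typing import Callable, Dict, List


def normalized_attribution(
    disputed_text: str,
    author_scorers: Dict[str, Callable[[str], float]],
    normalization_corpus: List[str],
) -> str:
    """Return the candidate whose model rates the disputed text as most
    author-like relative to the normalization corpus."""
    best_author, best_value = "", -1.0
    for author, score_fn in author_scorers.items():
        disputed_score = score_fn(disputed_text)
        # Percentile of the disputed text among normalization texts: the
        # fraction of normalization texts this author's model finds *less*
        # author-like (higher score) than the disputed text.
        norm_scores = [score_fn(text) for text in normalization_corpus]
        percentile = sum(s > disputed_score for s in norm_scores) / len(norm_scores)
        if percentile > best_value:
            best_author, best_value = author, percentile
    return best_author

Intuitively, a shift in topic or genre between training and test texts inflates raw scores for every candidate; ranking against a normalization corpus factors that shift out, which is presumably why the choice of normalization corpus matters so much in the cross-domain setting.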