Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliation
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020 / Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models;
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Number of pages: 12
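
The abstract outlines the core architectural idea: a pre-trained language model supplies shared text representations, per-author heads score a disputed document, and the scores are adjusted against a normalization corpus to dampen topic and genre effects. The Python sketch below illustrates that general idea only; it is a minimal, hypothetical reconstruction assuming a Hugging Face-style encoder (bert-base-uncased), mean pooling, one linear head per candidate author, and a simple subtraction-based normalization, none of which should be read as the authors' published implementation.

```python
# Minimal, hypothetical sketch of attribution with a pre-trained encoder and
# per-author heads. Model name, pooling, head design, and the normalization
# step are illustrative assumptions, not the authors' published system.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultiHeadAttributor(nn.Module):
    """Shared pre-trained encoder with one scoring head per candidate author."""

    def __init__(self, model_name: str, num_authors: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Independent linear head per candidate author (hypothetical design).
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_authors)])

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token representations into one document vector.
        mask = attention_mask.unsqueeze(-1).float()
        doc = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        # One raw score per author, shape (batch, num_authors).
        return torch.cat([head(doc) for head in self.heads], dim=-1)


def normalized_scores(model, tokenizer, text, norm_texts, device="cpu"):
    """Offset the per-author scores of `text` by the mean scores obtained on a
    normalization corpus, so topic/genre effects partially cancel out."""

    def score(t):
        enc = tokenizer(t, return_tensors="pt", truncation=True, max_length=512).to(device)
        with torch.no_grad():
            return model(enc["input_ids"], enc["attention_mask"]).squeeze(0)

    baseline = torch.stack([score(t) for t in norm_texts]).mean(dim=0)
    return score(text) - baseline  # higher score = more likely author


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = MultiHeadAttributor("bert-base-uncased", num_authors=3).eval()
    scores = normalized_scores(
        model, tok,
        text="A disputed test document of unknown authorship.",
        norm_texts=["A neutral reference text.", "Another neutral reference text."],
    )
    print("Predicted author index:", int(scores.argmax()))
```

In practice the per-author heads would first be trained on texts of known authorship before any disputed document is scored, and the choice of normalization corpus, which the paper identifies as having a crucial effect in cross-domain settings, would be matched to the genre or topic shift between training and test texts.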