Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliation
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020 / Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models;
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Number of pages: 12
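
The abstract outlines the core architectural idea: a pre-trained language model supplies shared text representations, per-author heads score a disputed document, and the scores are adjusted against a normalization corpus to dampen topic and genre effects. The Python sketch below illustrates that general idea only; it is a minimal, hypothetical reconstruction assuming a Hugging Face-style encoder (bert-base-uncased), mean pooling, one linear head per candidate author, and a simple subtraction-based normalization, none of which should be read as the authors' published implementation.

```python
# Minimal, hypothetical sketch of attribution with a pre-trained encoder and
# per-author heads. Model name, pooling, head design, and the normalization
# step are illustrative assumptions, not the authors' published system.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultiHeadAttributor(nn.Module):
    """Shared pre-trained encoder with one scoring head per candidate author."""

    def __init__(self, model_name: str, num_authors: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Independent linear head per candidate author (hypothetical design).
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_authors)])

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token representations into one document vector.
        mask = attention_mask.unsqueeze(-1).float()
        doc = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        # One raw score per author, shape (batch, num_authors).
        return torch.cat([head(doc) for head in self.heads], dim=-1)


def normalized_scores(model, tokenizer, text, norm_texts, device="cpu"):
    """Offset the per-author scores of `text` by the mean scores obtained on a
    normalization corpus, so topic/genre effects partially cancel out."""

    def score(t):
        enc = tokenizer(t, return_tensors="pt", truncation=True, max_length=512).to(device)
        with torch.no_grad():
            return model(enc["input_ids"], enc["attention_mask"]).squeeze(0)

    baseline = torch.stack([score(t) for t in norm_texts]).mean(dim=0)
    return score(text) - baseline  # higher score = more likely author


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = MultiHeadAttributor("bert-base-uncased", num_authors=3).eval()
    scores = normalized_scores(
        model, tok,
        text="A disputed test document of unknown authorship.",
        norm_texts=["A neutral reference text.", "Another neutral reference text."],
    )
    print("Predicted author index:", int(scores.argmax()))
```

In practice the per-author heads would first be trained on texts of known authorship before any disputed document is scored, and the choice of normalization corpus, which the paper identifies as having a crucial effect in cross-domain settings, would be matched to the genre or topic shift between training and test texts.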