Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliations
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020, Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models;
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Number of pages: 12
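
The abstract above describes a verification-style scheme in which each candidate author is modeled by a language-model scorer and raw scores are compared against a separate normalization corpus. The Python sketch below only illustrates that normalization idea under stated assumptions; it is not the authors' implementation, and all names (author_scorers, normalization_corpus, score_fn) are hypothetical. It assumes lower scores (e.g., cross-entropy under an author-specific head of a pre-trained language model) mean a text looks more like that author.

# Minimal, illustrative sketch of score normalization for attribution.
# Assumptions (not taken from the paper's text): each candidate author has a
# scoring function returning a cross-entropy-like score for a text (lower =
# more author-like), and a normalization corpus of other texts is available.

from typing import Callable, Dict, List


def normalized_attribution(
    disputed_text: str,
    author_scorers: Dict[str, Callable[[str], float]],
    normalization_corpus: List[str],
) -> str:
    """Return the candidate whose model rates the disputed text as most
    author-like relative to the normalization corpus."""
    best_author, best_value = "", -1.0
    for author, score_fn in author_scorers.items():
        disputed_score = score_fn(disputed_text)
        # Percentile of the disputed text among normalization texts: the
        # fraction of normalization texts this author's model finds *less*
        # author-like (higher score) than the disputed text.
        norm_scores = [score_fn(text) for text in normalization_corpus]
        percentile = sum(s > disputed_score for s in norm_scores) / len(norm_scores)
        if percentile > best_value:
            best_author, best_value = author, percentile
    return best_author

Intuitively, a shift in topic or genre between training and test texts inflates raw scores for every candidate; ranking against a normalization corpus factors that shift out, which is presumably why the choice of normalization corpus matters so much in the cross-domain setting.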