Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliation
[1] Univ Aegean, Karlovassi 83200, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2020, PT I | 2020, Vol. 583
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where the texts of known authorship (training set) differ from the texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
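Below is a minimal, illustrative sketch of the core idea the abstract describes: a shared language-model backbone with one output head per candidate author, where a disputed text is scored by each head's per-token negative log-likelihood and that score is normalized against an author-independent normalization corpus. This is not the authors' implementation; a small GRU stands in for the pre-trained language model, and all names, sizes, and the ratio-based normalization are assumptions for illustration.

```python
# Sketch of a multi-headed language model for authorship attribution:
# one softmax head per candidate author over a shared backbone, with
# scores normalized against a normalization corpus. A toy GRU stands
# in for the pre-trained LM; everything here is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size (assumption)

class MultiHeadLM(nn.Module):
    def __init__(self, n_authors, d=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        self.backbone = nn.GRU(d, d, batch_first=True)  # stand-in for a pre-trained LM
        self.heads = nn.ModuleList(nn.Linear(d, VOCAB) for _ in range(n_authors))

    def nll(self, tokens, author):
        """Mean per-token negative log-likelihood of `tokens` under one author's head."""
        x = self.embed(tokens[:, :-1])          # predict token t+1 from tokens <= t
        h, _ = self.backbone(x)
        logits = self.heads[author](h)
        return F.cross_entropy(logits.reshape(-1, VOCAB),
                               tokens[:, 1:].reshape(-1))

def attribute(model, text, norm_corpus, n_authors):
    """Score each author on `text`, normalizing by that author's average
    NLL on an author-independent normalization corpus; lower is better."""
    scores = []
    with torch.no_grad():
        for a in range(n_authors):
            raw = model.nll(text, a)
            norm = torch.stack([model.nll(doc, a) for doc in norm_corpus]).mean()
            scores.append((raw / norm).item())
    return min(range(n_authors), key=lambda a: scores[a])

# Toy usage: random token sequences in place of real documents.
model = MultiHeadLM(n_authors=3)
disputed = torch.randint(0, VOCAB, (1, 64))
norm_docs = [torch.randint(0, VOCAB, (1, 64)) for _ in range(5)]
print("predicted author:", attribute(model, disputed, norm_docs, 3))
```

Dividing each author's raw score by that author's fit to neutral material is one way to make the heads comparable; per the abstract, the choice of this normalization corpus has a crucial effect in cross-domain settings.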
Pages: 255-266
Number of pages: 12
Related Papers
50 items in total
  • [21] Exploring Pre-trained Language Models for Vocabulary Alignment in the UMLS
    Hao, Xubing
    Abeysinghe, Rashmie
    Shi, Jay
    Cui, Licong
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PT I, AIME 2024, 2024, 14844 : 273 - 278
  • [22] Rethinking Textual Adversarial Defense for Pre-Trained Language Models
    Wang, Jiayi
    Bao, Rongzhou
    Zhang, Zhuosheng
    Zhao, Hai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2526 - 2540
  • [23] Cross-Language Authorship Attribution
    Bogdanova, Dasha
    Lazaridou, Angeliki
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2015 - 2020
  • [24] Language models and fusion for authorship attribution
    Fourkioti, Olga
    Symeonidis, Symeon
    Arampatzis, Avi
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (06)
  • [25] Improving Braille-Chinese translation with jointly trained and pre-trained language models
    Huang, Tianyuan
    Su, Wei
    Liu, Lei
    Cai, Chuan
    Yu, Hailong
    Yuan, Yongna
    DISPLAYS, 2024, 82
  • [26] Natural language generation from Universal Dependencies using data augmentation and pre-trained language models
    Nguyen, D. T.
    Tran, T.
    International Journal of Intelligent Information and Database Systems, 2023, 16 (01) : 89 - 105
  • [27] Refining Pre-trained Language Models for Domain Adaptation with Entity-Aware Discriminative and Contrastive Learning
    Yang, Jian
    Hu, Xinyu
    Shen, Yulong
    Xiao, Gang
    PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 409 - 417
  • [28] Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models
    Wang, Zhiyi
    Mao, Shaoguang
    Wu, Wenshan
    Xia, Yan
    Deng, Yan
    Tien, Jonathan
    INTERSPEECH 2023, 2023, : 4194 - 4198
  • [29] Gauging, enriching and applying geography knowledge in Pre-trained Language Models
    Ramrakhiyani, Nitin
    Varma, Vasudeva
    Palshikar, Girish Keshav
    Pawar, Sachin
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [30] Intelligent Completion of Ancient Texts Based on Pre-trained Language Models
    Li, J.
    Ming, C.
    Guo, Z.
    Qian, T.
    Peng, Z.
    Wang, X.
    Li, X.
    Li, J.
    Data Analysis and Knowledge Discovery, 2024, 8 (05) : 59 - 67