Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences

Cited by: 2
Authors
Petrillo, Luca [1 ,2 ]
Martinelli, Fabio [1 ]
Santone, Antonella [3 ]
Mercaldo, Francesco [1 ,3 ]
Affiliations
[1] CNR, Institute of Informatics and Telematics (IIT), I-56124 Pisa, Italy
[2] IMT School for Advanced Studies Lucca, I-55100 Lucca, Italy
[3] University of Molise, Department of Medicine and Health Sciences "Vincenzo Tiberio", I-86100 Campobasso, Italy
Keywords
sentence detection; sentence classification; large language model
DOI
10.3390/electronics13204057
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Pre-trained large language models have demonstrated impressive text generation capabilities, including understanding, writing, and performing many natural language tasks. Moreover, with improvements in training and text generation techniques over time, these models have become efficient at producing increasingly human-like content. However, they can also be weaponized to generate persuasive, contextual content for malicious purposes, including disinformation and novel social engineering attacks. In this paper, we present a study on identifying human- and AI-generated content using different models. Specifically, we fine-tune several models belonging to the BERT family, an open-source version of the GPT model, ELECTRA, and XLNet, and then perform a text classification task on two labeled datasets: the first consisting of 25,000 sentences generated by both AI and humans, and the second comprising 22,929 abstracts, some generated by ChatGPT and some written by humans. Furthermore, in an additional phase, we submit 20 sentences generated by ChatGPT and 20 sentences randomly extracted from Wikipedia to our fine-tuned models to verify their efficiency and robustness. To understand the models' predictions, we perform an explainability analysis on two sentences, one generated by AI and one written by a human, leveraging the integrated gradients and token importance techniques to analyze the words and subwords of each sentence. In the first experiment, we achieved an average accuracy of 99%, precision of 98%, recall of 99%, and F1-score of 99%; in the second experiment, we reached an average accuracy of 51%, precision of 50%, recall of 52%, and F1-score of 51%.
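The fine-tuning step described in the abstract follows a standard sequence-classification recipe. The sketch below shows one plausible way to set it up with the Hugging Face transformers library; the checkpoint name, dataset file, column names, label encoding, and hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of the fine-tuning setup, assuming a Hugging Face
# "transformers" workflow; paths and hyperparameters are hypothetical.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

MODEL_NAME = "bert-base-uncased"  # any BERT-family checkpoint; ELECTRA/XLNet plug in the same way

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: label 0 = human-written, 1 = AI-generated (hypothetical encoding)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# "sentences.csv" with "text" and "label" columns is a placeholder for the
# 25,000-sentence dataset described in the abstract.
dataset = load_dataset("csv", data_files="sentences.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2)

def tokenize(batch):
    # Pad/truncate to a fixed length so the default data collator can batch
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ai-text-detector",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(model=model, args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"]).train()
```

The same loop works for the second dataset of abstracts by swapping the data file; only the input texts and labels change.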
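For the explainability phase, per-token integrated gradients can be computed with Captum's LayerIntegratedGradients over the embedding layer. The sketch below assumes a BERT-style checkpoint (hence model.bert.embeddings), a [PAD]-token baseline, a hypothetical class index 1 for "AI-generated", and an example sentence; none of these specifics come from the paper.

```python
# A minimal sketch of integrated-gradients token attribution with Captum;
# the model, class index, and sentence are illustrative assumptions.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # in practice, the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def forward_fn(input_ids, attention_mask):
    # Return the logit of the "AI-generated" class (index 1, hypothetical)
    return model(input_ids=input_ids, attention_mask=attention_mask).logits[:, 1]

sentence = "This sentence may or may not have been written by a model."
enc = tokenizer(sentence, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Simple baseline: a same-length sequence of [PAD] tokens
baseline = torch.full_like(input_ids, tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_fn, model.bert.embeddings)
attributions = lig.attribute(input_ids,
                             baselines=baseline,
                             additional_forward_args=(attention_mask,))

# Sum over the embedding dimension: one importance score per token (subword)
scores = attributions.sum(dim=-1).squeeze(0).tolist()
for token, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0]), scores):
    print(f"{token:>12s} {score:+.4f}")
```

Summing attributions over the embedding dimension yields one score per subword, matching the word- and subword-level importance view the abstract describes.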
Pages: 32