Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences

Cited by: 2
Authors
Petrillo, Luca [1 ,2 ]
Martinelli, Fabio [1 ]
Santone, Antonella [3 ]
Mercaldo, Francesco [1 ,3 ]
Affiliations
[1] CNR, Institute of Informatics and Telematics (IIT), I-56124 Pisa, Italy
[2] IMT School for Advanced Studies Lucca, I-55100 Lucca, Italy
[3] University of Molise, Department of Medicine and Health Sciences "Vincenzo Tiberio", I-86100 Campobasso, Italy
Keywords
sentence detection; sentence classification; large language model
DOI
10.3390/electronics13204057
Chinese Library Classification: TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
Pre-trained large language models have demonstrated impressive text generation capabilities, including understanding, writing, and performing many natural language tasks. Moreover, with time and with improvements in training and text generation techniques, these models are becoming efficient at producing increasingly human-like content. However, they can also be adapted to generate persuasive, contextual content weaponized for malicious purposes, including disinformation and novel social engineering attacks. In this paper, we present a study on identifying human-written and AI-generated content using different models. Specifically, we fine-tune several models belonging to the BERT family, an open-source version of the GPT model, ELECTRA, and XLNet, and then perform a text classification task on two labeled datasets: the first consists of 25,000 sentences generated by both AI and humans, and the second comprises 22,929 abstracts, either generated by ChatGPT or written by humans. Furthermore, in an additional phase, we submit 20 sentences generated by ChatGPT and 20 sentences randomly extracted from Wikipedia to our fine-tuned models to verify their efficiency and robustness. To understand the models' predictions, we also perform an explainability phase on two sentences, one AI-generated and one human-written, leveraging the integrated gradients and token importance techniques to analyze the words and subwords of the two sentences. In the first experiment, we achieve an average accuracy of 99%, precision of 98%, recall of 99%, and F1-score of 99%. In the second experiment, we reach an average accuracy of 51%, precision of 50%, recall of 52%, and F1-score of 51%.
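As an illustration of the explainability phase described above, the sketch below shows how per-token importance scores can be computed with layer integrated gradients over a BERT-family sentence classifier. The tooling (Hugging Face Transformers and Captum), the bert-base-uncased checkpoint, the example sentence, and the convention that label 1 means "AI-generated" are assumptions made only for this sketch; in practice the fine-tuned checkpoints from the study would be loaded instead.

    # Minimal sketch: token-level integrated-gradients attribution for a
    # BERT-family sentence classifier (human- vs. AI-generated).
    # Assumptions: Hugging Face Transformers + Captum, a binary head on
    # bert-base-uncased, label 1 = "AI-generated".
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from captum.attr import LayerIntegratedGradients

    checkpoint = "bert-base-uncased"  # placeholder for a fine-tuned checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    model.eval()

    sentence = "The quick brown fox jumps over the lazy dog."  # example input
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

    # Baseline of the same length with every token replaced by [PAD]
    baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

    def forward_logits(ids, mask):
        return model(input_ids=ids, attention_mask=mask).logits

    # Attribute the "AI-generated" logit (target=1) to the input embeddings
    lig = LayerIntegratedGradients(forward_logits, model.bert.embeddings)
    attributions, delta = lig.attribute(
        inputs=input_ids,
        baselines=baseline_ids,
        additional_forward_args=(attention_mask,),
        target=1,
        return_convergence_delta=True,
    )

    # Collapse the embedding dimension to one importance score per (sub)token
    token_scores = attributions.sum(dim=-1).squeeze(0)
    token_scores = token_scores / torch.norm(token_scores)

    tokens = tokenizer.convert_ids_to_tokens(input_ids.squeeze(0))
    for tok, score in zip(tokens, token_scores.tolist()):
        print(f"{tok:>12s}  {score:+.4f}")

Running the same loop on one human-written and one AI-generated sentence makes it possible to compare which subwords pull the prediction toward either class, which is the kind of token-importance analysis the paper reports.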
Pages: 32