Classification of Human and Machine-Generated Texts Using Lexical Features and Supervised/Unsupervised Machine Learning Algorithms

Authors
Rojas-Simon, Jonathan [1]
Ledeneva, Yulia [1]
Arnulfo Garcia-Hernandez, Rene [1]
Affiliations
[1] Autonomous Univ State Mexico, Inst Literario 100, Toluca 50000, State Of Mexico, Mexico
Keywords
Large-Language Models (LLMs); AuTexTification; Lexical Features; Supervised/Unsupervised Learning Algorithms; Text Representation Models
DOI
10.1007/978-3-031-62836-8_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distinguishing between human- and machine-generated texts has become a focus of study in academia and industry, because Large Language Models (LLMs) can produce high-quality texts that challenge the legitimacy and authenticity of written content. Methods and models that can determine whether a human or an LLM wrote a text are therefore essential. This paper explores the effectiveness of supervised and unsupervised machine learning algorithms that use lexical features. We focus on traditional algorithms, such as Multilayer Perceptron (MLP), Naive Bayes (NB), Logistic Regression (LR), Agglomerative Hierarchical Clustering (AHC), and K-means Clustering (KC). The results are compared with the state-of-the-art approaches presented in the Automated Text Identification (AuTexTification) shared task, which serve as reference methods. We find that both NB and KC may achieve competitive results in the aforementioned task.
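As an illustration of the kind of approach the abstract names, the following is a minimal sketch of a Naive Bayes classifier over lexical (unigram count) features. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the `NaiveBayes` class, and the four training sentences are all assumptions made up for this example, and the AuTexTification data is not used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all words seen in training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy data, invented purely for illustration.
train_texts = [
    "honestly i loved that movie lol",
    "ugh my train was late again today",
    "as an ai language model i cannot provide that",
    "in conclusion it is important to note that",
]
train_labels = ["human", "human", "generated", "generated"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("it is important to note"))
print(model.predict("my train was late"))
```

A realistic experiment would replace the toy sentences with a labeled corpus and would typically rely on a tested library implementation rather than this hand-written class.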
Pages: 331-341
Page count: 11