Classification of Human and Machine-Generated Texts Using Lexical Features and Supervised/Unsupervised Machine Learning Algorithms

Authors
Rojas-Simon, Jonathan [1 ]
Ledeneva, Yulia [1 ]
Arnulfo Garcia-Hernandez, Rene [1 ]
Affiliations
[1] Autonomous Univ State Mexico, Inst Literario 100, Toluca 50000, State Of Mexico, Mexico
Source
Keywords
Large-Language Models (LLMs); AuTexTification; Lexical Features; Supervised/Unsupervised Learning Algorithms; Text representation models;
DOI
10.1007/978-3-031-62836-8_31
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In today's digital information era, distinguishing between human- and machine-generated texts has become a focus of study in academia and industry, because Large Language Models (LLMs) can produce high-quality texts that challenge the legitimacy and authenticity of written content. It is therefore essential to create methods and models that can determine whether a human or an LLM wrote a text. This paper explores the effectiveness of supervised and unsupervised machine learning algorithms using lexical features. We focus mainly on traditional algorithms, such as Multilayer Perceptron (MLP), Naive Bayes (NB), Logistic Regression (LR), Agglomerative Hierarchical Clustering (AHC), and K-means Clustering (KC). The results are compared with the state-of-the-art approaches presented in the Automated Text Identification (AuTexTification) shared task, which serve as reference methods. We find that both NB and KC may achieve competitive results in the aforementioned task.
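To illustrate the kind of approach the abstract describes, the following is a minimal sketch of a Naive Bayes classifier over word-count (lexical) features. It is not the authors' implementation: the class name `WordCountNaiveBayes`, the tokenizer, the Laplace smoothing value, the label names, and the four toy training sentences are all assumptions invented for this example, and the record does not state which lexical features or settings the paper actually used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class WordCountNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all words seen in training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for one text."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            denom = self.total_words[c] + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data invented for illustration only; not taken from AuTexTification.
train_texts = [
    "honestly i dunno, the movie was kinda weird lol",
    "we got lost twice but the trip was great fun",
    "in conclusion, the film presents a compelling narrative",
    "overall, the journey offers a valuable and enriching experience",
]
train_labels = ["human", "human", "generated", "generated"]

clf = WordCountNaiveBayes().fit(train_texts, train_labels)
print(clf.predict("lol the trip was kinda fun"))
print(clf.predict("in conclusion the narrative offers a compelling experience"))
```

Words unseen during training are ignored at prediction time, which keeps the sketch short; a fuller treatment would handle out-of-vocabulary tokens explicitly and evaluate on held-out data.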
Pages: 331-341
Number of pages: 11