Methodologies for Email Spam Classification using Large Language Models

Citations: 0
Authors
De La Noval, Alejandro [1]
Gutierrez, Diana [1]
Soni, Jayesh [1]
Upadhyay, Himanshu [1]
Perez-Pons, Alexander [1]
Lagos, Leonel [1]
Affiliations
[1] Florida Int Univ, Coll Engn & Comp, Miami, FL 33199 USA
Source
2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023 | 2023
Keywords
machine learning; natural language processing; email spam; classification
DOI
10.1109/CSCI62032.2023.00034
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Email spam classification has been a problem almost since the inception of email. These deceptive emails trick individuals into giving up money, personal information, and more. Several approaches, such as text classification and machine learning, have been used for spam email classification. While these approaches are useful and widely adopted, deep learning models, which are capable of better understanding the nuances of language, have shown considerable promise in this field; they use a transformer architecture that allows the model to grasp complex language concepts and identify relationships and patterns in the data presented. This work demonstrates three methods for email spam classification. We compare class explainability results from zero-shot summarization, model interpretation via feature importance extraction, and model fine-tuning.
Pages: 179-185
Page count: 7