Email Classification and Forensics Analysis using Machine Learning

Cited by: 11
Authors
Hina, Maryam [1]
Ali, Mohsan [2]
Javed, Abdul Rehman [3]
Srivastava, Gautam [4]
Gadekallu, Thippa Reddy [5]
Jalil, Zunera [3]
Affiliations
[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Air Univ, Natl Ctr Cyber Secur, Islamabad, Pakistan
[3] Air Univ, Dept Cyber Secur, Islamabad, Pakistan
[4] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Source
2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021) | 2021
Keywords
Digital Forensics; Machine Learning; Email Forensics; Fraud Detection; Crime Investigation;
DOI
10.1109/SWC50871.2021.00093
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Email has long been used as a reliable, secure, and formal mode of communication. With fast and secure communication technologies, reliance on email has increased as well. The massive increase in email data has made managing emails a major challenge. So far, emails can be classified and grouped by sender, size, and date. However, there is a need to detect and classify emails based on their contents. Several approaches have been used in the past for content-based classification of emails as spam or non-spam. In this paper, we propose a multi-label email classification approach to organize emails. An efficient classification method is proposed for forensic investigations of massive email data (e.g., a disk image of an email server). This method would help investigators in email-related crime investigations. A comparative study of machine learning algorithms identified Logistic Regression as the method that achieves the highest accuracy compared to Naive Bayes, Stochastic Gradient Descent, Random Forest, and Support Vector Machine. Experiments conducted on benchmark data sets showed that Logistic Regression performs best, with an accuracy of 91.9% using bi-gram features.
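The abstract does not give implementation details, so the following is only a minimal sketch of the general technique it names: word bi-gram features fed to a logistic regression classifier. It is not the authors' pipeline. The toy emails, the binary labels (the paper is multi-label), the function names, and the training settings are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_bigrams(text):
    """Count adjacent word pairs (bi-grams) in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def train_logreg(samples, labels, epochs=200, lr=0.5):
    """Fit a binary logistic regression on sparse bi-gram counts with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            bias -= lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) - lr * err * v
    return weights, bias


def predict(model, feats):
    """Return 1 if the linear score is positive, else 0. Unseen bi-grams score 0."""
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if z > 0 else 0


# Hypothetical toy data: 1 = suspicious, 0 = normal (labels invented for this sketch).
emails = [
    "wire the money today",
    "send the money now",
    "lunch meeting at noon",
    "team meeting at nine",
]
labels = [1, 1, 0, 0]

features = [word_bigrams(e) for e in emails]
model = train_logreg(features, labels)
train_predictions = [predict(model, f) for f in features]
```

A multi-label variant could train one such binary model per label (one-vs-rest); whether the paper does this is not stated in the abstract.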
Pages: 630-635
Number of pages: 6