Email Classification and Forensics Analysis using Machine Learning

Cited by: 10
Authors
Hina, Maryam [1]
Ali, Mohsan [2]
Javed, Abdul Rehman [3]
Srivastava, Gautam [4]
Gadekallu, Thippa Reddy [5]
Jalil, Zunera [3]
Affiliations
[1] Air Univ, Dept Comp Sci, Islamabad, Pakistan
[2] Air Univ, Natl Ctr Cyber Secur, Islamabad, Pakistan
[3] Air Univ, Dept Cyber Secur, Islamabad, Pakistan
[4] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Source
2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021) | 2021
Keywords
Digital Forensics; Machine Learning; Email Forensics; Fraud Detection; Crime Investigation
DOI
10.1109/SWC50871.2021.00093
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Email has long served as a reliable, secure, and formal mode of communication, and with fast and secure communication technologies, reliance on email has grown further. The resulting surge in email data makes managing email a major challenge. So far, emails can be classified and grouped by sender, size, and date, but there is also a need to detect and classify emails by their content. Several past approaches classify emails by content as spam or non-spam. In this paper, we propose a multi-label email classification approach for organizing emails. The approach is an efficient classification method for forensic investigation of massive email data (e.g., a disk image of an email server), intended to assist investigators of email-related crimes. A comparative study of machine learning algorithms on benchmark datasets found that Logistic Regression achieves the highest accuracy, 91.9% with bi-gram features, outperforming Naive Bayes, Stochastic Gradient Descent, Random Forest, and Support Vector Machine.
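The abstract names the ingredients (multi-label classification, bi-gram features, Logistic Regression) but not the implementation. What follows is a minimal sketch of how such a classifier can be put together, under stated assumptions, using only the Python standard library:

- Emails are lowercased and split on whitespace into word bi-grams.
- Each email becomes a sparse count vector.
- Each label gets its own binary logistic regression (one-vs-rest), trained with plain batch gradient descent and an L2 penalty.

The tokenizer, the hyperparameters (`lr`, `epochs`, `l2`, `threshold`), the function names, the label names, and the toy emails are all hypothetical. This is not the authors' code, dataset, or training setup.

```python
import math
from collections import Counter

def bigrams(text):
    """Lowercase, split on whitespace, and join adjacent tokens into word bi-grams."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def build_vocab(docs):
    """Map every bi-gram seen in the training emails to a column index."""
    vocab = {}
    for doc in docs:
        for bg in bigrams(doc):
            vocab.setdefault(bg, len(vocab))
    return vocab

def vectorize(doc, vocab):
    """Sparse bag-of-bi-grams {column: count}; unseen bi-grams are dropped."""
    counts = Counter(bigrams(doc))
    return {vocab[bg]: c for bg, c in counts.items() if bg in vocab}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary_lr(X, y, n_features, lr=0.5, epochs=200, l2=0.01):
    """One binary logistic regression: full-batch gradient descent on mean
    log loss plus an L2 penalty on the weights (bias not penalized).
    lr, epochs, and l2 are illustrative assumptions, not the paper's settings."""
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]
        grad_b = 0.0
        for x, target in zip(X, y):
            err = sigmoid(b + sum(w[j] * v for j, v in x.items())) - target
            for j, v in x.items():
                grad_w[j] += err * v / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fit_one_vs_rest(docs, label_sets, labels):
    """Multi-label training: one independent binary classifier per label."""
    vocab = build_vocab(docs)
    X = [vectorize(d, vocab) for d in docs]
    models = {}
    for label in labels:
        y = [1.0 if label in s else 0.0 for s in label_sets]
        models[label] = train_binary_lr(X, y, len(vocab))
    return vocab, models

def predict_proba(doc, vocab, models):
    """Per-label probability that the email carries that label."""
    x = vectorize(doc, vocab)
    return {label: sigmoid(b + sum(w[j] * v for j, v in x.items()))
            for label, (w, b) in models.items()}

def predict_labels(doc, vocab, models, threshold=0.5):
    """An email receives every label whose probability reaches the threshold."""
    return {label for label, p in predict_proba(doc, vocab, models).items()
            if p >= threshold}

# Hypothetical toy corpus and label names (not from the paper).
emails = [
    "please find the invoice attached for payment",
    "wire transfer payment is overdue",
    "team meeting moved to friday afternoon",
    "agenda for the quarterly review meeting",
    "invoice attached ahead of the budget meeting",
    "lunch plans with family this weekend",
]
tags = [{"finance"}, {"finance"}, {"meeting"}, {"meeting"},
        {"finance", "meeting"}, set()]

vocab, models = fit_one_vs_rest(emails, tags, ["finance", "meeting"])
print(predict_labels("invoice attached for the budget meeting", vocab, models))
```

Because each label has its own classifier, an email can receive zero, one, or several labels. This is what separates the multi-label setting from binary spam/non-spam filtering.

In practice, a library pipeline would typically replace the hand-rolled loop, for example scikit-learn's `CountVectorizer(ngram_range=(2, 2))` feeding `LogisticRegression` wrapped in `OneVsRestClassifier`, with labels encoded by `MultiLabelBinarizer`.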
Pages: 630–635
Number of pages: 6