Novel set of general descriptive features for enhanced detection of malicious emails using machine learning methods

Cited by: 22

Authors:

Cohen, Aviad [1,2]

Nissim, Nir [1,3]

Elovici, Yuval [1,2]

Affiliations:

[1] Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, Beer Sheva, Israel

[2] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Beer Sheva, Israel

[3] Ben Gurion Univ Negev, Dept Ind Engn & Management, Beer Sheva, Israel

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2018 / Volume 110

Keywords:

Email; Detection; Machine learning; Analysis; Malware; Features; CLASSIFICATION; ACCURACY; AUC;

DOI:

10.1016/j.eswa.2018.05.031

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, cyber-attacks against businesses and organizations have increased. Such attacks usually result in significant damage to the organization, such as the loss and/or leakage of sensitive and confidential information. Because email communication is an integral part of daily business operations, attackers frequently leverage email as an attack vector in order to initially penetrate the targeted organization. An email message allows the attacker to deliver dangerous content to the victim, such as malicious attachments or links to malicious websites. Existing email analysis solutions analyze only specific parts of the email using rule-based methods, while other important parts remain unanalyzed. Existing anti-virus engines primarily use signature-based detection methods, and therefore are insufficient for detecting new, unknown malicious emails. Machine learning methods have been shown to be effective at detecting maliciousness in various domains, and particularly in email. Previous works that used machine learning methods suggested sets of features that offer a limited perspective over the whole email message. In this paper, we propose a novel set of general descriptive features extracted from all email components (header, body, and attachments) for enhanced detection of malicious emails using machine learning methods. The proposed features are extracted just from the email itself; therefore, our features are independent, since the extraction process does not require an Internet connection or the use of external services or other tools, thereby meeting the needs of real-time detection systems. We conducted an extensive evaluation of our novel features against sets of features suggested by previous academic work using a collection of 33,142 emails, of which 38.73% are malicious and 61.27% are benign. The results show that malicious emails can be detected effectively when using our novel features with machine learning algorithms. Moreover, our novel features enhance the detection of malicious emails when used in conjunction with features suggested by related work. The Random Forest classifier achieved the highest detection rates, with an AUC of 0.929, true positive rate (TPR) of 0.947, and false positive rate (FPR) of 0.03. We also present the IDR (integrated detection rate), a new measure which helps calibrate the threshold of a machine learning classifier in order to achieve the optimal TP and FP rates, which are the most important measures for a real-time and practical cyber-security application. (C) 2018 Elsevier Ltd. All rights reserved.
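The abstract reports a TPR and an FPR and mentions calibrating a classifier threshold to trade one against the other. The abstract does not define the IDR, so the following is only a minimal sketch of the underlying bookkeeping: it computes the TPR and FPR obtained at several candidate thresholds. The function names (`rates_at_threshold`, `sweep`), the toy scores, and the toy labels are all hypothetical and are not taken from the paper.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when emails scoring >= threshold are flagged malicious.

    scores: classifier scores in [0, 1] (higher means more suspicious)
    labels: ground truth, 1 = malicious, 0 = benign
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    # Guard against division by zero when a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def sweep(scores, labels, thresholds):
    """Return a list of (threshold, TPR, FPR) tuples, one per candidate threshold."""
    return [(t, *rates_at_threshold(scores, labels, t)) for t in thresholds]


# Made-up scores and labels, for illustration only.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for threshold, tpr, fpr in sweep(scores, labels, [0.25, 0.50, 0.75]):
    print(f"threshold={threshold:.2f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

On this toy data, raising the threshold lowers the FPR while the TPR also drops or stays flat, which is the trade-off a threshold-calibration measure has to balance.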

Pages: 143-169

Number of pages: 27

