Novel set of general descriptive features for enhanced detection of malicious emails using machine learning methods

Cited by: 22
Authors
Cohen, Aviad [1, 2]
Nissim, Nir [1, 3]
Elovici, Yuval [1, 2]
Affiliations
[1] Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Dept Ind Engn & Management, Beer Sheva, Israel
Keywords
Email; Detection; Machine learning; Analysis; Malware; Features; Classification; Accuracy; AUC
DOI
10.1016/j.eswa.2018.05.031
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, cyber-attacks against businesses and organizations have increased. Such attacks usually result in significant damage to the organization, such as the loss and/or leakage of sensitive and confidential information. Because email communication is an integral part of daily business operations, attackers frequently leverage email as an attack vector in order to initially penetrate the targeted organization. An email message allows the attacker to deliver dangerous content to the victim, such as malicious attachments or links to malicious websites. Existing email analysis solutions analyze only specific parts of the email using rule-based methods, while other important parts remain unanalyzed. Existing anti-virus engines primarily use signature-based detection methods and are therefore insufficient for detecting new, unknown malicious emails. Machine learning methods have been shown to be effective at detecting maliciousness in various domains, and particularly in email. Previous works that used machine learning methods suggested sets of features that offer a limited perspective on the whole email message. In this paper, we propose a novel set of general descriptive features extracted from all email components (header, body, and attachments) for enhanced detection of malicious emails using machine learning methods. The proposed features are extracted from the email itself alone; they are therefore independent, since the extraction process does not require an Internet connection or the use of external services or other tools, thereby meeting the needs of real-time detection systems. We conducted an extensive evaluation of our novel features against sets of features suggested by previous academic work, using a collection of 33,142 emails, of which 38.73% are malicious and 61.27% are benign. The results show that malicious emails can be detected effectively when using our novel features with machine learning algorithms. Moreover, our novel features enhance the detection of malicious emails when used in conjunction with features suggested by related work. The Random Forest classifier achieved the highest detection rates, with an AUC of 0.929, a true positive rate (TPR) of 0.947, and a false positive rate (FPR) of 0.03. We also present the IDR (integrated detection rate), a new measure that helps calibrate the threshold of a machine learning classifier in order to achieve the optimal TP and FP rates, which are the most important measures for a real-time and practical cyber-security application. © 2018 Elsevier Ltd. All rights reserved.
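The abstract refers to calibrating a classifier's decision threshold to trade off TPR against FPR. The IDR formula itself is not given in this record, so the following is only a minimal Python sketch of the underlying idea: how TPR and FPR change as the threshold moves. The function name `rates_at_threshold` and the scores and labels are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, FPR) when emails scoring >= threshold are flagged malicious.

    labels use 1 for malicious and 0 for benign (an assumption of this sketch).
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical classifier scores (estimated probability of "malicious")
# and the corresponding true labels.
scores = [0.95, 0.80, 0.40, 0.70, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.75):
    tpr, fpr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Raising the threshold in this toy data lowers the FPR at the cost of a lower TPR, which is the trade-off a calibration measure has to balance.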
Pages: 143 - 169
Page count: 27
Related papers
50 in total
  • [31] Detection and classification of darknet traffic using machine learning methods
    Ugurlu, Mesut
    Dogru, Ibrahim Alper
    Arslan, Recep Sinan
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2023, 38 (03) : 1737 - 1746
  • [32] Investigation of Lung Sounds Features for Detection of Bronchitis and COPD Using Machine Learning Methods
    Porieva, H. S.
    Ivanko, K. O.
    Semkiv, C. I.
    Vaityshyn, V. I.
    VISNYK NTUU KPI SERIIA-RADIOTEKHNIKA RADIOAPARATOBUDUVANNIA, 2021, (84) : 78 - 87
  • [33] Detection of Coronavirus Disease Using a Novel Machine Learning Approach
    Salau, Ayodeji Olalekan
    2021 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATION (DASA), 2021
  • [34] Malicious PDF Documents Detection using Machine Learning Techniques: A Practical Approach with Cloud Computing Applications
    Torres, Jose
    De Los Santos, Sergio
    ICISSP: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY, 2018 : 337 - 344
  • [35] Detection of different windows PE malware using machine learning methods
    Kocak, Aynur
    Sogut, Esra
    Alkan, Mustafa
    Erdem, O. Ayhan
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2023, 26 (03) : 1185 - 1197
  • [36] Detection of coronavirus disease using texture analysis and machine learning methods
    Bourouis, Sami
    INTERNATIONAL JOURNAL OF INTELLIGENT ENGINEERING INFORMATICS, 2022, 10 (03) : 196 - 211
  • [37] Overview of Machine Learning Methods for Stroke Detection using Weather Data
    Ploscar, Andreea Alina
    Marc, Anastasia-Daria
    Aldea, Cristina Caterina
    Coroiu, Adriana Mihaela
    2023 25TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, SYNASC 2023, 2023 : 324 - 331
  • [38] Hate Speech Detection in Social Networks using Machine Learning and Deep Learning Methods
    Toktarova, Aigerim
    Syrlybay, Dariga
    Myrzakhmetova, Bayan
    Anuarbekova, Gulzat
    Rakhimbayeva, Gulbarshin
    Zhylanbaeva, Balkiya
    Suieuova, Nabat
    Kerimbekov, Mukhtar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (05) : 396 - 406
  • [39] Auditory Brainstem Response Detection Using Machine Learning: A Comparison With Statistical Detection Methods
    McKearney, Richard M.
    Bell, Steven L.
    Chesnaye, Michael A.
    Simpson, David M.
    EAR AND HEARING, 2022, 43 (03) : 949 - 960
  • [40] Breast tumour detection using machine learning: review of selected methods from 2015 to 2021
    Sharma, Gouri
    Jindal, Neeru
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (22) : 32161 - 32189