NLP methods in host-based intrusion detection systems: A systematic review and future directions

Cited by: 12
Authors
Sworna, Zarrin Tasnim [1, 2]
Mousavi, Zahra [1, 3, 4]
Babar, Muhammad Ali [1, 2, 3]
Affiliations
[1] University of Adelaide, School of Computer Science, Adelaide, SA, Australia
[2] Cyber Security Cooperative Research Centre, Joondalup, Australia
[3] University of Adelaide, Centre for Research on Engineering Software Technologies (CREST), Adelaide, SA, Australia
[4] CSIRO Data61, Eveleigh, Australia
Keywords
Natural language processing; Host-based intrusion detection; Cyber security; Anomaly detection; Dataset
DOI
10.1016/j.jnca.2023.103761
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A Host-based Intrusion Detection System (HIDS) is an effective last line of defense against cyber security attacks after perimeter defenses (e.g., Network-based Intrusion Detection Systems and firewalls) have failed or been bypassed. HIDS is widely adopted in industry: it ranks among the two most used security tools in organizations' Security Operation Centers (SOCs). Although effective and efficient HIDS are highly desirable for industrial organizations, increasingly complex attack patterns pose several challenges that degrade HIDS performance (e.g., high false alert rates that create alert fatigue for SOC staff). Since Natural Language Processing (NLP) methods are better suited to identifying complex attack patterns, an increasing number of HIDS leverage advances in NLP, which have shown effective and efficient performance in precisely detecting low-footprint, zero-day attacks and in predicting an attacker's next steps. This active research trend calls for a synthesized and comprehensive body of knowledge on NLP-based HIDS. Despite the rapidly growing adoption of NLP in HIDS development, relatively little effort has been devoted to systematically analyzing and synthesizing the available peer-reviewed literature to understand how NLP is used in HIDS development. The lack of such a body of knowledge on this important topic motivated us to conduct a Systematic Literature Review (SLR) of papers covering the end-to-end pipeline of NLP-based HIDS development. For this pipeline, we identify, taxonomically categorize, and systematically compare the state of the art in NLP methods used in HIDS, the attacks detected by these methods, and the datasets and evaluation metrics used to evaluate NLP-based HIDS.
We highlight the relevant prevalent practices, considerations, advantages, and limitations to support HIDS developers. We also outline future research directions for NLP-based HIDS development.
Pages: 29