Detecting unknown intrusions from large heterogeneous data through ensemble learning

Cited by: 1
Authors
Jemili, Farah [1]
Jouini, Khaled [1]
Korbaa, Ouajdi [1]
Affiliations
[1] Univ Sousse, MARS Res Lab, ISITCom, LR17ES05, Hammam Sousse 4011, Tunisia
Source
INTELLIGENT SYSTEMS WITH APPLICATIONS | 2025 / Vol. 25
Keywords
Big heterogeneous data; Intrusion detection; Data fusion
DOI
10.1016/j.iswa.2024.200465
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid expansion of data volumes, technological advancements, and the emergence of the Internet of Things (IoT) have heightened concerns regarding the detection of unknown intrusions based on singular sources of network traffic. This progression has led to the generation of vast and diverse datasets originating from various sources including IoT devices, web applications, and web services. Effectively discerning attacks within such a heterogeneous network traffic landscape necessitates the identification of underlying security behaviors, essential for developing an efficient analysis information system. This paper aims to establish a comprehensive framework for network intrusion detection. The proposed methodology involves the synthesis of network features into a universal security database through the utilization of Term Frequency-Inverse Document Frequency Terms (TF-IDF) and semantic Cosine similarity. By amalgamating a diverse array of data flows, a set of universal features is generated, facilitating storage within the newly devised universal representation. Subsequently, Principal Component Analysis (PCA) is employed to reduce the dimensionality of the extensive universal security database while preserving essential information. Leveraging Ensemble Learning, a novel method is introduced for the detection of unknown attacks. The efficacy of the developed database is evaluated using various Machine Learning algorithms, including Naïve Bayes, K-Nearest Neighbor, Logistic Regression, Decision Tree, and Random Forest. Furthermore, Ensemble Learning methods are assessed under two distinct scenarios. Experimental findings, conducted on datasets such as CICIDS 2017, NSL-KDD, and UNSW, demonstrate the universality, versatility, and effectiveness of the proposed approach, particularly in accommodating datasets with diverse structures.
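The abstract names TF-IDF and cosine similarity as the tools used to merge features from differently structured datasets, but it does not spell out the procedure. The following is only a minimal sketch of one way such matching could work, not the authors' method: it assumes that feature names are treated as short text documents, that matching is purely lexical (the paper speaks of semantic similarity, which this does not attempt), and that a similarity threshold decides whether two names map to the same universal feature. The function names, the threshold of 0.3, and the example feature names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Split a feature name into lowercase tokens (spaces, snake_case, camelCase)."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if tok]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (token -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency, so no weight is ever zero.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({tok: (cnt / len(doc)) * idf[tok] for tok, cnt in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_features(source_a, source_b, threshold=0.3):
    """Map each name in source_a to its most similar name in source_b.

    Only pairs whose cosine similarity reaches `threshold` (an assumed,
    hypothetical cut-off) are returned.
    """
    docs = [tokenize(name) for name in source_a + source_b]
    vectors = tfidf_vectors(docs)
    vecs_a, vecs_b = vectors[: len(source_a)], vectors[len(source_a):]
    matches = {}
    for name_a, vec_a in zip(source_a, vecs_a):
        best_name, best_score = None, 0.0
        for name_b, vec_b in zip(source_b, vecs_b):
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matches[name_a] = (best_name, best_score)
    return matches


# Hypothetical feature names in the style of two differently structured datasets.
flow_style = ["Flow Duration", "Destination Port", "Protocol"]
record_style = ["duration", "dst_port", "protocol_type", "src_bytes"]

for name, (partner, score) in match_features(flow_style, record_style).items():
    print(f"{name!r} -> {partner!r} (cosine {score:.2f})")
```

In a pipeline like the one the abstract outlines, matched names would share one column of the universal representation and unmatched names would become new columns; dimensionality reduction and classification would follow as separate steps.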
Pages: 19