Comparative Study between Big Data Analysis Techniques in Intrusion Detection

Cited by: 19
Authors
Hafsa, Mounir [1]
Jemili, Farah [2]
Affiliations
[1] Univ Sousse, Higher Inst Comp Sci & Telecom ISITCOM, Hammam Sousse 4011, Tunisia
[2] Univ Sousse, Higher Inst Comp Sci & Telecom ISITCOM, MARS Res Lab LR17ES05, Hammam Sousse 4011, Tunisia
Keywords
intrusion detection system; machine learning; Apache Spark; Structured Streaming; Big Data; Decision Trees; Microsoft Azure Cloud;
DOI
10.3390/bdcc3010001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cybersecurity Ventures expects that cyber-attack damage costs will rise to $11.5 billion in 2019 and that a business will fall victim to a cyber-attack every 14 seconds. Note that the time frame for such an event is measured in seconds. With petabytes of data generated each day, detecting attacks at this pace is a challenging task for traditional intrusion detection systems (IDSs). Protecting sensitive information is a major concern for both businesses and governments, so a real-time, large-scale and effective IDS is needed. In this work, we present a cloud-based, fault-tolerant, scalable and distributed IDS that uses Apache Spark Structured Streaming and its Machine Learning library (MLlib) to detect intrusions in real time. To demonstrate the effectiveness of this system, we implement it within Microsoft Azure Cloud, which provides both processing power and storage capabilities. A decision tree algorithm is used to predict the nature of incoming data. For this task, the MAWILab dataset is used as the data source to give better insight into the system's capabilities against cyber-attacks. The experimental results showed 99.95% accuracy, and the proposed system processed more than 55,175 events per second on a small cluster.
Pages: 1-13
Number of pages: 13