Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms

Cited by: 6
Authors
Awadh, Khaldoon [1]
Akbas, Ayhan [2]
Affiliations
[1] Univ Turkish Aeronaut Assoc, Comp Engn Dept, Ankara, Turkey
[2] Cankiri Karatekin Univ, Comp Engn Dept, Cankiri, Turkey
Source
JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI | 2021 / Vol. 24 / No. 04
Keywords
IDS; TF.IDF; data mining; machine learning; network security
DOI
10.2339/politeknik.693221
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
In recent years, machine learning and data mining have drawn researchers' attention as ways to improve the performance of Intrusion Detection Systems (IDS). These techniques have proven effective at distinguishing malicious network packets. One of the most challenging problems researchers face is transforming data into a form that Machine Learning Algorithms (MLA) can handle effectively. In this paper, we present an IDS model based on the C4.5 decision tree algorithm, with a transformation of the simulated UNSW-NB15 dataset as a pre-processing step. Our model uses Term Frequency.Inverse Document Frequency (TF.IDF) to convert the data into a form that is acceptable and efficient for machine learning, in order to achieve high detection performance. The model has been tested on 250,000 randomly selected records of the UNSW-NB15 dataset. The selected records were grouped into segments of various sizes, such as 50, 500, 1000, and 5000 items. Each segment was further split into two subsets, a multi-class dataset and a binary-class dataset. The performance of the C4.5 decision tree algorithm was compared with that of Multilayer Perceptron (MLP) and Naive Bayes (NB) in Weka. The proposed method significantly improved the accuracy of the classifiers and reduced the number of incorrectly detected instances. The increase in accuracy reflects the efficiency of transforming the dataset with TF.IDF at various segment sizes.
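The abstract does not spell out how TF.IDF is applied to network records, so the following is only a minimal sketch of one plausible reading. It assumes (this is not confirmed by the abstract) that each segment of records plays the role of a "document" and each distinct categorical value plays the role of a "term". The function names `segment` and `tfidf_transform` and the sample protocol values are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def segment(values, size):
    """Split a list into consecutive segments of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def tfidf_transform(values, size):
    """Replace each categorical value with a TF.IDF weight.

    Assumption (not taken from the paper): each segment is treated as a
    "document" and each distinct value as a "term".
    """
    segments = segment(values, size)
    n_segments = len(segments)

    # Document frequency: number of segments in which each value occurs.
    df = Counter()
    for seg in segments:
        df.update(set(seg))

    weights = []
    for seg in segments:
        counts = Counter(seg)
        for v in seg:
            tf = counts[v] / len(seg)            # term frequency within the segment
            idf = math.log(n_segments / df[v])   # plain, unsmoothed inverse document frequency
            weights.append(tf * idf)
    return weights


# Hypothetical categorical feature (protocol names), segment size 4.
protocols = ["tcp", "tcp", "udp", "tcp", "tcp", "tcp", "arp", "tcp"]
weights = tfidf_transform(protocols, size=4)
```

With the unsmoothed formula used here, a value that occurs in every segment receives weight zero, while values confined to a few segments receive larger weights. The resulting numeric column could then be passed to a classifier in place of the original categorical one; whether the paper uses this exact weighting or a smoothed variant is not stated in the abstract.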
Pages: 1691-1698
Page count: 8