Assessing the Impact of Batch-Based Data Aggregation Techniques for Feature Engineering on Machine Learning-Based Network IDSs

Cited by: 2
Authors
Magan-Carrion, Roberto [1]
Urda, Daniel [2]
Diaz-Cano, Ignacio [3]
Dorronsoro, Bernabe [4]
Affiliations
[1] Univ Granada, Network Engn & Secur Grp, Dept Signal Theory Commun & Telemat, Granada, Spain
[2] Univ Burgos, Grp Inteligencia Computac Aplicada GICAP, Dept Ingn Informat, Escuela Politecn Super, Av Cantabria S-N, Burgos 09006, Spain
[3] Univ Cadiz, Appl Robot Grp, Dept Automat Elect Comp Architecture & Com Net En, Cadiz, Spain
[4] Univ Cadiz, Dept Comp Engn, Graph Methods Optimizat & Learning GOAL Grp, Cadiz, Spain
Source
14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN SECURITY FOR INFORMATION SYSTEMS AND 12TH INTERNATIONAL CONFERENCE ON EUROPEAN TRANSNATIONAL EDUCATIONAL (CISIS 2021 AND ICEUTE 2021) | 2022 / Vol. 1400
Keywords
Machine learning; Feature engineering; NIDS; Cybersecurity; Information security;
DOI
10.1007/978-3-030-87872-6_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Communication networks and systems are continuously threatened by a wide variety of cybersecurity attacks from new malware that targets vulnerabilities in both old and new systems. In this context, Intrusion Detection Systems (IDSs) and, specifically, Network IDSs (NIDSs) rely on robust methods and techniques to detect and classify security attacks. An important part of assessing NIDSs is the Feature Engineering (FE) process, in which raw datasets are transformed into derived ones by transforming both features and observations. In this work, the ff4ml framework, which includes the Feature as a Counter (FaaC) FE approach, is used to transform raw features into new ones that are counters of the originals. The FaaC approach aggregates raw observations by time intervals, which limits its use to network datasets containing timestamps. This work proposes a batch-based aggregation technique that allows FaaC to be applied to timestamp-less datasets and analyzes its impact on the performance of Machine Learning (ML)-based NIDSs in comparison with timestamp-based aggregation approaches.
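The two aggregation styles described in the abstract can be illustrated with a minimal Python sketch. This is not the ff4ml or FaaC implementation: the record fields (`ts`, `dst_port`, `proto`), the counter definitions, and the function names are all hypothetical and chosen only to show the idea that each derived observation holds counts of raw observations, grouped either by time interval or by fixed-size batch.

```python
from collections import defaultdict

# Hypothetical raw flow records; field names and values are illustrative only.
RAW = [
    {"ts": 0,   "dst_port": 80,  "proto": "tcp"},
    {"ts": 20,  "dst_port": 443, "proto": "tcp"},
    {"ts": 59,  "dst_port": 80,  "proto": "udp"},
    {"ts": 61,  "dst_port": 80,  "proto": "tcp"},
    {"ts": 130, "dst_port": 22,  "proto": "tcp"},
]

# Assumed counter features: each one counts the raw rows matching a predicate.
COUNTERS = {
    "port_80": lambda r: r["dst_port"] == 80,
    "port_443": lambda r: r["dst_port"] == 443,
    "proto_tcp": lambda r: r["proto"] == "tcp",
}


def count_features(rows):
    """Turn a group of raw rows into one derived row of counters."""
    return {name: sum(1 for r in rows if pred(r)) for name, pred in COUNTERS.items()}


def aggregate_by_time(rows, interval):
    """Timestamp-based aggregation: group rows by ts // interval."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["ts"] // interval].append(r)
    return [count_features(groups[k]) for k in sorted(groups)]


def aggregate_by_batch(rows, batch_size):
    """Batch-based aggregation: group every batch_size consecutive rows.

    No timestamp is read, so this also works on timestamp-less datasets.
    """
    return [
        count_features(rows[i:i + batch_size])
        for i in range(0, len(rows), batch_size)
    ]


if __name__ == "__main__":
    print(aggregate_by_time(RAW, interval=60))
    print(aggregate_by_batch(RAW, batch_size=2))
```

In this sketch the batch variant depends only on row order, so the batch size plays the role that the interval length plays in the timestamp-based variant; how the batch size affects detection performance is the question the paper studies.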
Pages: 116-125
Page count: 10