Unlocking the Power of Machine Learning in Cybersecurity Forensics: Identifying Malicious Files

Times cited: 0

Authors:

Yavas, Cemil Emre [1]

Das, Jiban Krishna [1]

Akpomedaye, Bennett [1]

Chen, Lei [1]

Ji, Yiming [1]

Affiliations:

[1] Georgia Southern Univ, Dept Informat Technol, Statesboro, GA 30458 USA

Source:

SECURITY AND MANAGEMENT AND WIRELESS NETWORKS, SAM 2024, ICWN 2024 | 2025 / Vol. 2254

Funding:

National Science Foundation (USA);

Keywords:

Cybersecurity; Machine Learning; Malicious Files; Digital Forensics; Cyber Threats; File System Analysis; Hexadecimal Code Analysis;

DOI:

10.1007/978-3-031-86637-1_10

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Our research introduces a novel method for determining the originating software of digital images, which significantly advances digital forensic analysis capabilities. This method involves transforming images into their hexadecimal code representations, thereby stripping away metadata and making the files unrecognizable by conventional identification techniques. Through a meticulous analysis of these hex codes, broken down into 2-character substrings, we construct detailed feature vectors representing the frequency of each substring. Utilizing a diverse array of machine learning models, including RandomForestClassifier, LogisticRegression, and others, our approach successfully identifies the software used to create the images, such as PowerPoint, GIMP, Picasa, and the online tool Batchtools.pro, with an impressive accuracy rate between 97% and 100%. Moreover, this technique enables the detection and flagging of files containing malicious content with nearly perfect accuracy. Our approach not only enhances the understanding of a file's digital lineage but also offers a new mechanism in digital forensics, providing a robust tool for both identifying the software used in file creation and detecting malicious alterations.
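The feature-construction step described in the abstract (hex encoding, 2-character substrings, frequency vectors) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `hex_pair_features`, the non-overlapping split of the hex string, the fixed 256-entry feature order, and the optional normalization are all assumptions that the abstract does not specify.

```python
from collections import Counter

# All 256 possible 2-character hex substrings, "00" through "ff".
# Using a fixed order is an assumption so every file maps to the same layout.
HEX_PAIRS = [f"{i:02x}" for i in range(256)]


def hex_pair_features(data: bytes, normalize: bool = True) -> list[float]:
    """Hypothetical helper: frequency of each 2-character hex substring."""
    hex_text = data.hex()  # raw bytes -> lowercase hex string, no metadata parsing
    # Assumption: substrings are non-overlapping, i.e. one per byte.
    pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
    counts = Counter(pairs)
    total = len(pairs)
    if normalize and total > 0:
        # Relative frequencies, so files of different sizes are comparable.
        return [counts[p] / total for p in HEX_PAIRS]
    # Raw counts (also used for empty input, which yields all zeros).
    return [float(counts[p]) for p in HEX_PAIRS]


# Toy input: six bytes, with 0x89 and 0x50 each appearing twice.
sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x89, 0x50])
features = hex_pair_features(sample, normalize=False)
```

Vectors built this way, one per file, could then be stacked into a matrix and passed to a classifier such as scikit-learn's `RandomForestClassifier` or `LogisticRegression`, the model families named in the abstract. How the authors split, scaled, or labeled their data is not stated here.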

Pages: 123-139

Page count: 17
