Unlocking the Power of Machine Learning in Cybersecurity Forensics: Identifying Malicious Files

Cited by: 0
Authors
Yavas, Cemil Emre [1]
Das, Jiban Krishna [1]
Akpomedaye, Bennett [1]
Chen, Lei [1]
Ji, Yiming [1]
Affiliation
[1] Georgia Southern Univ, Dept Informat Technol, Statesboro, GA 30458 USA
Source
SECURITY AND MANAGEMENT AND WIRELESS NETWORKS, SAM 2024, ICWN 2024 | 2025, Vol. 2254
Funding
U.S. National Science Foundation
Keywords
Cybersecurity; Machine Learning; Malicious Files; Digital Forensics; Cyber Threats; File System Analysis; Hexadecimal Code Analysis
DOI
10.1007/978-3-031-86637-1_10
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Our research introduces a novel method for determining the originating software of digital images, which significantly advances digital forensic analysis capabilities. This method involves transforming images into their hexadecimal code representations, thereby stripping away metadata and making the files unrecognizable by conventional identification techniques. Through a meticulous analysis of these hex codes, broken down into 2-character substrings, we construct detailed feature vectors representing the frequency of each substring. Utilizing a diverse array of machine learning models, including RandomForestClassifier, LogisticRegression, and others, our approach successfully identifies the software used to create the images, such as PowerPoint, GIMP, Picasa, and the online tool Batchtools.pro, with an impressive accuracy rate between 97% and 100%. Moreover, this technique enables the detection and flagging of files containing malicious content with nearly perfect accuracy. Our approach not only enhances the understanding of a file's digital lineage but also offers a new mechanism in digital forensics, providing a robust tool for both identifying the software used in file creation and detecting malicious alterations.
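The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `hex_pair_features` and `features_from_file` are hypothetical, and the sketch assumes non-overlapping 2-character substrings (one per byte) and optional normalization to relative frequencies, neither of which the abstract specifies.

```python
from collections import Counter
from itertools import product

HEX_DIGITS = "0123456789abcdef"
# Fixed vocabulary of all 256 possible 2-character hex substrings: "00" ... "ff".
VOCAB = ["".join(pair) for pair in product(HEX_DIGITS, repeat=2)]


def hex_pair_features(data: bytes, normalize: bool = True) -> list[float]:
    """Return a 256-dimensional frequency vector of 2-character hex substrings.

    Assumption: substrings are non-overlapping, so each one corresponds to
    exactly one byte of the input.
    """
    hex_text = data.hex()  # lowercase hex representation of the raw bytes
    pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
    counts = Counter(pairs)
    # Divide by the number of substrings when normalizing; otherwise keep raw counts.
    total = len(pairs) if (normalize and pairs) else 1
    return [counts[p] / total for p in VOCAB]


def features_from_file(path: str) -> list[float]:
    """Hypothetical helper: read a file as raw bytes and extract its features."""
    with open(path, "rb") as fh:
        return hex_pair_features(fh.read())
```

Vectors produced this way, one per file, could then be stacked into a feature matrix and passed to classifiers such as the RandomForestClassifier and LogisticRegression models named in the abstract, with the originating software (or a malicious/benign flag) as the label.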
Pages: 123-139
Page count: 17