A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

Cited by: 9
Authors
Guo, Hui [1]
Huang, Shuguang [1]
Huang, Cheng [2]
Zhang, Min [1]
Pan, Zulie [1]
Shi, Fan [1]
Huang, Hui [1]
Hu, Donghui [3]
Wang, Xiaoping [1]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230011, Peoples R China
[2] Sichuan Univ, Coll Cybersecur, Chengdu 610065, Peoples R China
[3] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Binary code similarity detection; cross-version binary; malware detection; similarity coefficient; correlation coefficient
DOI
10.1109/ACCESS.2020.3004813
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features derived from the control flow graphs (CFGs) of binary functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, we take a different research perspective and propose a new, lightweight method that solves the cross-version BCSD problem based on multiple features. It transforms binary functions into vectors and signals and computes similarity coefficient and correlation coefficient values for them. Without relying on the CFGs of functions, deep learning algorithms, or other related attributes, our method works directly on the raw bytes of each binary, and it can serve as an alternative method for coping with the various complex situations that exist in real-world environments. We implement the method and evaluate it on a custom dataset with about 423,282 samples. The results show that the method performs well on cross-version BCSD: its recall can reach 96.63%, which is almost the same as that of the state-of-the-art static solution.
Pages: 120501-120512
Page count: 12
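
Illustrative sketch
The abstract describes turning the raw bytes of binary functions into vectors and signals and then computing a similarity coefficient and a correlation coefficient. The abstract does not say which coefficients or encodings the authors use, so the following is only a minimal sketch under stated assumptions: a 256-bin byte-frequency histogram stands in for the "vector", cosine similarity stands in for the similarity coefficient, the raw byte sequence stands in for the "signal", and Pearson correlation stands in for the correlation coefficient. All function names and the sample byte strings are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def byte_histogram(code):
    """Map raw function bytes to a 256-dimensional frequency vector (assumed encoding)."""
    counts = Counter(code)
    total = len(code) or 1  # avoid division by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]


def cosine_similarity(u, v):
    """Cosine similarity, used here as a stand-in similarity coefficient."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Pearson correlation of two byte sequences treated as signals.

    The longer sequence is truncated to the shorter length, which is a
    simplifying assumption of this sketch.
    """
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    xs, ys = x[:n], y[:n]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def compare_functions(f1, f2):
    """Return both coefficients for a pair of functions given as raw bytes."""
    return {
        "similarity": cosine_similarity(byte_histogram(f1), byte_histogram(f2)),
        "correlation": pearson_correlation(f1, f2),
    }


# Hypothetical byte strings standing in for one function in two versions.
func_v1 = bytes.fromhex("554889e5897dfc8b45fc83c0015dc3")
func_v2 = bytes.fromhex("554889e5897dfc8b45fc83c0025dc3")
scores = compare_functions(func_v1, func_v2)
print(scores)
```

A real detector would still need a rule, such as thresholds on the two scores, to decide whether a pair of functions matches. That step is left out here because the abstract does not describe it.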