FuncNet: A Euclidean Embedding Approach for Lightweight Cross-platform Binary Recognition

Cited by: 4
Authors
Luo, Mengxia [1]
Yang, Can [1]
Gong, Xiaorui [1]
Yu, Lei [1]
Affiliations
[1] Chinese Acad Sci, Univ Chinese Acad Sci, Sch Cyber Secur, Inst Informat Engn, Beijing, Peoples R China
Source
SECURITY AND PRIVACY IN COMMUNICATION NETWORKS, SECURECOMM, PT I | 2019 / Vol. 304
Keywords
Binary reverse analysis; Euclidean embedding; PopSom
DOI
10.1007/978-3-030-37228-6_16
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reverse analysis is a necessary but largely manual technique for understanding how new malware works. Cross-platform binary recognition eases the work of reverse engineers by identifying duplicated or known parts compiled for various platforms. However, existing approaches mainly rely on raw function bytes or cosine embedding representations, which suffer from either low recognition accuracy or high search overhead on real-world binary recognition tasks. In this paper, we propose a lightweight neural network-based approach that generates a Euclidean embedding (i.e., a numeric vector) from the control flow graph and the callee's interface information of each binary function, and classifies the embedding vectors with a Euclidean-distance-sensitive artificial neural network. We implement a prototype called FuncNet and evaluate it on real-world projects comprising 1980 binaries and about 2 million function pairs. The experimental results show that its accuracy outperforms state-of-the-art solutions by over 13% on average and that binary search on large datasets can be done with constant time complexity.
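As a rough illustration of why a Euclidean embedding can make lookup cheap, the sketch below compares function vectors by Euclidean distance and stores them in a hash-bucket grid so that a query inspects only one bucket. This is not FuncNet's network or index: the `GridIndex` class, the cell size, the threshold, and the function names and vectors are all hypothetical, chosen only to show the general idea.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.dist(a, b)


class GridIndex:
    """Hypothetical bucket index: each vector is hashed to a grid cell.

    Assumption for illustration only; this is not the index used by FuncNet.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # Quantize every coordinate to the integer index of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in vec)

    def add(self, name, vec):
        self.buckets[self._key(vec)].append((name, tuple(vec)))

    def query(self, vec, threshold):
        # Only the query's own cell is scanned, so the cost depends on the
        # bucket size, not on the total number of stored functions.
        candidates = self.buckets.get(self._key(vec), [])
        return [name for name, v in candidates if euclidean(vec, v) <= threshold]


# Made-up embeddings for made-up function names.
index = GridIndex(cell_size=1.0)
index.add("memcpy_x86", (0.10, 0.20, 0.30))
index.add("memcpy_arm", (0.15, 0.22, 0.28))
index.add("strlen_x86", (2.5, 3.1, 0.4))

matches = index.query((0.12, 0.21, 0.29), threshold=0.1)
print(matches)
```

A single-cell lookup can miss a close vector that falls just across a cell boundary; a fuller sketch would also probe neighboring cells, at the cost of more bucket reads per query.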
Pages: 319-337
Number of pages: 19