Survey of Source Code Bug Detection Based on Deep Learning

Authors
Deng X. [1, 2]
Ye W. [2]
Xie R. [2, 3]
Zhang S.-K. [2]
Affiliations
[1] School of Software and Microelectronics, Peking University, Beijing
[2] National Engineering Research Center for Software Engineering, Peking University, Beijing
[3] School of Electronics Engineering and Computer Science, Peking University, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 2
Keywords
code representation; deep learning; vulnerability detection
DOI
10.13328/j.cnki.jos.006696
Abstract
Source code bug (vulnerability) detection is the process of determining whether program code exhibits unexpected behavior. It is widely used in software engineering tasks such as software testing and software maintenance, and it plays a vital role in ensuring software functionality and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex computation rules and suffers from state explosion. As a result, its detection performance is limited, and its false-positive and false-negative rates leave considerable room for improvement. In recent years, the vigorous growth of the open-source community has accumulated massive amounts of data centered on open-source code. Against this background, the feature-learning capability of deep learning makes it possible to learn semantically rich code representations automatically, offering a new approach to vulnerability detection. This study collects recent high-quality papers in the field and systematically summarizes and explains current methods from two perspectives: vulnerable-code datasets and deep-learning-based vulnerability detection models. Finally, it summarizes the main challenges facing research in this field and discusses likely directions for future research. © 2023 Chinese Academy of Sciences. All rights reserved.
Pages: 625-654
Number of pages: 29