Compiler Fuzzing Test Case Generation with Feed-forward Neural Network

Cited by: 0
Authors
Xu H.-R. [1 ]
Wang Y.-J. [1 ]
Huang Z.-J. [2 ]
Xie P.-D. [1 ]
Fan S.-H. [1 ]
Affiliations
[1] College of Computer Science and Technology, National University of Defense Technology, Changsha
[2] Institute of System Engineering, Academy of Military Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2022, Vol. 33, No. 6
Keywords
Abstract syntax network; Compiler fuzzing; Deep learning; Feed-forward neural network; Software defect
DOI
10.13328/j.cnki.jos.006565
Abstract
Compiler fuzzing is one of the most commonly used techniques for testing the functionality and safety of compilers. The fuzzer produces syntactically valid test cases to exercise the deeper parts of the compiler. Recently, deep learning methods based on recurrent neural networks have been introduced into the test case generation process. To address the insufficient syntactic accuracy and low generation efficiency of such methods, this study proposes a method for generating compiler fuzzing test cases based on feed-forward neural networks, and designs and implements the prototype tool FAIR. Unlike methods based on token-sequence learning, FAIR extracts code fragments from the abstract syntax tree and uses a self-attention-based feed-forward neural network to capture the grammatical associations between those fragments. After learning a generative model of the programming language, FAIR automatically produces diverse test cases. Experimental results show that FAIR outperforms its competitors in both the syntactic accuracy and the efficiency of test case generation. The proposed method significantly improves the ability to detect compiler software defects, and has detected 20 defects in GCC and LLVM. The method is also readily portable: a straightforward port, FAIR-JS, has detected 2 defects in a JavaScript engine. © Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
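To make the architecture named in the abstract concrete, the sketch below shows one way a self-attention-based feed-forward model over AST-derived code fragments could look. This is a hypothetical illustration, not the authors' implementation: the class name FragmentAttentionLM, the fragment vocabulary size, all layer dimensions, and the PyTorch framing are assumptions.

import torch
import torch.nn as nn

class FragmentAttentionLM(nn.Module):
    """Hypothetical sketch: next-fragment prediction over a sequence of
    code-fragment IDs, using self-attention plus a feed-forward block."""
    def __init__(self, vocab_size=5000, d_model=256, n_heads=4, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # fragment embeddings
        self.pos = nn.Embedding(max_len, d_model)        # learned positions
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(                        # position-wise feed-forward
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model))
        self.out = nn.Linear(d_model, vocab_size)        # next-fragment logits

    def forward(self, frag_ids):                         # frag_ids: (batch, seq)
        seq = frag_ids.size(1)
        pos_ids = torch.arange(seq, device=frag_ids.device)
        x = self.embed(frag_ids) + self.pos(pos_ids)
        # Causal mask: each fragment attends only to earlier fragments.
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool,
                                     device=frag_ids.device), diagonal=1)
        h, _ = self.attn(x, x, x, attn_mask=mask)
        return self.out(self.ffn(h))                     # (batch, seq, vocab)

# Usage: sample the next fragment to extend a partial program.
model = FragmentAttentionLM()
logits = model(torch.randint(0, 5000, (1, 16)))
next_frag = torch.distributions.Categorical(logits=logits[0, -1]).sample()

In the pipeline the abstract describes, sampled fragments would presumably be assembled back into an abstract syntax tree and serialized into a source program that is then fed to GCC or LLVM as a fuzzing test case.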
Pages: 1996-2011
Page count: 15