Vectorizing Program Ingredients for Better JVM Testing

Citations: 10
Authors
Gao, Tianchang [1]
Chen, Junjie [1]
Zhao, Yingquan [1]
Zhang, Yuqun [2]
Zhang, Lingming [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[3] Univ Illinois, Champaign, IL USA
Source
PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Java Virtual Machine; Program Synthesis; JVM Testing; Test Oracle;
DOI
10.1145/3597926.3598075
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
JVM testing is one of the most widely used methodologies for guaranteeing the quality of JVMs. Among various JVM testing techniques, synthesis-based JVM testing, which constructs a test program by synthesizing various code snippets (also called program ingredients), has been demonstrated to be state-of-the-art. Existing synthesis-based JVM testing work puts more effort into ensuring the validity of synthesized test programs, but ignores the influence of the huge ingredient space, which largely limits ingredient exploration efficiency as well as JVM testing performance. In this work, we propose Vectorized JVM Testing (called VECT) to further improve the performance of synthesis-based JVM testing. Its key insight is to reduce the huge ingredient space by vectorizing ingredients with a state-of-the-art code representation and clustering semantically similar ingredients. To make VECT complete and more effective, VECT further designs a feedback-driven ingredient selection strategy and an enhanced test oracle, both based on the vectorized ingredients. We conducted an extensive study to evaluate VECT on three popular JVMs (i.e., HotSpot, OpenJ9, and Bisheng JDK) involving five OpenJDK versions. The results demonstrate that VECT detects 115.03%~776.92% more unique inconsistencies than the state-of-the-art JVM testing technique during the same testing time. In particular, VECT detects 26 previously unknown bugs in these JVMs, 15 of which have already been confirmed/fixed by developers.
Pages: 526-537
Page count: 12