Detecting Data Races in OpenMP with Deep Learning and Large Language Models

Cited by: 2
Authors
Alsofyani, May [1]
Wang, Liqiang [1]
Affiliation
[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
Source
53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024 | 2024
Keywords
data race; race condition; bug detection; OpenMP; transformer encoder; large language model; CodeBERTa; GPT-4 Turbo
DOI
10.1145/3677333.3678160
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-based neural network models are increasingly employed to handle software engineering issues, such as bug localization and program repair. These models, equipped with a self-attention mechanism, excel at understanding source code context and semantics. Recently, large language models (LLMs) have emerged as a promising alternative for analyzing and understanding code structure. In this paper, we propose two novel methods for detecting data race bugs in OpenMP programs. The first method is based on a transformer encoder trained from scratch. The second method leverages LLMs, specifically extending GPT-4 Turbo through the use of prompt engineering and fine-tuning techniques. For training and testing our approach, we utilized two datasets comprising different OpenMP directives. Our experiments show that the transformer encoder achieves competitive accuracy compared to LLMs, whether through fine-tuning or prompt engineering techniques. This performance may be attributed to the complexity of many OpenMP directives and the limited availability of labeled datasets.
Pages: 96-103
Page count: 8