Detecting Data Races in OpenMP with Deep Learning and Large Language Models

Cited by: 0
Authors
Alsofyani, May [1]
Wang, Liqiang [1]
Affiliations
[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
Source
53rd International Conference on Parallel Processing (ICPP 2024), 2024
Keywords
data race; race condition; bug detection; OpenMP; transformer encoder; large language model; CodeBERTa; GPT-4 Turbo
DOI
10.1145/3677333.3678160
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-based neural network models are increasingly employed to address software engineering problems such as bug localization and program repair. Equipped with a self-attention mechanism, these models excel at capturing the context and semantics of source code. Recently, large language models (LLMs) have emerged as a promising alternative for analyzing and understanding code structure. In this paper, we propose two novel methods for detecting data race bugs in OpenMP programs. The first is based on a transformer encoder trained from scratch; the second leverages LLMs, specifically adapting GPT-4 Turbo through prompt engineering and fine-tuning. For training and testing, we use two datasets covering different OpenMP directives. Our experiments show that the transformer encoder achieves accuracy competitive with the LLM-based approaches, whether the LLM is adapted via fine-tuning or prompt engineering. We attribute this outcome in part to the complexity of many OpenMP directives and the limited availability of labeled datasets.
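To make the targeted bug class concrete, the following is a minimal illustrative OpenMP sketch (an editor-added example, not drawn from the paper's datasets): the first loop contains a textbook data race on a shared accumulator, and the second shows one common fix using a reduction clause. Detectors like those described in the abstract would be asked to classify such loops as racy or race-free.

/* Illustrative sketch of an OpenMP data race and a common fix.
 * Compile with: gcc -fopenmp race_example.c -o race_example */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    double sum_racy = 0.0, sum_safe = 0.0;

    /* Data race: every thread performs an unsynchronized
     * read-modify-write on the shared variable sum_racy,
     * so concurrent updates can be lost. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        sum_racy += 1.0;
    }

    /* Race-free version: the reduction clause gives each thread a
     * private copy of sum_safe and combines the copies after the loop. */
    #pragma omp parallel for reduction(+:sum_safe)
    for (int i = 0; i < N; i++) {
        sum_safe += 1.0;
    }

    printf("racy sum = %.0f (may be less than %d)\n", sum_racy, N);
    printf("safe sum = %.0f (always %d)\n", sum_safe, N);
    return 0;
}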
Pages: 96-103
Page count: 8