Detecting Data Races in OpenMP with Deep Learning and Large Language Models

Citations: 0

Authors:

Alsofyani, May [1]

Wang, Liqiang [1]

Affiliations:

[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA

Source:

53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024 | 2024

Keywords:

data race; race condition; bug detection; OpenMP; transformer encoder; large language model; CodeBERTa; GPT-4 Turbo

DOI:

10.1145/3677333.3678160

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Transformer-based neural network models are increasingly employed to handle software engineering issues, such as bug localization and program repair. These models, equipped with a self-attention mechanism, excel at understanding source code context and semantics. Recently, large language models (LLMs) have emerged as a promising alternative for analyzing and understanding code structure. In this paper, we propose two novel methods for detecting data race bugs in OpenMP programs. The first method is based on a transformer encoder trained from scratch. The second method leverages LLMs, specifically extending GPT-4 Turbo through the use of prompt engineering and fine-tuning techniques. For training and testing our approach, we utilized two datasets comprising different OpenMP directives. Our experiments show that the transformer encoder achieves competitive accuracy compared to LLMs, whether through fine-tuning or prompt engineering techniques. This performance may be attributed to the complexity of many OpenMP directives and the limited availability of labeled datasets.

Pages: 96-103

Page count: 8
