Enhanced automated code vulnerability repair using large language models

被引:2
作者
de-Fitero-Dominguez, David [1 ]
Garcia-Lopez, Eva [1 ]
Garcia-Cabot, Antonio [1 ]
Martinez-Herraiz, Jose-Javier [1 ]
机构
[1] Univ Alcala, Dept Ciencias Computac, Alcala De Henares 28805, Spain
关键词
Automated code repair; Deep learning; Large language models; Vulnerability repair; Mistral; Code llama;
D O I
10.1016/j.engappai.2024.109291
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This research addresses the complex challenge of automated repair of code vulnerabilities, vital for enhancing digital security in an increasingly technology-driven world. The study introduces a novel and efficient format for the representation of code modification, using advanced Large Language Models (LLMs) such as Code Llama and Mistral. These models, fine-tuned on datasets featuring C/C++ code vulnerabilities, significantly improve the accuracy and adaptability of automated code repair techniques. A key finding is the enhanced repair accuracy of these models when compared to previous methods such as VulRepair, which underscores their practical utility and efficiency. The research also offers a critical assessment of current evaluation metrics, such as "Perfect Predictions", and their limitations in reflecting the true capabilities of automated repair models in real-world scenarios. Following this, it underscores the importance of using test datasets devoid of train samples, emphasizing the need for dataset integrity to enhance the effectiveness of LLMs in code repair tasks. The significance of this work is its contribution to digital security, setting new standards for automated code vulnerability repair and paving the way for future advancements in the fields of cybersecurity and artificial intelligence. The study does not only highlight the potential of LLMs in enhancing code security but also fosters further exploration and research in these crucial areas.
引用
收藏
页数:13
相关论文
共 50 条
  • [31] Knowledge-Aware Code Generation with Large Language Models
    Huang, Tao
    Sun, Zhihong
    Jin, Zhi
    Li, Ge
    Lyu, Chen
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 52 - 63
  • [32] Investigating the Efficacy of Large Language Models for Code Clone Detection
    Khajezade, Mohamad
    Wu, Jie J. W.
    Fard, Fatemeh Hendijani
    Rodriguez-Perez, Gema
    Shehata, Mohamed Sami
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 161 - 165
  • [33] Comparing Code Explanations Created by Students and Large Language Models
    Leinonen, Juho
    Denny, Paul
    MacNeil, Stephen
    Sarsa, Sami
    Bernstein, Seth
    Kim, Joanne
    Tran, Andrew
    Hellas, Arto
    PROCEEDINGS OF THE 2023 CONFERENCE ON INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, ITICSE 2023, VOL 1, 2023, : 124 - 130
  • [34] Bugs in large language models generated code: an empirical study
    Tambon, Florian
    Moradi-Dakhel, Arghavan
    Nikanjam, Amin
    Khomh, Foutse
    Desmarais, Michel C.
    Antoniol, Giuliano
    EMPIRICAL SOFTWARE ENGINEERING, 2025, 30 (03)
  • [35] Code Clone Detection Techniques Based on Large Language Models
    Almatrafi, Afnan A.
    Eassa, Fathy A.
    Sharaf, Sanaa A.
    IEEE ACCESS, 2025, 13 : 46136 - 46146
  • [36] Comparative Analysis of Large Language Models in Source Code Analysis
    Erdogan, Huseyin
    Turan, Nezihe Turhan
    Onan, Aytug
    INTELLIGENT AND FUZZY SYSTEMS, INFUS 2024 CONFERENCE, VOL 1, 2024, 1088 : 185 - 192
  • [37] Self-Planning Code Generation with Large Language Models
    Jiang, Xue
    Dong, Yihong
    Wang, Lecheng
    Fang, Zheng
    Shang, Qiwei
    Li, Ge
    Jin, Zhi
    Jiao, Wenpin
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (07)
  • [38] A Comparative Analysis of Large Language Models for Code Documentation Generation
    Dvivedi, Shubhang Shekhar
    Vijay, Vyshnav
    Pujari, Sai Leela Rahul
    Lodh, Shoumik
    Kumar, Dhruv
    PROCEEDINGS OF THE 1ST ACM INTERNATIONAL CONFERENCE ON AI-POWERED SOFTWARE, AIWARE 2024, 2024, : 65 - 73
  • [39] Automating Patch Set Generation from Code Review Comments Using Large Language Models
    Rahman, Tajmilur
    Singh, Rahul
    Sultan, Mir Yousuf
    PROCEEDINGS 2024 IEEE/ACM 3RD INTERNATIONAL CONFERENCE ON AI ENGINEERING-SOFTWARE ENGINEERING FOR AI, CAIN 2024, 2024, : 273 - 274
  • [40] Automated Programming Exercise Generation in the Era of Large Language Models
    Meissner, Niklas
    Speth, Sandro
    Becker, Steffen
    2024 36TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING EDUCATION AND TRAINING, CSEE & T 2024, 2024,