Bash comment generation via data augmentation and semantic-aware CodeBERT

Authors
Yiheng Shen
Xiaolin Ju
Xiang Chen
Guang Yang
Affiliations
[1] Nantong University, School of Information Science and Technology
[2] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Source
Automated Software Engineering | 2024 / Vol. 31, No. 1
Keywords
Bash code; Code comment generation; Adversarial training; Data augmentation
Abstract
Understanding Bash code is challenging for developers because of its flexible syntax and unique features. Compared with comment generation for popular programming languages, Bash lacks sufficient training data, and collecting more real Bash code with corresponding comments is time-consuming and labor-intensive. In this study, we propose Bash2Com, a two-module method for Bash code comment generation. The first module, NP-GD, is a gradient-based automatic data augmentation component that enhances normalization stability when generating adversarial examples. The second module, MASA, leverages CodeBERT to learn the rich semantics of Bash code. Specifically, MASA treats the representations learned at each layer of CodeBERT as a set of semantic information that captures recursive relationships within the code. To generate comments for different Bash snippets, MASA employs an LSTM and attention mechanisms to concentrate dynamically on the relevant representations. We then use a Transformer decoder with beam search to generate the comments. To evaluate Bash2Com, we use a corpus of 10,592 Bash code snippets and their corresponding comments. The experimental results show that Bash2Com outperforms all state-of-the-art baselines by at least 10.19%, 11.81%, 2.61%, and 6.13% in terms of BLEU-3, BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, ablation studies verify the design rationale of NP-GD and MASA in Bash2Com. Finally, we conduct a human evaluation to illustrate the effectiveness of Bash2Com from practitioners’ perspectives.
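The abstract states that comments are decoded with beam search but gives no details. The following is a minimal, generic sketch of beam search in Python, not the authors' implementation. The function names (`beam_search`, `toy_model`), the probability table, and the beam widths are all hypothetical and chosen only for illustration; in Bash2Com the next-token probabilities would come from the Transformer decoder rather than a lookup table.

```python
import math

# Hypothetical next-token probabilities, keyed by the last token of the prefix.
# This toy table stands in for a real decoder and is NOT from the paper.
TOY_TABLE = {
    "<s>": {"list": 0.6, "show": 0.4},
    "list": {"files": 0.5, "dirs": 0.5},
    "show": {"files": 0.9, "dirs": 0.1},
    "files": {"</s>": 1.0},
    "dirs": {"</s>": 1.0},
}


def toy_model(prefix):
    """Return a dict of next-token probabilities for the given prefix."""
    return TOY_TABLE[prefix[-1]]


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return (sequence, log_probability) of the best sequence found.

    step_fn(prefix) must return a dict mapping each candidate next token
    to its probability given the prefix (a tuple of tokens).
    """
    beams = [((start_token,), 0.0)]  # open hypotheses: (tokens, log-prob)
    finished = []                    # hypotheses that emitted end_token
    for _ in range(max_len):
        # Expand every open hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, prob in step_fn(prefix).items():
                if prob > 0.0:
                    candidates.append((prefix + (token,), score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Hypotheses still open when the loop ends also compete for the result.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])


greedy_seq, greedy_score = beam_search(toy_model, "<s>", "</s>", beam_width=1)
beam_seq, beam_score = beam_search(toy_model, "<s>", "</s>", beam_width=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With a beam width of 1 the search is greedy and commits to the locally best first token; with a width of 2 it keeps the second-best first token alive long enough to find a sequence with higher overall probability. This sketch applies no length normalization, which practical decoders often add.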