An alternative to code comment generation? Generating comment from bytecode

Cited by: 0

Authors:

Chen, Xiangping ^{[2]}

Chen, Junqi ^{[1]}

Lian, Zhilu ^{[1]}

Huang, Yuan ^{[1]}

Zhou, Xiaocong ^{[3,4]}

Wu, Yunzhi ^{[5]}

Zheng, Zibin ^{[1]}

Affiliations:

[1] Sun Yat Sen Univ, Sch Software Engn, Zhuhai, Peoples R China

[2] Sun Yat Sen Univ, Sch Journalism & Commun, Guangzhou, Peoples R China

[3] Sun Yat Sen Univ, Guangzhou, Peoples R China

[4] Sch Comp Sci & Engn, Guangzhou, Peoples R China

[5] Guangzhou Modern Informat Engn Coll, Sch Econ & Management, Guangzhou, Peoples R China

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2025 / Vol. 179

Funding:

National Key Research and Development Program of China;

Keywords:

Code comment; Comment generation; Bytecode; Source code unavailable;

DOI:

10.1016/j.infsof.2024.107623

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Context: Given the importance of code comments, recent work has proposed many comment generation models that take source code as input. However, the source code is sometimes unavailable and only the bytecode can be obtained, as is the case for many apps. Objective: If comments could be generated directly from bytecode, tasks such as malware detection and the understanding of closed-source software would benefit, because the generated comments make the system easier to understand. We therefore propose a novel approach called ByteGen to generate comments from bytecode. Methods: To extract the structural characteristics of the bytecode, we use its control flow graph (CFG) and serialize the CFG with a special traversal named enhanced SBT, which completely preserves the structure of the CFG in a sequence. We set up experiments on a dataset of about 50,000 bytecode-comment pairs collected from Maven. Results: Experimental results show that ByteGen achieves an average BLEU-4 score of 28.67, outperforming several baselines, and a human study also indicates the effectiveness of ByteGen in generating comments from bytecode. Conclusion: Overall, ByteGen performs better than the baselines, which demonstrates the effectiveness of our approach for code comment generation when source code is unavailable.
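The abstract does not describe how enhanced SBT works internally, so the following is only a minimal sketch of the general idea of bracketed, structure-based serialization applied to a graph, not the authors' algorithm. The function name `sbt_serialize`, the adjacency-list encoding of the CFG, the block labels `B0` to `B3`, and the `ref:` token used for already-visited nodes are all assumptions made for illustration.

```python
from typing import Dict, List


def sbt_serialize(cfg: Dict[str, List[str]], entry: str) -> List[str]:
    """Serialize a control flow graph into a bracketed token sequence.

    Each node is emitted as "( node <successors...> ) node", in the style of
    structure-based traversal for trees. Hypothetical extension for graphs:
    a node that was already expanded is emitted as a "ref:" leaf, so loops
    and join points terminate while the edge itself stays in the sequence.
    """
    visited = set()
    tokens: List[str] = []

    def visit(node: str) -> None:
        if node in visited:
            ref = f"ref:{node}"  # assumed back-reference token
            tokens.extend(["(", ref, ")", ref])
            return
        visited.add(node)
        tokens.extend(["(", node])
        for successor in cfg.get(node, []):
            visit(successor)
        tokens.extend([")", node])

    visit(entry)
    return tokens


# Toy CFG with a loop: B0 -> B1, B1 -> B2 | B3, B2 -> B1 (back edge).
cfg = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "B3": []}
tokens = sbt_serialize(cfg, "B0")
print(" ".join(tokens))
```

In this sketch every edge of the toy graph appears exactly once as a parent-child nesting in the sequence, which is one way a traversal could keep the graph structure recoverable from a flat token stream. In practice the node labels would be bytecode instructions or basic blocks rather than placeholder names.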


Pages: 12
