Summarizing source code through heterogeneous feature fusion and extraction

Cited by: 1

Authors:

Guo, Juncai [1]

Liu, Jin [1]

Liu, Xiao [2]

Li, Li [3]

Affiliations:

[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China

[2] Deakin Univ, Sch Informat Technol, Melbourne, Australia

[3] Beihang Univ, Sch Software, Beijing, Peoples R China

Source:

INFORMATION FUSION | 2024 / Vol. 103

Funding:

National Natural Science Foundation of China;

Keywords:

Code summarization; Feature fusion; Heterogeneous graph; Graph neural network; Transformer;

DOI:

10.1016/j.inffus.2023.102058

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Code summarization, which aims to automatically produce a succinct natural-language description of the functionality of source code, plays an essential role in software maintenance. Many existing approaches first encode the source code based on its Abstract Syntax Tree (AST) and then decode it into a textual summary. However, most existing works treat the AST-based syntax structure as a homogeneous graph, without distinguishing the different relations between graph nodes (e.g., the parent-child and sibling relations). To mitigate this issue, this paper proposes HetCoS, which extracts the syntactic and sequential features of source code by exploring its inherent heterogeneity for code summarization. Specifically, we first build a Heterogeneous Code Graph (HCG) that fuses the syntax structure and code sequence with eight types of edges/relations designed between graph nodes. Moreover, we present a heterogeneous graph neural network for capturing the diverse relations in HCG. The HCG representation is then fed into a Transformer decoder, followed by a multi-head attention-based copying mechanism to support high-quality summary generation. Extensive experiments on major Java and Python datasets demonstrate the superiority of our approach over sixteen state-of-the-art baselines. To promote reproducibility, we make the implementation of HetCoS publicly accessible at https://github.com/GJCEXP/HETCOS.
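
The idea of keeping AST relations distinct rather than merging them into one edge set can be illustrated with a minimal sketch. It is not the HetCoS implementation: the abstract does not enumerate the eight edge types, so the three edge-type names used here (`parent_to_child`, `child_to_parent`, `next_sibling`) and the function name `build_typed_edges` are assumptions made for illustration, and Python's standard `ast` module stands in for whatever parser the authors use.

```python
import ast
from collections import defaultdict


def build_typed_edges(source):
    """Parse Python source and return (labels, edges).

    labels[i] is the AST node type of node i.
    edges maps a hypothetical edge-type name to a list of (src, dst) pairs.
    """
    tree = ast.parse(source)
    labels = []
    edges = defaultdict(list)

    def visit(node):
        # Assign ids in pre-order so every visited node gets a unique index.
        idx = len(labels)
        labels.append(type(node).__name__)
        child_ids = [visit(child) for child in ast.iter_child_nodes(node)]
        for c in child_ids:
            # Keep the two directions as separate relation types.
            edges["parent_to_child"].append((idx, c))
            edges["child_to_parent"].append((c, idx))
        for left, right in zip(child_ids, child_ids[1:]):
            # Adjacent children of the same parent are linked as siblings.
            edges["next_sibling"].append((left, right))
        return idx

    visit(tree)
    return labels, dict(edges)


if __name__ == "__main__":
    labels, edges = build_typed_edges("def add(a, b):\n    return a + b\n")
    for edge_type, pairs in edges.items():
        print(edge_type, len(pairs))
```

A heterogeneous graph neural network would then apply separate parameters per edge type when aggregating neighbor information, whereas a homogeneous treatment would pool all three lists into one.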


Pages: 16
