Summarizing source code through heterogeneous feature fusion and extraction

Cited by: 1
Authors
Guo, Juncai [1]
Liu, Jin [1]
Liu, Xiao [2]
Li, Li [3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Deakin Univ, Sch Informat Technol, Melbourne, Australia
[3] Beihang Univ, Sch Software, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code summarization; Feature fusion; Heterogeneous graph; Graph neural network; Transformer
DOI
10.1016/j.inffus.2023.102058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Code summarization, which seeks to automatically produce a succinct natural-language description of the functionality of source code, plays an essential role in software maintenance. Many existing approaches first encode the source code based on its Abstract Syntax Tree (AST) and then decode it into a textual summary. However, most of them treat the AST-based syntax structure as a homogeneous graph and do not distinguish the different relations between graph nodes (e.g., parent-child and sibling relations). To mitigate this issue, this paper proposes HetCoS, which extracts the syntactic and sequential features of source code by exploiting its inherent heterogeneity for code summarization. Specifically, we first build a Heterogeneous Code Graph (HCG) that fuses the syntax structure and the code sequence through eight types of edges (relations) defined between graph nodes. We then present a heterogeneous graph neural network to capture the diverse relations in the HCG. The resulting HCG representation is fed into a Transformer decoder, followed by a multi-head attention-based copying mechanism, to generate high-quality summaries. Extensive experiments on the major Java and Python datasets show that our approach outperforms sixteen state-of-the-art baselines. To facilitate reproducibility, we make the implementation of HetCoS publicly available at https://github.com/GJCEXP/HETCOS.
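The abstract combines two technical ideas: fusing the AST and the code sequence into one graph with typed edges, and letting a graph neural network treat each edge type differently. The following is a minimal Python sketch of both ideas using only the standard `ast` module. It is an illustration under stated assumptions, not the HetCoS implementation. The six relation names are hypothetical stand-ins, because the abstract says the HCG has eight edge types but does not list them. The "code sequence" is approximated by chaining AST leaf nodes in source order. The helper `relational_step` is a scalar, R-GCN-style update (a per-relation mean scaled by a per-relation weight, then ReLU), swapped in for the paper's network.

```python
import ast
from collections import Counter, defaultdict

# Hypothetical relation names (assumption): the HetCoS abstract says the HCG
# has eight edge types but does not list them, so these six are stand-ins.
RELATIONS = (
    "parent_child", "child_parent",   # AST structure, both directions
    "next_sibling", "prev_sibling",   # order among children of one AST node
    "next_leaf", "prev_leaf",         # source-order sequence of AST leaves
)


def node_label(node: ast.AST) -> str:
    """Node type, plus its identifier when the node carries one."""
    for attr in ("id", "arg", "name"):
        value = getattr(node, attr, None)
        if isinstance(value, str):
            return f"{type(node).__name__}:{value}"
    return type(node).__name__


def build_hetero_graph(source: str):
    """Return (labels, edges); each edge is a (src, relation, dst) triple."""
    labels: list[str] = []
    edges: list[tuple[int, str, int]] = []
    leaves: list[tuple[int, int, int]] = []  # (lineno, col_offset, node_id)

    def visit(node: ast.AST) -> int:
        node_id = len(labels)  # pre-order numbering
        labels.append(node_label(node))
        # Skip Load/Store/Del context markers so that Name nodes stay leaves.
        kids = [c for c in ast.iter_child_nodes(node)
                if not isinstance(c, ast.expr_context)]
        child_ids = [visit(c) for c in kids]
        for cid in child_ids:
            edges.append((node_id, "parent_child", cid))
            edges.append((cid, "child_parent", node_id))
        for left, right in zip(child_ids, child_ids[1:]):
            edges.append((left, "next_sibling", right))
            edges.append((right, "prev_sibling", left))
        if not child_ids and hasattr(node, "lineno"):
            leaves.append((node.lineno, node.col_offset, node_id))
        return node_id

    visit(ast.parse(source))
    # Approximate the code sequence by chaining positioned leaves in source order.
    ordered = [nid for _, _, nid in sorted(leaves)]
    for a, b in zip(ordered, ordered[1:]):
        edges.append((a, "next_leaf", b))
        edges.append((b, "prev_leaf", a))
    return labels, edges


def relational_step(features, edges, weights, self_weight=1.0):
    """One R-GCN-style update on scalar node features (illustration only).

    Incoming messages are averaged per relation and scaled by that
    relation's weight, so each edge type contributes differently.
    """
    incoming = defaultdict(lambda: defaultdict(list))  # dst -> rel -> [h_src]
    for src, rel, dst in edges:
        incoming[dst][rel].append(features[src])
    updated = []
    for node, h in enumerate(features):
        total = self_weight * h
        for rel, msgs in incoming[node].items():
            total += weights[rel] * sum(msgs) / len(msgs)
        updated.append(max(0.0, total))  # ReLU
    return updated


if __name__ == "__main__":
    labels, edges = build_hetero_graph("def add(a, b):\n    return a + b\n")
    print(labels)
    print(Counter(rel for _, rel, _ in edges))
    print(relational_step([1.0] * len(labels), edges,
                          dict.fromkeys(RELATIONS, 0.5)))
```

On the two-line `add` function, the sketch yields 10 nodes and 32 typed edges: 9 in each structural direction, 4 sibling links each way, and 3 leaf-sequence links each way. A homogeneous GNN would merge all of these into one neighbour set. The per-relation averaging keeps them apart. In a trained model, the hand-set scalar weights would be replaced by learned, relation-specific parameters.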
Pages: 16
Related papers
50 in total
  • [1] Summarizing source code with Heterogeneous Syntax Graph and dual position
    Guo, Juncai
    Liu, Jin
    Liu, Xiao
    Wan, Yao
    Li, Li
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (05)
  • [2] Summarizing Source Code from Structure and Context
    Hou, Shifu
    Chen, Lingwei
    Ye, Yanfang
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [3] Summarizing source code with hierarchical code representation
    Zhou, Ziyi
    Yu, Huiqun
    Fan, Guisheng
    Huang, Zijie
    Yang, Xingguang
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 143
  • [4] Large-scale predicting protein functions through heterogeneous feature fusion
    Zheng, Rongtao
    Huang, Zhijian
    Deng, Lei
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (04)
  • [5] Federated learning based multi-task feature fusion framework for code expressive semantic extraction
    Deng, Fengyang
    Fu, Cai
    Qian, Yekui
    Yang, Jia
    He, Shuai
    Xu, Hao
    SOFTWARE-PRACTICE & EXPERIENCE, 2022, 52 (08): 1849-1866
  • [6] Source Code Vulnerability Detection Based on Joint Graph and Multimodal Feature Fusion
    Jin, Dun
    He, Chengwan
    Zou, Quan
    Qin, Yan
    Wang, Boshu
    ELECTRONICS, 2025, 14 (05)
  • [7] Feature Regrouping for CCA-Based Feature Fusion and Extraction Through Normalized Cut
    Wu, Zuobin
    Mao, Kezhi
    Ng, Gee-Wah
    2018 21ST INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2018: 2275-2282
  • [8] Biometric bits extraction through phase quantization based on feature level fusion
    Lee, Hyunggu
    Teoh, Andrew Beng Jin
    Kim, Jaihie
    TELECOMMUNICATION SYSTEMS, 2011, 47 (3-4): 255-273
  • [9] An unsupervised feature extraction and fusion framework for multi-source data based on copula theory
    Chen, Xiuwei
    Lai, Li
    Luo, Maokang
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2025, 180