A hybrid code representation learning approach for predicting method names

Cited by: 5
Authors
Zhang, Fengyi [1, 2]
Chen, Bihuan [1]
Li, Rongfan [1, 2]
Peng, Xin [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Data Sci, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code representation learning; Method name prediction; Deep learning
DOI
10.1016/j.jss.2021.111011
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Program semantic properties such as class names, method names, and variable names and types play an important role in software development and maintenance. Method names are of particular importance because they provide the cornerstone of abstraction that developers use to communicate with each other for various purposes (e.g., code review and program comprehension). Existing method name prediction approaches often represent code as lexical tokens or syntactic AST (abstract syntax tree) paths, which makes it difficult for them to learn code semantics and limits their effectiveness in predicting method names. Initial attempts have been made to represent code as execution traces in order to capture code semantics, but they scale poorly because execution traces are costly to collect. In this paper, we propose a hybrid code representation learning approach, named Meth2Seq, that encodes a method as a sequence of distributed vectors. To capture code semantics scalably, Meth2Seq represents a method as (1) a bag of paths on the program dependence graph, (2) a sequence of typed intermediate representation statements, and (3) a natural language comment sentence. The learned sequence of vectors for a method is fed to a decoder model to predict the method name. Our evaluation on a dataset of 280.5K methods from 67 Java projects demonstrates that Meth2Seq outperforms two state-of-the-art code representation learning approaches in F1-score by 92.6% and 36.6%, respectively, and two state-of-the-art method name prediction approaches in F1-score by 85.6% and 178.1%, respectively. (C) 2021 Elsevier Inc. All rights reserved.
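The abstract reports F1-scores without defining them. In the method name prediction literature (e.g., code2vec and code2seq), F1 is commonly computed over the subtokens of the predicted and reference names, micro-averaged across the test set. The sketch below illustrates that convention only; the abstract does not confirm that Meth2Seq uses exactly this definition, and the helper names `split_subtokens` and `subtoken_f1` are hypothetical.

```python
import re
from collections import Counter


def split_subtokens(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase subtokens.

    Hypothetical helper: e.g. "getHTTPResponse" -> ["get", "http", "response"].
    """
    # Break between a lowercase letter/digit and a following uppercase letter.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # Break an acronym from a following capitalized word ("HTTPResponse").
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", s)
    # Underscores and the inserted spaces both act as separators.
    return [t.lower() for t in re.split(r"[\s_]+", s) if t]


def subtoken_f1(pairs: list[tuple[str, str]]) -> tuple[float, float, float]:
    """Micro-averaged subtoken precision, recall and F1.

    `pairs` holds (predicted_name, reference_name) tuples. Assumption: subtokens
    are compared as multisets, ignoring their order within the name.
    """
    tp = fp = fn = 0
    for predicted, reference in pairs:
        pred = Counter(split_subtokens(predicted))
        ref = Counter(split_subtokens(reference))
        overlap = sum((pred & ref).values())  # subtokens found in both names
        tp += overlap
        fp += sum(pred.values()) - overlap  # predicted but not in reference
        fn += sum(ref.values()) - overlap  # in reference but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Predicting "getFileName" for a method actually named "getName" earns
    # partial credit: two of the three predicted subtokens are correct.
    print(subtoken_f1([("getFileName", "getName"), ("readFile", "readFile")]))
```

Scoring subtokens rather than whole names is the usual design choice because it rewards near-misses such as `getFileName` versus `getName`, which exact-match accuracy would count as complete failures.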
Pages: 15