More than a framework: Sketching out technical enablers for natural language-based source code generation

Cited by: 0
Authors
Yang, Chen [1]
Liu, Yan [1]
Yin, Changqing [1]
Affiliations
[1] Tongji Univ, Sch Software Engn, Caoan Hwy, Shanghai, Peoples R China
Keywords
Source code generation; NL2Code; Software engineering; Machine learning application
DOI
10.1016/j.cosrev.2024.100637
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Natural Language-based Source Code Generation (NLSCG) holds the promise of revolutionizing the way software is developed by providing a collection of intelligent technical enablers, built on sustained improvements to natural-language-to-source-code pipelines and the continuous adoption of new coding paradigms. In recent years, a large variety of NLSCG technical solutions have been proposed, and encouraging experimental results have been reported. Meanwhile, current research and early application projects in this area reflect a wide diversity of NLSCG contexts and major technical enablers. This heterogeneity, fragmentation, and vagueness of the NLSCG technical landscape currently hinder the full realization of the NLSCG research and application vision. Practitioners in this field cannot find systematic guidelines on how to effectively address the "known unknowns" or how to spot the "unknown unknowns", which ultimately hinders turning NLSCG solutions into further research enhancements or production applications. Understanding the context, boundaries, capabilities, and integrations of NLSCG enablers is considered one of the key drivers for more practical application of NLSCG models. In this paper, we analyze in detail the natural-language-to-source-code pipelines and the evolution of source code generation tasks, considering both the problem context and technological aspects. A forward-looking reference framework for NLSCG is proposed to help handle source code generation tasks with appropriate intelligent models. We review the present-day NLSCG technical landscape, as well as the core technical enablers along the source code generation pipelines. Relevant experiments are conducted to validate the role of representative models across different technical enablers on typical datasets, and we finally highlight the contribution of different enablers to code generation capabilities.
Pages: 17