More than a framework: Sketching out technical enablers for natural language-based source code generation

Times cited: 0
Authors
Yang, Chen [1]
Liu, Yan [1]
Yin, Changqing [1]
Affiliation
[1] Tongji Univ, Sch Software Engn, Caoan Hwy, Shanghai, Peoples R China
Keywords
Source code generation; NL2Code; Software engineering; Machine learning application;
DOI
10.1016/j.cosrev.2024.100637
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Natural Language-based Source Code Generation (NLSCG) promises to revolutionize how software is developed by facilitating a collection of intelligent technical enablers, built on sustained improvements to natural-language-to-source-code pipelines and the continuous adoption of new coding paradigms. In recent years, a large variety of NLSCG technical solutions have been proposed, and promising experimental results have been reported. Meanwhile, current research and pilot application projects in this area reflect a wide diversity of NLSCG contexts and major technical enablers. This heterogeneity, fragmentation, and vagueness of the NLSCG technical landscape currently impede the full realization of the NLSCG research and application vision. Players in this field lack systematic guidelines on how to effectively address the "known unknowns" and how to spot the "unknown unknowns", which ultimately hinders turning NLSCG solutions into further research advances or production applications. Understanding the context, boundaries, capabilities, and integrations of NLSCG enablers is considered one of the key drivers for the more practical application of NLSCG models. In this paper, we analyze in detail the natural-language-to-source-code pipelines and the evolution of source code generation tasks, considering both the problem context and the technological aspects. We propose a foresight reference framework for NLSCG to help address source code generation tasks with appropriate intelligent models. We review the present-day NLSCG technical landscape, as well as the core technical enablers along the source code generation pipelines. We conduct experiments to validate the role of representative models across different technical enablers on typical datasets, and finally highlight the contribution of different enablers to code generation capabilities.
Pages: 17
References: 95