DCServCG: A data-centric service code generation using deep learning

Cited by: 8
Authors
Alizadehsani, Zakieh [1 ,2 ]
Ghaemi, Hadi [3 ]
Shahraki, Amin [4 ]
Gonzalez-Briones, Alfonso [1 ,5 ]
Corchado, Juan M. [1 ,5 ]
Affiliations
[1] AIR Institute, IoT Digital Innovation Hub, Salamanca, Spain
[2] University of Salamanca, Faculty of Science, Salamanca, Spain
[3] Ferdowsi University of Mashhad, Computer Engineering Department, Mashhad, Iran
[4] University of Oslo, Department of Informatics, Oslo, Norway
[5] University of Salamanca, BISITE Research Group, Salamanca, Spain
Keywords
Code auto-completion; Transformers; Language modeling; SOA; Web service; Code mining
DOI
10.1016/j.engappai.2023.106304
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Modern software development paradigms, including Service-Oriented Architecture (SOA), tend to build new software from available services, e.g., web service Application Programming Interfaces (APIs). Further advancement of SOA therefore requires accurate automated tasks, such as service discovery and composition. Most of these automated tasks rely heavily on web service metadata annotation, and the lack of machine-readable documentation and structured metadata reduces the accuracy and volume of automatic data annotation, negatively affecting the performance of automated SOA tasks. This study proposes automatic code completion for improving web service-based systems by identifying and capturing service usage collected from public repositories that share Open Source Software (OSS). To this end, a Data-Centric Service Code Generation (DCServCG) model is proposed to improve on general-purpose code generators that neglect essential characteristics of service-based code, e.g., sequence overlap and bias issues. DCServCG takes advantage of the data-centric concept, i.e., conditional text generation, to overcome these issues. The approach has been evaluated in terms of language modeling metrics. The results indicate that the data-centric approach reduces perplexity by 1.125. Moreover, DCServCG combines de-noising and conditional text generation with a knowledge-distilled transformer: DistilGPT2 (82M parameters) trained faster and achieved a perplexity 0.363 lower than ServCG (124M parameters) without de-noising and conditional text generation, where a lower perplexity value indicates better model generalization.
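As an illustration of the two ideas the abstract combines, the sketch below shows (1) conditional text generation, where a control token prefixed to the input conditions the model toward a service category, and (2) perplexity, the language modeling metric used in the evaluation. This is a minimal sketch, not the authors' implementation: "distilgpt2" is the public Hugging Face checkpoint corresponding to the DistilGPT2 base model named in the abstract, and the <service:...> tag format is a hypothetical stand-in for whatever condition tokens DCServCG actually uses.

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Public DistilGPT2 checkpoint (82M parameters); an assumption, not the
    # authors' fine-tuned model.
    tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
    model = AutoModelForCausalLM.from_pretrained("distilgpt2")
    model.eval()

    # (1) Conditional generation: a hypothetical condition tag prepended to
    # the prompt steers the model toward service-specific code.
    prompt = "<service:http-client> response = client.get("
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0]))

    # (2) Perplexity: exp of the mean per-token cross-entropy on held-out
    # text; lower values indicate better generalization, as reported in the
    # abstract.
    sample = "client = ServiceClient(); client.call('orders')"
    ids = tokenizer(sample, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    print("perplexity:", math.exp(loss.item()))

Under this reading, the reported gains (perplexity reduced by 1.125 overall, and by 0.363 versus ServCG) would correspond to lower values from the second computation after data-centric training.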
Pages: 17