JavaBERT: Training a transformer-based model for the Java programming language

Cited by: 9
Authors
De Sousa, Nelson Tavares [1]
Hasselbring, Wilhelm [1]
Affiliation
[1] Kiel University, Software Engineering Group, Kiel, Germany
Source
2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW 2021) | 2021
DOI
10.1109/ASEW52652.2021.00028
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Code quality is, and will remain, a crucial factor in developing new software, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used in software engineering tools, which therefore miss out on the potential benefits of their application. Natural language processing (NLP) models have shown their potential for processing text data across a variety of tasks. We argue that such models can offer similar benefits for processing software code. In this paper, we investigate how models used for natural language processing can be trained on software code. We introduce a data retrieval pipeline for software code and train a model on Java source code. The resulting model, JavaBERT, achieves high accuracy on the masked language modeling task, indicating its potential for software engineering tools.
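The masked language modeling (MLM) task mentioned in the abstract can be illustrated with a small sketch. It assumes the standard BERT recipe (select roughly 15% of tokens; replace 80% of the selected ones with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged) and naive whitespace tokenization of Java code. The record does not state JavaBERT's actual tokenizer or masking parameters, and the function names `mask_tokens` and `masked_accuracy` are hypothetical. Accuracy is computed here, as is common for MLM, only over the selected positions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Sketch of BERT-style masking (assumed recipe, not confirmed for JavaBERT).

    Each position is selected with probability mask_prob. A selected token is
    replaced by [MASK] 80% of the time, by a random vocab token 10% of the time,
    and kept unchanged 10% of the time. `vocab` must be non-empty.
    Returns (inputs, labels); labels[i] is None for positions that are not scored.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)
            # else: leave the original token in place
    return inputs, labels


def masked_accuracy(predictions, labels):
    """Fraction of selected (non-None label) positions predicted correctly."""
    scored = [(p, l) for p, l in zip(predictions, labels) if l is not None]
    if not scored:
        return 0.0
    return sum(p == l for p, l in scored) / len(scored)


if __name__ == "__main__":
    # Hypothetical whitespace-tokenized Java snippet (real BERT models use subword tokens).
    java_tokens = "public static void main ( String [ ] args ) { }".split()
    inputs, labels = mask_tokens(java_tokens, vocab=java_tokens, seed=42)
    print(inputs)
    # A stand-in "model" that simply echoes its inputs; it is only right where
    # a selected token happened to be left unchanged.
    print(masked_accuracy(inputs, labels))
```

Echoing the inputs gives a trivial baseline; a trained model such as JavaBERT would instead predict the original token at each selected position, and its MLM accuracy is measured against the same labels.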
Pages: 90-95
Number of pages: 6