On ML-Based Program Translation: Perils and Promises

Cited by: 1
Authors
Malyala, Aniketh [1]
Zhou, Katelyn [1]
Ray, Baishakhi [2]
Chakraborty, Saikat [3]
Affiliations
[1] Silver Creek High Sch, San Jose, CA 95121 USA
[2] Columbia Univ, New York, NY USA
[3] Microsoft Res, Redmond, WA USA
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING-NEW IDEAS AND EMERGING RESULTS, ICSE-NIER | 2023
Keywords
Code generation; code translation; program transformation
DOI
10.1109/ICSE-NIER58687.2023.00017
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
With the advent of new and advanced programming languages, it becomes imperative to migrate legacy software to new programming languages. Unsupervised Machine Learning-based Program Translation could play an essential role in such migration, even without a sufficiently sizeable, reliable corpus of parallel source code. However, these translators are far from perfect due to their statistical nature. This work investigates unsupervised program translators and where and why they fail. Through in-depth error analysis of such failures, we have identified that the cases where such translators fail follow a few particular patterns. With this insight, we develop a rule-based program mutation engine, which pre-processes the input code if the input follows specific patterns and post-processes the output if the output follows certain patterns. We show that our code processing tool, in conjunction with the program translator, can form a hybrid program translator and significantly improve on the state of the art. In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline using pre- and post-processing steps.
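The hybrid pipeline described in the abstract (pattern-triggered pre-processing, ML translation, pattern-triggered post-processing) can be sketched roughly as below. This is a minimal illustration, not the authors' engine: the `Rule` class, the two example rules, and `toy_translator` (a stand-in for the ML model) are all hypothetical names and behaviors invented for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical rewrite rule: it fires only when `pattern` matches the code."""
    name: str
    pattern: re.Pattern
    rewrite: Callable[[str], str]

    def apply(self, code: str) -> str:
        # Leave the code untouched unless the triggering pattern is present.
        return self.rewrite(code) if self.pattern.search(code) else code


def apply_rules(code: str, rules: List[Rule]) -> str:
    """Apply each rule in order; each rule sees the output of the previous one."""
    for rule in rules:
        code = rule.apply(code)
    return code


def hybrid_translate(source: str,
                     translator: Callable[[str], str],
                     pre_rules: List[Rule],
                     post_rules: List[Rule]) -> str:
    """Pre-process the input, run the translator, then post-process its output."""
    prepared = apply_rules(source, pre_rules)
    translated = translator(prepared)
    return apply_rules(translated, post_rules)


# Illustrative rules only (assumptions, not taken from the paper).
INCREMENT = re.compile(r"\b(\w+)\+\+;")
NULL_LITERAL = re.compile(r"\bnull\b")

PRE_RULES = [
    # Rewrite `x++;` into `x += 1;` before translation.
    Rule("expand-increment", INCREMENT,
         lambda code: INCREMENT.sub(r"\1 += 1;", code)),
]
POST_RULES = [
    # Replace a leftover `null` literal with Python's `None` after translation.
    Rule("null-to-none", NULL_LITERAL,
         lambda code: NULL_LITERAL.sub("None", code)),
]


def toy_translator(code: str) -> str:
    """Stand-in for the ML translator: it only strips trailing semicolons."""
    return "\n".join(line.rstrip().rstrip(";") for line in code.splitlines())


result = hybrid_translate("count++;\nx = null;", toy_translator,
                          PRE_RULES, POST_RULES)
```

Keeping the translator as a plain callable means the rules can wrap any model without changing it, which matches the abstract's framing of the tool as working in conjunction with an existing translator.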
Pages: 60-65
Page count: 6