On ML-Based Program Translation: Perils and Promises

Cited by: 1

Authors:

Malyala, Aniketh [1]

Zhou, Katelyn [1]

Ray, Baishakhi [2]

Chakraborty, Saikat [3]

Affiliations:

[1] Silver Creek High Sch, San Jose, CA 95121 USA

[2] Columbia Univ, New York, NY USA

[3] Microsoft Res, Redmond, WA USA

Source:

2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING-NEW IDEAS AND EMERGING RESULTS, ICSE-NIER | 2023

Keywords:

Code generation; code translation; program transformation

DOI:

10.1109/ICSE-NIER58687.2023.00017

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

With the advent of new and advanced programming languages, it becomes imperative to migrate legacy software to new programming languages. Unsupervised Machine Learning-based Program Translation could play an essential role in such migration, even without a sufficiently large and reliable corpus of parallel source code. However, these translators are far from perfect due to their statistical nature. This work investigates unsupervised program translators and where and why they fail. Through in-depth error analysis of such failures, we have identified that the cases where such translators fail follow a few particular patterns. With this insight, we develop a rule-based program mutation engine, which pre-processes the input code if the input follows specific patterns and post-processes the output if the output follows certain patterns. We show that our code processing tool, in conjunction with the program translator, can form a hybrid program translator and significantly improve the state-of-the-art. In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline using pre- and post-processing steps.
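The hybrid arrangement described in the abstract (pattern-triggered pre-processing, an ML translator, pattern-triggered post-processing) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Rule` type, the two toy rewrite rules, and `stub_translator` are hypothetical and are not taken from the paper, whose actual rules and translator are not specified in this record.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical rewrite rule: a regex pattern and its replacement."""
    pattern: str
    replacement: str

    def apply(self, code: str) -> str:
        # re.sub leaves the code unchanged when the pattern does not occur,
        # so the rule only fires on inputs that follow the pattern.
        return re.sub(self.pattern, self.replacement, code)


def apply_rules(code: str, rules: List[Rule]) -> str:
    """Apply each rule in order."""
    for rule in rules:
        code = rule.apply(code)
    return code


def hybrid_translate(
    code: str,
    translate: Callable[[str], str],
    pre_rules: List[Rule],
    post_rules: List[Rule],
) -> str:
    """Pre-process the input, run the translator, then post-process its output."""
    prepared = apply_rules(code, pre_rules)
    translated = translate(prepared)
    return apply_rules(translated, post_rules)


# Toy rules, invented for this sketch only.
PRE_RULES = [Rule(r"\b(\w+)\+\+", r"\1 += 1")]   # expand `x++` before translation
POST_RULES = [Rule(r"\bnull\b", "None")]         # repair a leftover `null` afterwards


def stub_translator(code: str) -> str:
    """Stand-in for an ML translator; it merely drops trailing semicolons."""
    return "\n".join(line.rstrip().rstrip(";") for line in code.splitlines())


result = hybrid_translate("count++;\nvalue = null;", stub_translator, PRE_RULES, POST_RULES)
print(result)
```

The translator is passed in as a plain callable so that the rule engine stays independent of any particular model; in a real pipeline the stub would be replaced by a call to the learned translator.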

Citation:

Pages: 60-65

Number of pages: 6

References: 35 in total

