Cross-Lingual Transfer Learning for Statistical Type Inference

Cited: 0
Authors
Li, Zhiming [1 ]
Xie, Xiaofei [2 ]
Li, Haoliang [3 ]
Xu, Zhengzi [1 ]
Li, Yi [1 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Singapore Management Univ, Singapore, Singapore
[3] City Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022 | 2022
Funding
National Research Foundation, Singapore
Keywords
Deep Learning; Transfer Learning; Type Inference;
DOI
10.1145/3533767.3534411
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Hitherto, statistical type inference systems have relied entirely on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose a cross-lingual transfer learning framework, Plato, for statistical type inference, which allows us to leverage prior knowledge learned from the labeled dataset of one language and transfer it to others, e.g., Python to JavaScript, Java to JavaScript, etc. Plato is powered by a novel kernelized attention mechanism that constrains the attention scope of the backbone Transformer model so that the model is forced to base its predictions on features commonly shared among languages. In addition, we propose a syntax enhancement that augments learning on the feature overlap among language domains. Furthermore, Plato can also improve the performance of conventional supervised type inference by introducing cross-lingual augmentation, which enables the model to learn more general features across multiple languages. We evaluated Plato under two settings: 1) in the cross-domain scenario, where the target-language data is unlabeled or only partially labeled, Plato outperforms state-of-the-art domain transfer techniques by a large margin, e.g., it improves the Python-to-TypeScript baseline by +14.6%@EM and +18.6%@weighted-F1; and 2) in the conventional monolingual supervised scenario, Plato improves the Python baseline by +4.10%@EM and +1.90%@weighted-F1 with the introduction of cross-lingual augmentation.
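The kernelized-attention idea from the abstract can be illustrated with a minimal sketch: attention logits are damped by a kernel so each position attends mostly to a constrained neighborhood rather than the full sequence. The abstract does not specify Plato's actual kernel or implementation, so everything below (the Gaussian kernel over token distance, the `bandwidth` parameter, the single-head NumPy formulation) is an illustrative assumption, not the paper's method:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kernelized_attention(Q, K, V, bandwidth=2.0):
    """Single-head scaled dot-product attention whose logits are damped
    by a Gaussian kernel over token distance, constraining each position's
    attention scope to a local neighborhood (an illustrative stand-in for
    the constrained attention scope described in the abstract)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # standard attention logits
    pos = np.arange(n)
    dist = np.abs(pos[:, None] - pos[None, :])    # |i - j| token distance
    kernel = np.exp(-(dist ** 2) / (2 * bandwidth ** 2))
    scores = scores + np.log(kernel + 1e-9)       # multiplicative damping of weights
    weights = softmax(scores, axis=-1)
    return weights @ V, weights
```

With uniform queries and keys, the resulting weight matrix simply follows the kernel: nearby positions receive most of the mass, which is the locality constraint the mechanism imposes.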
Pages: 239 - 250 (12 pages)
Related Papers
50 items in total
  • [31] Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models
    Shaheen, Zein
    Wohlgenannt, Gerhard
    Mouromtsev, Dmitry
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2021), 2021, : 450 - 456
  • [32] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
    Byambadorj, Zolzaya
    Nishimura, Ryota
    Ayush, Altangerel
    Ohta, Kengo
    Kitaoka, Norihide
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [33] Knowledge Distillation Based Training of Universal ASR Source Models for Cross-lingual Transfer
    Fukuda, Takashi
    Thomas, Samuel
    INTERSPEECH 2021, 2021, : 3450 - 3454
  • [34] Zero-shot cross-lingual transfer language selection using linguistic similarity
    Eronen, Juuso
    Ptaszynski, Michal
    Masui, Fumito
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [35] Transfer language selection for zero-shot cross-lingual abusive language detection
    Eronen, Juuso
    Ptaszynski, Michal
    Masui, Fumito
    Arata, Masaki
    Leliwa, Gniewosz
    Wroczynski, Michal
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (04)
  • [36] Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2023, 41 (02)
  • [37] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [38] Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks
    Morshed, Mahir
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022, : 2298 - 2302
  • [39] SpeakerNet for Cross-lingual Text-Independent Speaker Verification
    Habib, Hafsa
    Tauseef, Huma
    Fahiem, Muhammad Abuzar
    Farhan, Saima
    Usman, Ghousia
    ARCHIVES OF ACOUSTICS, 2020, 45 (04) : 573 - 583
  • [40] Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models
    Lee, Chanhee
    Yang, Kisu
    Whang, Taesun
    Park, Chanjun
    Matteson, Andrew
    Lim, Heuiseok
    APPLIED SCIENCES-BASEL, 2021, 11 (05): 1 - 15