Cross-Lingual Transfer Learning for Statistical Type Inference

Cited by: 0
Authors
Li, Zhiming [1 ]
Xie, Xiaofei [2 ]
Li, Haoliang [3 ]
Xu, Zhengzi [1 ]
Li, Yi [1 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Singapore Management Univ, Singapore, Singapore
[3] City Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022 | 2022
Funding
National Research Foundation, Singapore
Keywords
Deep Learning; Transfer Learning; Type Inference;
DOI
10.1145/3533767.3534411
Chinese Library Classification
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
Existing statistical type inference systems rely entirely on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose a cross-lingual transfer learning framework, Plato, for statistical type inference, which allows us to leverage prior knowledge learned from the labeled dataset of one language and transfer it to others, e.g., Python to JavaScript, Java to JavaScript, etc. Plato is powered by a novel kernelized attention mechanism that constrains the attention scope of the backbone Transformer model, so that the model is forced to base its predictions on features commonly shared among languages. In addition, we propose a syntax enhancement that augments learning on the feature overlap among language domains. Furthermore, Plato can also improve the performance of conventional supervised type inference by introducing cross-lingual augmentation, which enables the model to learn more general features across multiple languages. We evaluated Plato under two settings: 1) in a cross-domain scenario where the target-language data is unlabeled or only partially labeled, Plato outperforms state-of-the-art domain transfer techniques by a large margin, e.g., improving the Python-to-TypeScript baseline by +14.6%@EM and +18.6%@weighted-F1; and 2) in the conventional monolingual supervised scenario, Plato improves the Python baseline by +4.10%@EM and +1.90%@weighted-F1 with the introduction of cross-lingual augmentation.
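The abstract describes the kernelized attention mechanism only at a high level. As an illustration of the general idea — biasing a Transformer's attention scores with a kernel so that each token's effective attention scope is constrained — a minimal NumPy sketch might look like the following. The Gaussian distance kernel, function names, and parameters here are illustrative assumptions, not Plato's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kernelized_attention(Q, K, V, sigma=2.0):
    """Scaled dot-product attention whose scores are biased by a Gaussian
    kernel over token distance (a hypothetical choice of kernel), so each
    token's effective attention scope shrinks to a local neighbourhood.
    Returns the attended values and the attention weight matrix."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) raw attention scores
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])         # pairwise token distances
    kernel = np.exp(-dist.astype(float) ** 2 / (2 * sigma ** 2))
    weights = softmax(scores + np.log(kernel + 1e-9))  # bias scores toward nearby tokens
    return weights @ V, weights
```

A small sigma concentrates each row of the attention matrix near the diagonal, which is one simple way to force a model to rely on local structural features rather than long-range, language-specific cues.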
Pages: 239 - 250 (12 pages)