Cross-Lingual Transfer Learning for Statistical Type Inference

Cited by: 0
Authors
Li, Zhiming [1 ]
Xie, Xiaofei [2 ]
Li, Haoliang [3 ]
Xu, Zhengzi [1 ]
Li, Yi [1 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Singapore Management Univ, Singapore, Singapore
[3] City Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022 | 2022
Funding
National Research Foundation, Singapore
Keywords
Deep Learning; Transfer Learning; Type Inference;
DOI
10.1145/3533767.3534411
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Hitherto, statistical type inference systems have relied entirely on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose Plato, a cross-lingual transfer learning framework for statistical type inference, which leverages prior knowledge learned from the labeled dataset of one language and transfers it to others, e.g., Python to JavaScript, Java to JavaScript, etc. Plato is powered by a novel kernelized attention mechanism that constrains the attention scope of the backbone Transformer model so that the model is forced to base its predictions on features commonly shared among languages. In addition, we propose a syntax enhancement that augments learning on the feature overlap among language domains. Furthermore, Plato can also improve conventional supervised type inference by introducing cross-lingual augmentation, which enables the model to learn more general features across multiple languages. We evaluated Plato under two settings: 1) in the cross-domain scenario, where the target-language data is unlabeled or only partially labeled, Plato outperforms state-of-the-art domain transfer techniques by a large margin, e.g., it improves the Python-to-TypeScript baseline by +14.6%@EM and +18.6%@weighted-F1; and 2) in the conventional monolingual supervised scenario, Plato improves the Python baseline by +4.10%@EM and +1.90%@weighted-F1 with the introduction of cross-lingual augmentation.
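The central mechanism named in the abstract, a kernelized attention that narrows the Transformer's attention scope toward features shared across languages, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical instantiation assuming an RBF (Gaussian) kernel over query/key distances in place of the usual scaled dot-product; the class name KernelizedSelfAttention, the bandwidth parameter, and the kernel choice are illustrative assumptions, not the paper's actual formulation (see the DOI above for that).

```python
# Minimal sketch of a kernelized self-attention layer (illustrative only).
# ASSUMPTION: an RBF kernel over query/key distances is one plausible way to
# constrain attention scope; Plato's exact kernel is defined in the paper.
import torch
import torch.nn as nn

class KernelizedSelfAttention(nn.Module):
    def __init__(self, d_model: int, bandwidth: float = 1.0):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.bandwidth = bandwidth  # controls how sharply attention is localized

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Pairwise squared distances between queries and keys: (batch, seq, seq).
        dist = torch.cdist(q, k, p=2).pow(2)
        # RBF kernel: similar tokens receive high weight, dissimilar ones decay
        # smoothly toward zero, restricting the effective attention scope.
        scores = torch.exp(-dist / (2 * self.bandwidth ** 2))
        weights = scores / scores.sum(dim=-1, keepdim=True)
        return weights @ v

# Usage sketch:
# attn = KernelizedSelfAttention(d_model=256)
# out = attn(torch.randn(2, 64, 256))  # (batch=2, seq_len=64, d_model=256)
```

Intuitively, shrinking the bandwidth narrows the attention scope, biasing the model toward local structure such as the shared control- and data-flow patterns the abstract describes, rather than language-specific surface tokens.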
Pages: 239-250
Page count: 12