Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

Cited by: 0
Authors
Zeng, Linda [1]
Affiliation
[1] Harker Sch, San Jose, CA 95124 USA
Source
2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024 | 2024
Keywords
Data augmentation; generative adversarial networks; low-resource languages; natural language processing; neural machine translation
DOI
10.1109/ICNLP60986.2024.10692876
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.
Pages: 11-18
Page count: 8