Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

Citations: 0
Author
Zeng, Linda [1]
Affiliation
[1] Harker Sch, San Jose, CA 95124 USA
Source
2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024 | 2024
Keywords
Data augmentation; generative adversarial networks; low-resource languages; natural language processing; neural machine translation
DOI
10.1109/ICNLP60986.2024.10692876
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.
Pages: 11-18
Page count: 8