Efficient Data Augmentation via lexical matching for boosting performance on Statistical Machine Translation for Indic and a Low-resource language

Citations: 0
Authors
Saxena, Shefali [1]
Gupta, Ayush [1]
Daniel, Philemon [1]
Affiliations
[1] Natl Inst Technol Hamirpur, Dept Elect & Commun Engn, Hamirpur, India
Keywords
Data Augmentation; Low-resource language; Machine Translation; Evaluation
DOI
10.1007/s11042-023-18086-8
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid advancement of AI technology in recent years, many Data Augmentation (DA) approaches have been investigated to increase data efficiency in Natural Language Processing (NLP). NLP models rely on large amounts of data, and tasks such as labelling enormous amounts of textual data require substantial time, money, and human resources; this reliance limits what the models can do, since a better model requires more data. Text DA addresses this by extending the available data, which enhances the model's accuracy and robustness. A novel lexical-based matching approach is the cornerstone of this work; it is used to improve the quality of the Machine Translation (MT) system. This study uses resource-rich Indic languages (i.e., from the Indo-Aryan and Dravidian language families) to examine the proposed techniques. Extensive experiments on a range of language pairs show that, on the augmented dataset, the proposed method significantly improves BLEU, METEOR and ROUGE scores compared to the baseline system.
Pages: 64255-64269
Number of pages: 15