Research on English-Chinese machine translation shift based on word vector similarity

Cited by: 0
Authors
Ma, Qingqing [1]
Affiliations
[1] Nanchang Inst Technol, Changbei Econ Dev Zone, 901 Hero Ave, Nanchang 330044, Jiangxi, Peoples R China
Keywords
Word vector similarity; English-Chinese translation; Machine translation
DOI
10.1007/s10015-024-00964-5
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
In English-Chinese machine translation shift, the handling of out-of-vocabulary (OOV) words has a strong effect on translation quality. To address OOV words, this paper proposes a method based on word vector similarity: word vector similarity is computed with the Skip-gram model, each OOV word in a source sentence is replaced by its most similar word, and the replaced corpus is used to train a Transformer model. When the original corpus was used for training, the Transformer model achieved bilingual evaluation understudy-4 (BLEU-4) scores of 37.29 on NIST2006 and 30.73 on NIST2008. When word vector similarity was used for processing and low-frequency OOV words were retained, the BLEU-4 scores rose to 37.36 on NIST2006 and 30.78 on NIST2008. Moreover, retaining low-frequency OOV words gave better translation quality than removing them. The experimental results show that the English-Chinese machine translation shift method based on word vector similarity is reliable and can be applied in practice.
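The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cosine`, `most_similar_in_vocab`, `replace_oov`), the `keep_unmatched` flag, and the toy three-dimensional vectors are all hypothetical, and the vectors are hand-written stand-ins for embeddings that would in practice come from a trained Skip-gram model. The sketch assumes that an OOV word is replaced by the in-vocabulary word with the highest cosine similarity, and that an OOV word with no usable vector is either retained or removed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_in_vocab(word, vectors, vocab):
    """Return the in-vocabulary word closest to `word`, or None if no match is possible."""
    if word not in vectors:
        return None
    candidates = [w for w in vocab if w in vectors and w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))


def replace_oov(tokens, vectors, vocab, keep_unmatched=True):
    """Replace OOV tokens with their most similar in-vocabulary word.

    Assumption: OOV tokens without a vector are retained when
    keep_unmatched is True and removed otherwise.
    """
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
            continue
        substitute = most_similar_in_vocab(tok, vectors, vocab)
        if substitute is not None:
            out.append(substitute)
        elif keep_unmatched:
            out.append(tok)
    return out


# Toy data (hypothetical): hand-written vectors, not real Skip-gram output.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
    "kitten": [0.88, 0.15, 0.02],
}
vocab = {"the", "cat", "dog", "car", "sat"}
sentence = ["the", "kitten", "sat", "zzyzx"]

print(replace_oov(sentence, vectors, vocab))
print(replace_oov(sentence, vectors, vocab, keep_unmatched=False))
```

In this toy example "kitten" is outside the vocabulary but has a vector, so it is replaced by "cat"; "zzyzx" has no vector, so it is kept or dropped depending on `keep_unmatched`. The processed sentences would then form the training corpus for the translation model.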
Pages: 585-589
Number of pages: 5