Neural Machine Translation by Fusing Key Information of Text

Cited by: 6
Authors
Hu, Shijie [1 ]
Li, Xiaoyu [1 ]
Bai, Jiayu [1 ]
Lei, Hang [1 ]
Qian, Weizhong [1 ]
Hu, Sunqiang [1 ]
Zhang, Cong [2 ]
Kofi, Akpatsa Samuel [1 ]
Qiu, Qian [2 ,3 ]
Zhou, Yong [4 ]
Yang, Shan [5 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 610054, Peoples R China
[2] Sichuan Gas Turbine Estab Aero Engine Corp China, Sci & Technol Altitude Simulat Lab, Mianyang 621000, Peoples R China
[3] Northwestern Polytech Univ, Sch Power & Energy, Xian 710072, Peoples R China
[4] Southwest Petr Univ, Sch Comp Sci, Chengdu 610500, Peoples R China
[5] Jackson State Univ, Dept Chem Phys & Atmospher Sci, Jackson, MS 39217 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 74, No. 2
Keywords
fusion; key information; neural machine translation; Transformer
DOI
10.32604/cmc.2023.032732
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
When Google proposed the Transformer in 2017, it was first applied to machine translation tasks and achieved the state of the art at that time. Although current neural machine translation models can generate high-quality translation results, mistranslations and omissions still occur when translating the key information of long sentences. On the other hand, the most important part of traditional translation tasks is the translation of key information: as long as the key information in the output is translated accurately and completely, the overall quality of the translation can still be guaranteed even if other parts are translated incorrectly. To effectively address mistranslation and omission, and to improve the accuracy and completeness of long-sentence translation in machine translation, this paper proposes a key-information-fused neural machine translation model based on the Transformer. The proposed model extracts the keywords of the source-language text separately as an additional encoder input. After being encoded in the same way as the source-language text, the keyword encoding is fused with the encoder output of the source-language text; the resulting key information is then processed and fed into the decoder. By incorporating keyword information from the source-language sentence, the model performs reliably on the task of translating long sentences. To verify the effectiveness of the proposed key-information fusion method, a series of experiments was carried out on the validation set. The experimental results show that the Bilingual Evaluation Understudy (BLEU) score of the proposed model on the Workshop on Machine Translation (WMT) 2017 test dataset is higher than that of Google's Transformer on the same dataset, demonstrating the advantages of the proposed model.
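The fusion step described in the abstract can be sketched in miniature. The abstract does not specify the exact fusion operator, so the gated additive mixing below, the toy lookup "encoder", and all names (`encode`, `fuse`, `gate`) are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: fusing keyword encodings with source-sentence encodings.
# A fixed-gate additive fusion is ASSUMED here purely for illustration;
# the paper's actual fusion mechanism may differ.

def encode(tokens, embed):
    # Toy "encoder": one fixed vector per token. Stands in for the shared
    # Transformer encoder that processes both source text and keywords.
    return [embed[t] for t in tokens]

def fuse(src_enc, key_enc, gate=0.5):
    # Average the keyword vectors, then mix that summary into every
    # source position with a fixed gate (assumed, simplified fusion).
    dim = len(src_enc[0])
    mean_key = [sum(v[i] for v in key_enc) / len(key_enc) for i in range(dim)]
    return [[(1 - gate) * h[i] + gate * mean_key[i] for i in range(dim)]
            for h in src_enc]

embed = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}
src = encode(["the", "cat", "sat"], embed)   # full source sentence
keys = encode(["cat"], embed)                # extracted keyword(s)
fused = fuse(src, keys)                      # same shape as src; would feed the decoder
```

The fused sequence keeps one vector per source token, so a standard Transformer decoder could attend over it unchanged; positions matching the keyword are reinforced while others are nudged toward the keyword summary.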
Pages: 2803-2815
Page count: 13