Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
Authors:
Li, Xueqing [1]
Li, Shengqiang [1]
Zhang, Xiao-Lei [1,2]
Rahardja, Susanto [1,3]
Affiliations:
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding: U.S. National Science Foundation
Keywords:
End-to-end speech translation; rotary position embedding; Transformer;
DOI:
10.1109/LSP.2024.3353039
CLC Classification: TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Subject Classification: 0808; 0809
Abstract:
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models, as it enables the modeling of dependencies between elements at different positions in the input sequence. Most position embedding methods used in speech translation, such as absolute and relative position embedding, either struggle to exploit relative positional information or add computational burden to the model. In this letter, we introduce a novel approach by incorporating rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first injects absolute position information by multiplying the input vectors with rotation matrices, and then realizes relative position embedding through the dot product of the self-attention mechanism. The main advantage of the proposed method over the original one is that rotary position embedding combines the benefits of absolute and relative position embedding, making it well suited to position embedding in speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the method without rotary position embedding across eight translation directions.
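The rotary mechanism summarized in the abstract — absolute position injected by rotating each pair of feature dimensions, relative position recovered automatically through the attention dot product — can be sketched in plain Python. This is a minimal illustration of the general RoPE idea, not the authors' implementation; the function names and the toy query/key vectors below are hypothetical:

```python
import math

def rotary_embed(vec, pos, base=10000.0):
    """Apply rotary position embedding to a single vector at position `pos`.

    Each dimension pair (2i, 2i+1) of `vec` (even length d) is rotated by
    the angle pos * theta_i, where theta_i = base**(-2i / d).
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE rotates dimension pairs, so d must be even"
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)          # per-pair rotation frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + 1]
        out.append(x1 * c - x2 * s)       # 2D rotation of the pair
        out.append(x1 * s + x2 * c)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Key property: the attention score between a query at position m and a
# key at position n depends only on the relative offset m - n, because
# R(m)^T R(n) = R(n - m) for rotation matrices.
q = [1.0, 0.5, -0.3, 2.0]   # hypothetical query vector
k = [0.7, -1.2, 0.4, 0.9]   # hypothetical key vector
s1 = dot(rotary_embed(q, 3), rotary_embed(k, 1))
s2 = dot(rotary_embed(q, 7), rotary_embed(k, 5))  # same offset of 2
```

Here `s1` and `s2` coincide (up to floating-point error), which is exactly why multiplying inputs by rotation matrices yields relative position information for free in the self-attention dot product.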
Pages: 371-375
Number of pages: 5
Related Papers (50 total):
  • [1] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [2] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    INTERSPEECH 2020, 2020, : 5011 - 5015
  • [3] On-device Streaming Transformer-based End-to-End Speech Recognition
    Oh, Yoo Rhee
    Park, Kiyoung
    INTERSPEECH 2021, 2021, : 967 - 968
  • [4] An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition
    Yue, Fengpeng
    Ko, Tom
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [5] Transformer-Based Online CTC/Attention End-to-End Speech Recognition Architecture
    Miao, Haoran
    Cheng, Gaofeng
    Gao, Changfeng
    Zhang, Pengyuan
    Yan, Yonghong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6084 - 6088
  • [6] Transformer-Based End-to-End Speech Recognition With Local Dense Synthesizer Attention
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [7] Simplified Self-Attention for Transformer-Based End-to-End Speech Recognition
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [8] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [9] Transformer-Based End-to-End Anatomical and Functional Image Fusion
    Zhang, Jing
    Liu, Aiping
    Wang, Dan
    Liu, Yu
    Wang, Z. Jane
    Chen, Xun
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [10] Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition
    Lohrenz, Timo
    Li, Zhengyang
    Fingscheidt, Tim
    INTERSPEECH 2021, 2021, : 2846 - 2850