Speech recognition and intelligent translation under multimodal human-computer interaction system

Authors
Huang, Danhua [1]
Xiang, Shuaiqiu [2]
Affiliations
[1] Zhejiang Yuexiu Univ, Sch English Studies, Shaoxing 312000, Peoples R China
[2] Shenzhen Inst Informat Technol, Sch Software, Shenzhen 518172, Peoples R China
Keywords
multimodal human-computer interaction; speech recognition; intelligent translation; attention mechanism
DOI
10.1515/jisys-2023-0192
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional translation robots are limited to translating single-mode input such as text images and text videos, and their translation accuracy is low. This paper therefore proposes speech recognition and intelligent translation for a multimodal human-computer interaction (HCI) system. First, the network structure of the speech recognition model in the multi-channel HCI system is established, and a multi-head self-attention mechanism is constructed. Then, an artificial intelligence voice wake-up function is designed, and a multimodal machine translation model is constructed. On this basis, selective attention is added to obtain visual recognition of the perceived text, and the decoder performs multimodal gating fusion to output the translation results. Experimental results show that the method achieves a high BLEU score and high translation accuracy.
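The abstract names multimodal gating fusion but this record does not give its formula. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from the text and visual features decides, per dimension, how much of each modality is kept. The function name `gated_fusion`, the element-wise weights `w_text` and `w_visual`, and the convex-combination form are all assumptions made for this example; the paper's actual model may differ.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, visual_vec, w_text, w_visual, bias):
    """Fuse a text feature vector with a visual feature vector.

    Hypothetical, simplified form (element-wise weights instead of
    full weight matrices):
        gate_i  = sigmoid(w_text_i * t_i + w_visual_i * v_i + bias_i)
        fused_i = gate_i * t_i + (1 - gate_i) * v_i
    """
    sizes = {len(text_vec), len(visual_vec), len(w_text), len(w_visual), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for t, v, wt, wv, b in zip(text_vec, visual_vec, w_text, w_visual, bias):
        gate = sigmoid(wt * t + wv * v + b)
        fused.append(gate * t + (1.0 - gate) * v)
    return fused


if __name__ == "__main__":
    text = [1.0, 2.0]
    visual = [3.0, 4.0]
    zeros = [0.0, 0.0]
    # With zero weights and zero bias the gate is 0.5 in every dimension,
    # so the fused vector is the plain average of the two modalities.
    print(gated_fusion(text, visual, zeros, zeros, zeros))
```

In a trained model the weights would be learned, so the gate could suppress the visual features for words that the image does not help to translate.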
Pages: 14