Semantic Enhancement Framework for Robust Speech Recognition

Citations: 0
Authors
Yang, Baochen [1, 2]
Yu, Kai [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, Dept Comp Sci & Engn, X LANCE Lab,MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] State Key Lab Media Convergence Prod Technol & Sy, Shanghai, Peoples R China
Source
MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022 | 2023 / Vol. 1765
Keywords
Semantic enhancement; Speech recognition
DOI
10.1007/978-981-99-2401-1_7
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Automatic speech recognition (ASR) is widely used in dialogue systems across many domains, where it serves as a crucial component. Because the output of the ASR system is the input to downstream systems, the semantic intelligibility of recognition results has drawn wide attention, yet the problem remains unsolved. We propose a semantic enhancement framework that extracts global semantic information from the audio to guide recognition. We evaluate our method on the Wall Street Journal (WSJ) dataset. Compared to the baseline model, the proposed framework achieves relative word error rate (WER) improvements of 5.9% and 9.1% on the dev93 and eval92 sets, respectively.
Pages: 81-88
Page count: 8