A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1 ]
Xi, Wei [1 ]
Ma, Yukun [2 ]
Ni, Chongjia [2 ]
Zhao, Jizhong [1 ]
Ma, Bin [2 ]
Chng, Eng Siong [3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
INTERSPEECH 2023 | 2023
Funding
National Key Research and Development Program of China
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and improve recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves a lower word error rate than the pipeline baselines.
Pages: 4953-4957
Number of pages: 5