A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

被引:0
|
作者
Yang, Zhao [1 ,3 ]
Ng, Dianwen [2 ,3 ]
Zhang, Chong
Jiang, Rui [1 ]
Xi, Wei [1 ]
Ma, Yukun [2 ]
Ni, Chongjia [2 ]
Zhao, Jizhong [1 ]
Ma, Bin [2 ]
Chng, Eng Siong [3 ]
机构
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
来源
INTERSPEECH 2023 | 2023年
基金
国家重点研发计划;
关键词
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
D O I
10.21437/Interspeech.2023-1300
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline ignoring their strong interconnection. Inspired by the recent progress of leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC), coupling speech recognition and error correction to capture richer semantic information for improving the performance of speech recognition. The proposed framework established interaction between speech and textual representations via explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the proposed framework is flexible to operate in either synchronous or asynchronous variant and could be equipped with modality and task tags enhancing its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method effectively produces improved word error rate when compared against the pipeline baselines.
引用
收藏
页码:4953 / 4957
页数:5
相关论文
共 50 条
  • [1] Speech recognition for noisy conditions based on discrete wavelet transform and parallel model combination
    Hu, CH
    Liu, XF
    ICEMI 2005: Conference Proceedings of the Seventh International Conference on Electronic Measurement & Instruments, Vol 1, 2005, : 408 - 411
  • [2] Speech enhancement strategy for speech recognition microcontroller under noisy environments
    Chan, Kit Yan
    Nordholm, Sven
    Yiu, Ka Fai Cedric
    Togneri, Roberto
    NEUROCOMPUTING, 2013, 118 : 279 - 288
  • [3] SPEECH RECOGNITION WITH NO SPEECH OR WITH NOISY SPEECH
    Krishna, Gautam
    Co Tran
    Yu, Jianguo
    Tewfik, Ahmed H.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1090 - 1094
  • [4] Robust Audio-Visual Speech Recognition Under Noisy Audio-Video Conditions
    Stewart, Darryl
    Seymour, Rowan
    Pass, Adrian
    Ming, Ji
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (02) : 175 - 184
  • [5] Advancing Speech Recognition With No Speech Or With Noisy Speech
    Krishna, Gautam
    Tran, Co
    Carnahan, Mason
    Tewfik, Ahmed
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [6] Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions
    Principi, Emanuele
    Rotili, Rudy
    Squartini, Stefano
    NEURAL NETS WIRN11, 2011, 234 : 149 - 161
  • [7] A Study on Noisy Speech Recognition
    Saeed, Khalid
    Szczepanski, Adam
    ICBAKE: 2009 INTERNATIONAL CONFERENCE ON BIOMETRICS AND KANSEI ENGINEERING, 2009, : 142 - 147
  • [8] ROBUST SPEECH RECOGNITION UNDER NOISY ENVIRONMENTS USING ASYMMETRIC TAPERS
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1638 - 1642
  • [9] Phoneme and tonal accent recognition for Thai speech
    Theera-Umpon, Nipon
    Chansareewittaya, Suppakarn
    Auephanwiriyakul, Sansanee
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 13254 - 13259
  • [10] Decision Level Fusion for Audio-Visual Speech Recognition in Noisy Conditions
    Sad, Gonzalo D.
    Terissi, Lucas D.
    Gomez, Juan C.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 360 - 367