A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0

Authors:

Yang, Zhao ^{[1,3]}

Ng, Dianwen ^{[2,3]}

Zhang, Chong

Jiang, Rui ^{[1]}

Xi, Wei ^{[1]}

Ma, Yukun ^{[2]}

Ni, Chongjia ^{[2]}

Zhao, Jizhong ^{[1]}

Ma, Bin ^{[2]}

Chng, Eng Siong ^{[3]}

Affiliations:

[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China

[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

INTERSPEECH 2023 | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;

DOI:

10.21437/Interspeech.2023-1300

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC), which couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant, and it can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves a lower word error rate than the pipeline baselines.


Pages: 4953-4957

Page count: 5

