A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0

Authors:

Yang, Zhao [1,3]

Ng, Dianwen [2,3]

Zhang, Chong

Jiang, Rui [1]

Xi, Wei [1]

Ma, Yukun [2]

Ni, Chongjia [2]

Zhao, Jizhong [1]

Ma, Bin [2]

Chng, Eng Siong [3]

Affiliations:

[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China

[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

INTERSPEECH 2023 | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;

DOI:

10.21437/Interspeech.2023-1300

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC), which couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant, and it can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves a lower word error rate than the pipeline baselines.


Pages: 4953-4957

Page count: 5

