A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

被引:0
|
作者
Yang, Zhao [1 ,3 ]
Ng, Dianwen [2 ,3 ]
Zhang, Chong
Jiang, Rui [1 ]
Xi, Wei [1 ]
Ma, Yukun [2 ]
Ni, Chongjia [2 ]
Zhao, Jizhong [1 ]
Ma, Bin [2 ]
Chng, Eng Siong [3 ]
机构
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
来源
INTERSPEECH 2023 | 2023年
基金
国家重点研发计划;
关键词
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
D O I
10.21437/Interspeech.2023-1300
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline ignoring their strong interconnection. Inspired by the recent progress of leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC), coupling speech recognition and error correction to capture richer semantic information for improving the performance of speech recognition. The proposed framework established interaction between speech and textual representations via explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the proposed framework is flexible to operate in either synchronous or asynchronous variant and could be equipped with modality and task tags enhancing its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method effectively produces improved word error rate when compared against the pipeline baselines.
引用
收藏
页码:4953 / 4957
页数:5
相关论文
共 50 条
  • [41] Multivariate Autoregressive Spectrogram Modeling for Noisy Speech Recognition
    Ganapathy, Sriram
    IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (09) : 1373 - 1377
  • [42] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [43] Noisy training for deep neural networks in speech recognition
    Yin, Shi
    Liu, Chao
    Zhang, Zhiyong
    Lin, Yiye
    Wang, Dong
    Tejedor, Javier
    Zheng, Thomas Fang
    Li, Yinguo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015, : 1 - 14
  • [44] Improved Noisy Student Training for Automatic Speech Recognition
    Park, Daniel S.
    Zhang, Yu
    Jia, Ye
    Han, Wei
    Chiu, Chung-Cheng
    Li, Bo
    Wu, Yonghui
    Le, Quoc, V
    INTERSPEECH 2020, 2020, : 2817 - 2821
  • [45] SPEECH RECOGNITION ERROR CORRECTION BY USING COMBINATIONAL MEASURES
    Li, Baoxiang
    Chang, Fengxiang
    Guo, Jun
    Liu, Gang
    PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC 2012), 2012, : 375 - 379
  • [46] Performance Improvement in Multiple-Model Speech Recognizer under Noisy Environments
    Yoon, Jang-Hyuk
    Chung, Yong-Joo
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 444 - 452
  • [47] Modulation frequency features for phoneme recognition in noisy speech
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (01) : EL8 - EL12
  • [48] SPEECH RECOGNITION IN NOISY ENVIRONMENTS WITH THE AID OF MICROPHONE ARRAYS
    VANCOMPERNOLLE, D
    MA, W
    XIE, F
    VANDIEST, M
    SPEECH COMMUNICATION, 1990, 9 (5-6) : 433 - 442
  • [49] Universal and accent-discriminative encoders for conformer-based accent-invariant speech recognition
    Wang X.
    Long Y.
    Xu D.
    International Journal of Speech Technology, 2022, 25 (4) : 987 - 995
  • [50] CORRECTION OF AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMER SEQUENCE-TO-SEQUENCE MODEL
    Hrinchuk, Oleksii
    Popova, Mariya
    Ginsburg, Boris
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7074 - 7078