A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0
Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1]
Xi, Wei [1]
Ma, Yukun [2]
Ni, Chongjia [2]
Zhao, Jizhong [1]
Ma, Bin [2]
Chng, Eng Siong [3]
Affiliations
[1] Xi'an Jiaotong University, Faculty of Electronic and Information Engineering, Xi'an, People's Republic of China
[2] Alibaba Group, Speech Lab, DAMO Academy, Hangzhou, People's Republic of China
[3] Nanyang Technological University, Singapore
Source
INTERSPEECH 2023 | 2023
Funding
National Key R&D Program of China
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and thereby improve recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. In addition, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets show that our method achieves lower word error rates than the pipeline baselines.
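The abstract describes fusing speech and text embeddings in a shared encoder, optionally with modality and task tags. The sketch below shows one simple way such an encoder input could be assembled, purely for illustration; it is not the UAC implementation. Concatenation along the sequence axis as the fusion method, the tag names (`<task:asr>`, `<mod:speech>`, and so on), `EMBED_DIM`, and the function `fuse_inputs` are all hypothetical, and real tag embeddings would be learned rather than fixed.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical embedding size; a toy value chosen only for illustration.
EMBED_DIM = 4

# Hypothetical tag table. A trained model would learn these vectors;
# here they are fixed one-hot toy vectors of size EMBED_DIM.
TAG_EMBEDDINGS: Dict[str, Vector] = {
    "<task:asr>": [1.0, 0.0, 0.0, 0.0],
    "<task:correct>": [0.0, 1.0, 0.0, 0.0],
    "<mod:speech>": [0.0, 0.0, 1.0, 0.0],
    "<mod:text>": [0.0, 0.0, 0.0, 1.0],
}


def fuse_inputs(
    speech_emb: List[Vector],
    text_emb: List[Vector],
    task: str,
    use_tags: bool = True,
) -> List[Vector]:
    """Build one input sequence for a (hypothetical) shared encoder.

    Speech-frame and text-token embeddings are concatenated along the
    sequence axis. When use_tags is True, a task tag comes first and a
    modality tag precedes each segment so the encoder can tell them apart.
    """
    for vec in speech_emb + text_emb:
        if len(vec) != EMBED_DIM:
            raise ValueError(f"expected vectors of size {EMBED_DIM}, got {len(vec)}")

    fused: List[Vector] = []
    if use_tags:
        fused.append(list(TAG_EMBEDDINGS[f"<task:{task}>"]))
        fused.append(list(TAG_EMBEDDINGS["<mod:speech>"]))
    fused.extend(speech_emb)

    # The text segment (e.g. an ASR hypothesis to be corrected) may be empty.
    if text_emb:
        if use_tags:
            fused.append(list(TAG_EMBEDDINGS["<mod:text>"]))
        fused.extend(text_emb)
    return fused


if __name__ == "__main__":
    speech = [[0.1] * EMBED_DIM for _ in range(3)]  # 3 toy acoustic frames
    hyp = [[0.5] * EMBED_DIM for _ in range(2)]     # 2 toy hypothesis tokens
    seq = fuse_inputs(speech, hyp, task="correct")
    print(len(seq))  # 1 task tag + 1 speech tag + 3 frames + 1 text tag + 2 tokens = 8
```

Prepending tags as extra sequence positions keeps the encoder input a single sequence; the abstract does not state how UAC actually injects its tags or performs the fusion.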
Pages: 4953-4957
Page count: 5