EXPLOITING SYNCHRONY SPECTRA AND DEEP NEURAL NETWORKS FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION

Authors
Ma, Ning [1]
Marxer, Ricard [1]
Barker, Jon [1]
Brown, Guy J. [1]
Affiliation
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
Deep neural network; noise-robust automatic speech recognition; synchrony spectra; mask estimation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel system that exploits synchrony spectra and deep neural networks (DNNs) for automatic speech recognition (ASR) in challenging noisy environments. Synchrony spectra measure the extent to which each frequency channel in an auditory model is entrained to a particular pitch period. Together with F0 estimates, they are used either as input to a DNN for time-frequency (T-F) mask estimation or to augment the input features of a DNN-based ASR system. The proposed approach was evaluated in the context of the CHiME-3 Challenge. Our experiments show that the synchrony spectra features work best when they augment the input features of the DNN-based ASR system. Compared with the CHiME-3 baseline system, our best system reduces the word error rate (WER) by more than 14% absolute, achieving a WER of 18.56% on the evaluation test set.
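The abstract describes synchrony spectra only in words. The Python below is a minimal sketch that assumes each channel's synchrony is the normalized autocorrelation of that channel's output at the lag given by the F0 estimate. This is an assumption and not necessarily the paper's exact formulation. Under that assumption, the code computes a per-frame synchrony spectrum and appends it, together with F0, to a base feature vector, mirroring the feature-augmentation configuration. All function names, the feature layout, and the use of pure tones in place of auditory filterbank outputs are hypothetical.

```python
import math


def channel_synchrony(frame, lag):
    """Normalized autocorrelation of one channel's frame at `lag` samples.

    Assumption: used here as the per-channel synchrony measure; values
    near 1 mean the channel repeats with that period (is entrained to it).
    """
    n = len(frame) - lag
    if lag <= 0 or n <= 0:
        return 0.0
    cross = sum(frame[t] * frame[t + lag] for t in range(n))
    energy_a = sum(frame[t] ** 2 for t in range(n))
    energy_b = sum(frame[t + lag] ** 2 for t in range(n))
    denom = math.sqrt(energy_a * energy_b)
    return cross / denom if denom > 0.0 else 0.0


def synchrony_spectrum(channel_frames, f0_hz, sample_rate):
    """One synchrony value per frequency channel for a single time frame."""
    if f0_hz <= 0.0:
        # Unvoiced frame: there is no pitch period to be entrained to.
        return [0.0] * len(channel_frames)
    lag = round(sample_rate / f0_hz)  # pitch period in samples
    return [channel_synchrony(frame, lag) for frame in channel_frames]


def augment_features(base_features, sync_spectrum, f0_hz):
    """Hypothetical layout: [base features | synchrony spectrum | F0]."""
    return list(base_features) + list(sync_spectrum) + [f0_hz]


def tone(freq_hz, sample_rate, num_samples):
    """Pure tone standing in for one auditory-filter channel's output."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


# Usage: a 200 Hz channel is entrained to a 200 Hz pitch; a 330 Hz one is not.
SR = 16000
frames = [tone(200.0, SR, 400), tone(330.0, SR, 400)]
spectrum = synchrony_spectrum(frames, f0_hz=200.0, sample_rate=SR)
features = augment_features([0.1, 0.2, 0.3], spectrum, f0_hz=200.0)
print([round(s, 3) for s in spectrum])
```

The first channel scores close to 1 because its period matches the 80-sample pitch lag, while the second scores well below it. The sketch omits the auditory filterbank itself and any feature normalization.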
Pages: 490-495
Number of pages: 6