Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

Cited by: 9
Authors
Ahmadi, Sara [1, 2]
Ahadi, Seyed Mohammad [1]
Cranen, Bert [2]
Boves, Lou [2]
Affiliations
[1] Amirkabir University of Technology, Tehran 158754413, Iran
[2] Radboud University Nijmegen, Centre for Language Studies, NL-6525 HT Nijmegen, Netherlands
Source
EURASIP Journal on Audio, Speech, and Music Processing | 2014
Keywords
Sparse coding/compressive sensing; Sparse classification; Modulation spectrum; Noise-robust automatic speech recognition; Intelligibility
DOI
10.1186/s13636-014-0036-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition has converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is associated with a specific state. In this paper, we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means of obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of each filter is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments on the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding so that the class(es) represented by the exemplars are taken into account in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account, and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that preserves the generalization to unseen conditions, which is one of the strongest advantages of sparse coding.
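The abstract does not spell out the sparse coding cost or the mapping from exemplar activations to state posteriors. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a non-negative, L1-penalized least-squares cost solved with multiplicative updates, and a mapping that simply sums the activations of exemplars sharing a state label and normalizes them. The function names `sparse_code` and `state_posteriors`, the `sparsity` value and the toy data are hypothetical. In a real system each exemplar would presumably be a vector of modulation-spectrum values (for example, 15 gammatone channels × 9 modulation filters = 135 values per time instant).

```python
from typing import List, Sequence


def sparse_code(exemplars: Sequence[Sequence[float]], y: Sequence[float],
                sparsity: float = 0.1, iterations: int = 200) -> List[float]:
    """Return non-negative activations x with y ~ sum_j x[j] * exemplars[j].

    Assumed cost (illustrative only):
        0.5 * ||y - sum_j x[j] * a_j||^2 + sparsity * sum(x),  x >= 0,
    minimized with the multiplicative update
        x[j] <- x[j] * <a_j, y> / (<a_j, reconstruction> + sparsity),
    which keeps x non-negative when the exemplars and y are non-negative.
    """
    dim = len(y)
    x = [1.0] * len(exemplars)
    # <a_j, y> does not change between iterations, so compute it once.
    corr = [sum(a[i] * y[i] for i in range(dim)) for a in exemplars]
    for _ in range(iterations):
        # Reconstruction from the current activations; all atoms are updated jointly.
        recon = [sum(xj * a[i] for xj, a in zip(x, exemplars)) for i in range(dim)]
        # 1e-12 guards against division by zero for all-zero exemplars.
        x = [xj * cj / (sum(a[i] * recon[i] for i in range(dim)) + sparsity + 1e-12)
             for xj, cj, a in zip(x, corr, exemplars)]
    return x


def state_posteriors(activations: Sequence[float], labels: Sequence[int],
                     n_states: int) -> List[float]:
    """Hypothetical mapping: sum activations per state label, then normalize to 1."""
    scores = [0.0] * n_states
    for act, state in zip(activations, labels):
        scores[state] += act
    total = sum(scores)
    if total <= 0.0:
        # No exemplar was activated: fall back to a uniform distribution.
        return [1.0 / n_states] * n_states
    return [s / total for s in scores]


if __name__ == "__main__":
    # Hypothetical toy data: three exemplars labelled with states 0, 1 and 2.
    exemplars = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]]
    x = sparse_code(exemplars, [1.0, 0.0, 0.5, 0.5])
    print(state_posteriors(x, [0, 1, 2], n_states=3))  # roughly [0.667, 0.0, 0.333]
```

Summing activations per label is the simplest possible mapping; as the abstract notes, learning a mapping that keeps the generalization to unseen conditions remains an open challenge, and the cost above contains no class term of the kind sparse classification would add.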
Pages: 1-20
Page count: 20
Related papers
50 in total
  • [1] An Overview of Noise-Robust Automatic Speech Recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(4), pp. 745-777
  • [2] Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition
    Mahkonen, Katariina
    Hurmalainen, Antti
    Virtanen, Tuomas
    Gemmeke, Jort
    12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Vols. 1-5, 2011, pp. 472 ff.
  • [3] Direct control on modulation spectrum for noise-robust speech recognition and spectral subtraction
    Wada, Naoya
    Hayasaka, Noboru
    Yoshizawa, Shingo
    Miyanaga, Yoshikazu
    2006 IEEE International Symposium on Circuits and Systems, Vols. 1-11, Proceedings, 2006, pp. 2533 ff.
  • [4] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    2015 23rd Iranian Conference on Electrical Engineering (ICEE), 2015, pp. 637-642
  • [5] Incorporating Mask Modelling for Noise-Robust Automatic Speech Recognition
    Köküer, Münevver
    Jančovič, Peter
    2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols. 1-8, Proceedings, 2009, pp. 3929-3932
  • [6] Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition
    Wu, Kuo-Hao
    Chen, Chia-Ping
    11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Vols. 3 and 4, 2010, pp. 2074-2077
  • [7] A companding front end for noise-robust automatic speech recognition
    Guinness, J
    Raj, B
    Schmidt-Nielsen, B
    Turicchia, L
    Sarpeshkar, R
    2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols. 1-5: Speech Processing, 2005, pp. 249-252
  • [8] Noise-robust speech recognition based on difference of power spectrum
    Xu, JF
    Wei, G
    Electronics Letters, 2000, 36(14), pp. 1247-1248