Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

Cited by: 9

Authors:

Ahmadi, Sara [1,2]

Ahadi, Seyed Mohammad [1]

Cranen, Bert [2]

Boves, Lou [2]

Affiliations:

[1] Amirkabir Univ Technol, Tehran 158754413, Iran

[2] Radboud Univ Nijmegen, Ctr Language Studies, NL-6525 HT Nijmegen, Netherlands

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2014

Keywords:

Sparse coding/compressive sensing; Sparse classification; Modulation spectrum; Noise robust automatic speech recognition; Intelligibility

DOI:

10.1186/s13636-014-0036-3

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding such that the class(es) represented by the exemplars weigh in, in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that keeps the generalization to unseen conditions that is one of the strongest advantages of sparse coding.
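The step from exemplar activations to state posterior probabilities mentioned in the abstract can be illustrated with a minimal sketch. It assumes a non-negative dictionary whose columns are exemplars with known state labels, solves a generic non-negative sparse coding problem with multiplicative updates (a common choice, not necessarily the solver used in the paper), and sums the activations per state. The function names `sparse_code` and `state_posteriors`, the sparsity weight `lam`, and the toy dictionary are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sparse_code(y, D, lam=0.1, n_iter=500, eps=1e-12):
    """Approximately minimise 0.5 * ||y - D @ x||^2 + lam * sum(x), x >= 0.

    Multiplicative updates keep x non-negative as long as y and D are
    non-negative. y has shape (n_features,), D has shape
    (n_features, n_exemplars).
    """
    x = np.full(D.shape[1], 1.0 / D.shape[1])  # uniform positive start
    Dty = D.T @ y
    DtD = D.T @ D
    for _ in range(n_iter):
        x *= Dty / (DtD @ x + lam + eps)
    return x


def state_posteriors(x, labels, n_states):
    """Sum exemplar activations per state and normalise to sum to one."""
    scores = np.bincount(labels, weights=x, minlength=n_states)
    total = scores.sum()
    if total <= 0:
        # No exemplar is active: fall back to a uniform distribution.
        return np.full(n_states, 1.0 / n_states)
    return scores / total


# Toy data (assumption): 6 random non-negative exemplars, 2 per state.
rng = np.random.default_rng(0)
D = np.abs(rng.normal(size=(20, 6)))
D /= np.linalg.norm(D, axis=0)           # unit-norm exemplars
labels = np.array([0, 0, 1, 1, 2, 2])    # state label of each exemplar

# Observation built mostly from the two exemplars of state 1, plus noise.
y = 0.7 * D[:, 2] + 0.3 * D[:, 3] + 0.01 * np.abs(rng.normal(size=20))

x = sparse_code(y, D)
post = state_posteriors(x, labels, n_states=3)
print(post)
```

In this sketch the cost function only rewards reconstruction accuracy and sparsity. The labels enter after the fact, which is the limitation the abstract points to when it argues for moving towards sparse classification.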


Pages: 1 / 20

Number of pages: 20

