Analysis of CNN-based Speech Recognition System using Raw Speech as Input

Cited by: 0
Authors
Palaz, Dimitri [1, 2]
Magimai-Doss, Mathew [1]
Collobert, Ronan [1, 3]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Facebook AI Res, Menlo Pk, CA USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
automatic speech recognition; convolutional neural networks; raw signal; robust speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) systems typically model the relationship between the acoustic speech signal and the phones in two separate steps: feature extraction and classifier training. In our recent work, we have shown that, in the framework of convolutional neural networks (CNNs), the relationship between the raw speech signal and the phones can be modeled directly, and that ASR systems competitive with the standard approach can be built. In this paper, we first analyze and show that, between the first two convolutional layers, the CNN learns (in parts) and models the phone-specific spectral envelope information of 2-4 ms speech. Given that, we show that the CNN-based approach yields ASR trends similar to those of a standard short-term spectral-based ASR system under mismatched (noisy) conditions, with the CNN-based approach being more robust.
Pages: 11-15
Page count: 5