Factored deep convolutional neural networks for noise robust speech recognition

被引:5
作者
Fujimoto, Masakiyo [1 ]
机构
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
来源
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017年
关键词
noise robust speech recognition; deep convolutional neural network; factored network; multi-channel input; OPTIMIZATION;
D O I
10.21437/Interspeech.2017-225
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we present a framework of a factored deep convolutional neural network (CNN) learning for noise robust automatic speech recognition (ASR). Deep CNN architecture. which has attracted great attention in various research areas. has also been successfully applied to ASR. However, to ensure noise robustness, since merely introducing deep CNN architecture into the acoustic modeling of ASR is insufficient, we introduce factored network architecture into deep CNN-based acoustic modeling. The proposed factored deep CNN framework factors out feature enhancement. delta parameter learning, and hidden Markov model state classification into three specific network blocks. By assigning specific roles to each block, the noise robustness of deep CNN-based acoustic models can be improved. With various comparative evaluations, we reveal that the proposed method successfully improves ASR accuracies in noise environments.
引用
收藏
页码:3837 / 3841
页数:5
相关论文
共 39 条
[1]   Convolutional Neural Networks for Speech Recognition [J].
Abdel-Hamid, Ossama ;
Mohamed, Abdel-Rahman ;
Jiang, Hui ;
Deng, Li ;
Penn, Gerald ;
Yu, Dong .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) :1533-1545
[2]   Acoustic beamforming for speaker diarization of meetings [J].
Anguera, Xavier ;
Wooters, Chuck ;
Hernando, Javier .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07) :2011-2022
[3]  
[Anonymous], 2014, ARXIV13124400V3
[4]   SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J].
BOLL, SF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :113-120
[5]   Image Super-Resolution Using Deep Convolutional Networks [J].
Dong, Chao ;
Loy, Chen Change ;
He, Kaiming ;
Tang, Xiaoou .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (02) :295-307
[6]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (06) :1109-1121
[7]  
Fujimoto M, 2015, INT CONF ACOUST SPEE, P5019, DOI 10.1109/ICASSP.2015.7178926
[8]  
Fukuda T, 2017, INT CONF ACOUST SPEE, P5190, DOI 10.1109/ICASSP.2017.7953146
[9]  
Garofolo John S., CSR-I (WSJ0) Complete
[10]  
Glorot X., 2011, P 14 INT C ARTIFICIA, P315