Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition

Cited by: 67
Authors
Li, Bo [1]
Sainath, Tara N. [1]
Weiss, Ron J. [1]
Wilson, Kevin W. [1]
Bacchiani, Michiel [1]
Affiliation
[1] Google Inc, New York, NY 10011 USA
Source
17th Annual Conference of the International Speech Communication Association (Interspeech 2016), Vols 1-5: Understanding Speech Processing in Humans and Machines | 2016
Keywords
speech recognition; multichannel; beamforming; adaptive filtering
DOI
10.21437/Interspeech.2016-173
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Joint multichannel enhancement and acoustic modeling using neural networks has shown promise over the past few years. However, one shortcoming of previous work [1, 2, 3] is that the filters learned during training are fixed for decoding, potentially limiting the ability of these models to adapt to previously unseen or changing conditions. In this paper we explore a neural network adaptive beamforming (NAB) technique to address this issue. Specifically, we use LSTM layers to predict time-domain beamforming filter coefficients at each input frame. These filters are convolved with the framed time-domain input signal and summed across channels, essentially performing FIR filter-and-sum beamforming with dynamically adapted filters. The beamformer output is passed into a waveform CLDNN acoustic model [4], which is trained jointly with the filter prediction LSTM layers. On a 2,000-hour Voice Search task, we find that the proposed NAB model achieves a 12.7% relative improvement in WER over a single-channel model [4] and reaches performance similar to a "factored" model architecture that uses several fixed spatial filters [3], with a 17.9% decrease in computational cost.
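The per-frame filter-and-sum step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the per-frame filters are passed in directly as a stand-in for the outputs of the filter-prediction LSTM layers, and the helper names `fir_filter` and `adaptive_filter_and_sum`, the `frames[t][c]` / `filters[t][c]` layout, and the zero-initial-state causal convolution inside each frame are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Hypothetical aliases for illustration: one frame per channel, one FIR filter per channel.
Frame = List[float]
Filter = List[float]


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filter y[n] = sum_k h[k] * x[n - k], truncated to len(x).

    Samples before the start of the frame are treated as zero (an assumption;
    a real system might carry context across frame boundaries).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Only taps that reach back to samples inside the frame contribute.
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


def adaptive_filter_and_sum(
    frames: Sequence[Sequence[Frame]],
    filters: Sequence[Sequence[Filter]],
) -> List[Frame]:
    """Filter-and-sum beamforming with a different set of filters at every frame.

    frames[t][c]  : samples of channel c in frame t (all channels equal length)
    filters[t][c] : FIR coefficients for channel c at frame t
                    (hypothetical stand-in for the per-frame LSTM predictions)
    Returns one single-channel beamformed frame per input frame.
    """
    if len(frames) != len(filters):
        raise ValueError("need exactly one filter set per frame")
    output = []
    for frame_t, filters_t in zip(frames, filters):
        if len(frame_t) != len(filters_t):
            raise ValueError("need exactly one filter per channel")
        summed = [0.0] * len(frame_t[0])
        for x_c, h_c in zip(frame_t, filters_t):
            # Filter each channel with its own coefficients for this frame...
            y_c = fir_filter(x_c, h_c)
            # ...then sum the filtered channels sample by sample.
            for n, value in enumerate(y_c):
                summed[n] += value
        output.append(summed)
    return output


# Toy example: 2 frames, 2 channels, 3 samples per frame.
frames = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]],  # frame 0: channel 0, channel 1
    [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],  # frame 1
]
filters = [
    [[0.5], [0.5]],            # frame 0: average the two channels
    [[1.0, 0.0], [0.0, 0.5]],  # frame 1: delay channel 1 by one sample and halve it
]
beamformed = adaptive_filter_and_sum(frames, filters)
print(beamformed)  # → [[0.5, 1.5, 2.5], [1.0, 2.0, 2.0]]
```

Because `filters` is indexed by frame, the beamformer's spatial response can change from one frame to the next; a fixed filter-and-sum beamformer of the kind the abstract contrasts against would reuse a single filter set for every frame.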
Pages: 1976-1980
Page count: 5
Related papers
31 in total
[1] Allen JB, Berkley DA, 1979, Image method for efficiently simulating small-room acoustics, Journal of the Acoustical Society of America, V65, N4, P943-950
[2] [Anonymous], P ICML UNPUB
[3] [Anonymous], 2015, arXiv:1502.02367
[4] [Anonymous], P ICASSP IN PRESS
[5] [Anonymous], P ASRU
[6] [Anonymous], P ICASSP
[7] Barker J, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), P504, DOI 10.1109/ASRU.2015.7404837
[8] Benesty J, 2008, Springer Top Sign Pr, V1, P1
[9] Bengio S, 2015, Adv Neur In, V28
[10] Brandstein M, 2013, Microphone Arrays: Signal Processing Techniques and Applications