Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Cited by: 9
Authors
Qiu, Xinchi [1]
Parcollet, Titouan [1]
Ravanelli, Mirco [3]
Lane, Nicholas D. [1, 2]
Morchid, Mohamed [4]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Samsung AI, Cambridge, England
[3] Univ Montreal, Mila, Montreal, PQ, Canada
[4] Avignon Univ, LIA, Avignon, France
Source
INTERSPEECH 2020 | 2020
Keywords
distant speech recognition; quaternion neural networks; multi-microphone speech recognition; attention
DOI
10.21437/Interspeech.2020-1682
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between each signal. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton product, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a quaternion long short-term memory neural network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different tasks of multi-channel distant speech recognition.
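The Hamilton product mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hamilton_product` and `quaternion_linear`, the (r, x, y, z) component layout, and the idea of placing four microphone signals in the four components of one quaternion are all assumptions made here for illustration.

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of quaternions stored as (..., 4) arrays in (r, x, y, z) order."""
    r1, x1, y1, z1 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    r2, x2, y2, z2 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    return np.stack([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i component
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j component
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k component
    ], axis=-1)

def quaternion_linear(W, x):
    """Hypothetical quaternion layer: each output is a sum of Hamilton products.

    W: (n_out, n_in, 4) quaternion weights; x: (n_in, 4) quaternion inputs.
    Assumption: each input quaternion holds one feature from four microphones.
    """
    return hamilton_product(W, x).sum(axis=1)  # shape (n_out, 4)

# Basis check: i * j = k, while j * i = -k (the product is not commutative).
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(hamilton_product(i, j))  # [0. 0. 0. 1.]
print(hamilton_product(j, i))  # [ 0.  0.  0. -1.]
```

Because every output component mixes all four input components through shared weights, a layer of this kind ties the channels together in a way that a plain dot product over concatenated features does not.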
Pages: 329-333
Page count: 5