Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Cited by: 9

Authors:

Qiu, Xinchi [1]

Parcollet, Titouan [1]

Ravanelli, Mirco [3]

Lane, Nicholas D. [1, 2]

Morchid, Mohamed [4]

Affiliations:

[1] Univ Oxford, Oxford, England

[2] Samsung AI, Cambridge, England

[3] Univ Montreal, Mila, Montreal, PQ, Canada

[4] Avignon Univ, LIA, Avignon, France

Source:

INTERSPEECH 2020 | 2020

Keywords:

distant speech recognition; quaternion neural networks; multi-microphone speech recognition; attention

DOI:

10.21437/Interspeech.2020-1682

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between the signals. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton product, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a quaternion long short-term memory neural network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different tasks of multi-channel distant speech recognition.
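The Hamilton product mentioned in the abstract can be illustrated with a minimal sketch. The function name `hamilton_product`, the tuple layout `(r, i, j, k)`, and the idea of packing one value from each of four microphones into a single quaternion are assumptions made here for illustration; this record does not describe the authors' actual implementation.

```python
from typing import Tuple

# A quaternion stored as (r, i, j, k); this layout is an illustrative assumption.
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the Hamilton product p * q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Hypothetical usage: one feature value from each of four microphones,
# packed into a single quaternion and multiplied by one quaternion weight.
frame = (0.5, -0.1, 0.3, 0.2)   # assumed per-microphone values
weight = (1.0, 0.0, 0.0, 0.0)   # identity quaternion leaves the frame unchanged
mixed = hamilton_product(weight, frame)
```

Every output component depends on all four input components, which is how a single quaternion weight can tie the channels together, whereas a real-valued dot product would treat the four values as unrelated inputs.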


Pages: 329-333

Page count: 5

