INTEGRATING DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING

Cited by: 0

Authors:

Nakatani, Tomohiro [1]

Ito, Nobutaka [1]

Higuchi, Takuya [1]

Araki, Shoko [1]

Kinoshita, Keisuke [1]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, 2-4, Hikaridai, Kyoto 6190237, Japan

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

Beamforming; automatic speech recognition; time-frequency mask; deep neural network; spatial clustering; SEPARATION; CLASSIFICATION;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recently, time-frequency mask-based beamforming has been extensively studied as the frontend of deep neural network (DNN) based automatic speech recognition (ASR) in noisy environments. Two mask estimation approaches have been separately developed for this beamforming method, namely the DNN-based approach, which exploits the time-frequency features of the signal, and the spatial clustering-based approach, which exploits the spatial features of the signal. This paper proposes a new method that integrates the two approaches in a probabilistic way to further improve mask estimation by exploiting the advantages of both approaches. Experiments using the real data of the CHiME-3 multichannel noisy speech corpus show that the proposed method almost always outperforms the conventional approaches in terms of word error rate (WER) improvement.
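As a rough illustration of the mask-based beamforming frontend described in the abstract, the NumPy sketch below turns a time-frequency mask into mask-weighted spatial covariance matrices and then into an MVDR filter. It is not the authors' implementation. The function names, the (frequency, frame, microphone) array layout, the reference-microphone form of the MVDR solution, and the diagonal loading constant are all assumptions made for this example. The speech and noise masks are taken as given here; they could come from a DNN, from spatial clustering, or from a combination of the two.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array of shape (F, T, M) = (frequency, frame, microphone)
    mask: real array in [0, 1] of shape (F, T)
    returns: complex array of shape (F, M, M)
    """
    weighted = stft * mask[..., None]
    # Sum over frames of mask(f,t) * x(f,t) x(f,t)^H
    cov = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_speech, cov_noise, ref_mic=0, diag_load=1e-6):
    """MVDR filter per frequency bin (reference-microphone form, an assumption):

        w(f) = Phi_n^{-1} Phi_s u_ref / trace(Phi_n^{-1} Phi_s)
    """
    num_mics = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = diag_load * np.trace(cov_noise, axis1=1, axis2=2).real / num_mics
    loaded = cov_noise + load[:, None, None] * np.eye(num_mics)
    numerator = np.linalg.solve(loaded, cov_speech)      # (F, M, M)
    trace = np.trace(numerator, axis1=1, axis2=2)        # (F,)
    return numerator[:, :, ref_mic] / trace[:, None]     # (F, M)


def apply_beamformer(weights, stft):
    """Enhanced signal y(f,t) = w(f)^H x(f,t), shape (F, T)."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)
```

Under these assumptions, a typical call sequence would compute `spatial_covariance(stft, speech_mask)` and `spatial_covariance(stft, noise_mask)`, pass both results to `mvdr_weights`, and feed the returned weights to `apply_beamformer`. The quality of the estimated masks directly determines the quality of the covariance estimates, which is why mask estimation is the focus of the abstract.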

Citation

Pages: 286-290

Page count: 5

