New Generalized Sidelobe Canceller with Denoising Auto-Encoder for Improved Speech Enhancement

Cited by: 1

Authors:

Shin, Minkyu [1]

Mun, Seongkyu [2]

Han, David K. [3]

Ko, Hanseok [1]

Affiliations:

[1] Korea Univ, Sch Elect Engn, Seoul 136713, South Korea

[2] Korea Univ, Dept Visual Informat Proc, Seoul 136713, South Korea

[3] Off Naval Res, Arlington, VA 22217 USA

Source:

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES | 2017 / Vol. E100A / No. 12

Keywords:

speech enhancement; denoising auto-encoder; acoustic beam-forming; generalized sidelobe canceller; source separation

DOI:

10.1587/transfun.E100.A.3038

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper proposes a multichannel speech enhancement system that adopts a denoising auto-encoder as part of the beamformer. The proposed generalized sidelobe canceller structure generates enhanced multichannel signals, rather than a single channel, to which the subsequent denoising auto-encoder is applied. Because the beamformer exploits spatial information and compensates for differences in the transfer functions of each channel, the proposed system is expected to relieve the denoising auto-encoder of modelling relative transfer functions, which are complex-valued and therefore hard for it to model. As a result, the modelling capability of the denoising auto-encoder can be concentrated on removing the artefacts caused by the beamformer. Conventional beamformers combine these artefacts into one channel; in the proposed method they remain separated in each channel, so the denoising auto-encoder can remove them by referring to the other channels. Experimental results on the six-channel CHiME data show that the proposed structure is effective, as indicated by improvements in speech enhancement measures and in the word error rate of automatic speech recognition.
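For orientation, the conventional single-output generalized sidelobe canceller that the abstract contrasts against can be sketched as below. This is not the authors' multichannel structure, whose details this record does not give. It is a textbook-style illustration under simplifying assumptions: the channels are already time-aligned toward the target, the interferer mixes instantaneously with a different gain per channel, and a batch least-squares fit stands in for the usual adaptive noise canceller. All function and variable names are hypothetical.

```python
import numpy as np

def fixed_beamformer(x):
    """Average time-aligned channels; x has shape (channels, samples)."""
    return x.mean(axis=0)

def blocking_matrix(num_channels):
    """Adjacent-channel differences. Each row sums to zero, so a signal
    that is identical in every channel is cancelled."""
    b = np.zeros((num_channels - 1, num_channels))
    idx = np.arange(num_channels - 1)
    b[idx, idx] = 1.0
    b[idx, idx + 1] = -1.0
    return b

def gsc_single_output(x):
    """Conventional GSC: one enhanced channel out.
    Assumption: batch least squares replaces the adaptive filter."""
    y_fbf = fixed_beamformer(x)
    u = blocking_matrix(x.shape[0]) @ x           # noise references, (M-1, T)
    w, *_ = np.linalg.lstsq(u.T, y_fbf, rcond=None)
    return y_fbf - w @ u                          # subtract estimated noise

# Synthetic six-channel example (assumed data, not CHiME).
rng = np.random.default_rng(0)
num_channels, num_samples = 6, 4000
target = rng.standard_normal(num_samples)
interferer = rng.standard_normal(num_samples)
gains = rng.uniform(0.5, 1.5, size=num_channels)  # per-channel interferer gain
sensor_noise = 0.01 * rng.standard_normal((num_channels, num_samples))
x = target + gains[:, None] * interferer + sensor_noise

err_fbf = np.mean((fixed_beamformer(x) - target) ** 2)
err_gsc = np.mean((gsc_single_output(x) - target) ** 2)
print(f"MSE after fixed beamformer: {err_fbf:.4f}")
print(f"MSE after GSC:              {err_gsc:.4f}")
```

Note that this sketch collapses everything into one output channel, so any residual artefacts are mixed together; the abstract's point is that keeping a separate enhanced signal per channel leaves those artefacts distinguishable for the denoising auto-encoder that follows.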

Citation

Pages: 3038-3040

Number of pages: 3

