A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement

Cited by: 7

Authors:

Ho, Minh Tri [1]

Lee, Jinyoung [1]

Lee, Bong-Ki [2]

Yi, Dong Hoon [2]

Kang, Hong-Goo [1]

Affiliations:

[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea

[2] LG Elect Co, Artificial Intelligence Lab, Seoul, South Korea

Source:

INTERSPEECH 2020 | 2020

Keywords:

Multi-channel Speech Enhancement; Wave-U-Net; Cross-Channel Attention;

DOI:

10.21437/Interspeech.2020-2548

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as well as spectral information, it is challenging to effectively train a multi-channel deep learning system in an end-to-end framework. With a channel-independent encoding architecture for spectral estimation and a strategy to extract spatial information through an inter-channel attention mechanism, we implement a multi-channel speech enhancement system that performs well even in reverberant and extremely noisy environments. Experimental results show that the proposed architecture has superior performance in terms of signal-to-distortion ratio improvement (SDRi), short-time objective intelligibility (STOI), and phoneme error rate (PER) for speech recognition.
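The abstract does not specify how the inter-channel attention is computed, so the following is only a minimal sketch of the general idea, assuming plain scaled dot-product attention across microphone channels applied to features that were encoded independently per channel. The function name `cross_channel_attention`, the `(channels, frames, features)` layout, and the use of NumPy are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def cross_channel_attention(feats):
    """Hypothetical sketch: mix per-channel features with attention over channels.

    feats: array of shape (C, T, D) holding C independently encoded channels,
           T time frames, and D feature dimensions (assumed layout).
    Returns (out, weights) with out of shape (C, T, D) and weights of shape
    (T, C, C), where weights[t, i, j] is how much channel i attends to channel j.
    """
    C, T, D = feats.shape
    x = np.transpose(feats, (1, 0, 2))                     # (T, C, D)
    # Similarity between every pair of channels at each time frame.
    scores = x @ np.transpose(x, (0, 2, 1)) / np.sqrt(D)   # (T, C, C)
    # Numerically stable softmax over the attended channel axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each channel becomes a weighted sum of all channels' features.
    out = weights @ x                                      # (T, C, D)
    return np.transpose(out, (1, 0, 2)), weights

# Toy usage with random features for 4 channels, 10 frames, 8 dimensions.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 10, 8))
out, weights = cross_channel_attention(feats)
```

In this sketch the attention weights are the only place where channels interact, which mirrors the abstract's split between channel-independent encoding and a separate step that extracts spatial information.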


Pages: 4049-4053

Page count: 5
