A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement

Cited by: 7
Authors
Ho, Minh Tri [1]
Lee, Jinyoung [1]
Lee, Bong-Ki [2]
Yi, Dong Hoon [2]
Kang, Hong-Goo [1]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
[2] LG Elect Co, Artificial Intelligence Lab, Seoul, South Korea
Source
INTERSPEECH 2020 | 2020
Keywords
Multi-channel Speech Enhancement; Wave-U-Net; Cross-Channel Attention
DOI
10.21437/Interspeech.2020-2548
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as well as spectral information, it is challenging to effectively train a multi-channel deep learning system in an end-to-end framework. With a channel-independent encoding architecture for spectral estimation and a strategy to extract spatial information through an inter-channel attention mechanism, we implement a multi-channel speech enhancement system that has high performance even in reverberant and extremely noisy environments. Experimental results show that the proposed architecture has superior performance in terms of signal-to-distortion ratio improvement (SDRi), short-time objective intelligibility (STOI), and phoneme error rate (PER) for speech recognition.
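The inter-channel attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is generic scaled dot-product attention applied across the microphone-channel axis, not the layer used in the paper, whose details are not given in this record. The function name `cross_channel_attention`, the `ref` parameter, and the nested-list layout `features[channel][frame][dim]` are all assumptions made for illustration, and no learned projections are included.

```python
import math


def cross_channel_attention(features, ref=0):
    """Fuse per-channel features with attention over the channel axis.

    features: nested list indexed as features[channel][frame][dim]
              (hypothetical layout; each channel is encoded independently).
    ref:      index of the reference channel that supplies the queries.
    Returns (fused, weights) where fused[frame][dim] is the attended
    feature and weights[frame][channel] are the attention weights.
    """
    num_channels = len(features)
    num_frames = len(features[0])
    dim = len(features[0][0])
    fused, weights = [], []
    for t in range(num_frames):
        query = features[ref][t]
        # Scaled dot product between the reference frame and the same
        # frame in every channel (including the reference itself).
        scores = [
            sum(q * k for q, k in zip(query, features[c][t])) / math.sqrt(dim)
            for c in range(num_channels)
        ]
        # Numerically stable softmax over channels.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        # Weighted sum of channel features for this frame.
        fused.append(
            [sum(w[c] * features[c][t][d] for c in range(num_channels))
             for d in range(dim)]
        )
    return fused, weights
```

In this sketch, channels whose features align with the reference channel at a given frame receive larger weights, which is one simple way spatial agreement between microphones could be turned into a fused representation.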
Pages: 4049-4053
Page count: 5