RAT: RNN-Attention Transformer for Speech Enhancement

Cited by: 0
Authors
Zhang, Tailong [1]
He, Shulin [1]
Li, Hao [1]
Zhang, Xueliang [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Source
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022
Keywords
Speech enhancement; Transformer; Self-Attention; Noise
DOI
10.1109/ISCSLP57327.2022.10037952
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the global modeling capability of the self-attention mechanism, Transformer-based models have seen increasing use in natural language processing and automatic speech recognition. The Transformer's long-range receptive field overcomes the catastrophic forgetting that affects Recurrent Neural Networks (RNNs). However, unlike natural language processing and speech recognition, which rely on global information, speech enhancement depends more on local information, so the original Transformer is not optimally suited to it. In this paper, we propose an improved Transformer model called the RNN-Attention Transformer (RAT), which applies multi-head self-attention (MHSA) along the temporal dimension. The input sequence is split into chunks, and different models are applied within and across chunks: since RNNs model local information better than self-attention does, an RNN handles intra-chunk information while self-attention handles inter-chunk information. Experiments show that RAT significantly reduces the parameter count and improves performance compared to the baseline.
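To make the chunked intra-RNN / inter-MHSA structure concrete, below is a minimal PyTorch sketch of one such block. The layer sizes, chunk length, bidirectional GRU choice, and the normalization and residual layout are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the chunked intra-RNN / inter-MHSA idea described in the
# abstract. All hyperparameters and the exact layer layout are assumptions.
import torch
import torch.nn as nn


class RATBlock(nn.Module):
    """Chunk the time axis; model frames inside each chunk with an RNN
    (local information) and same-position frames across chunks with
    multi-head self-attention (global information)."""

    def __init__(self, dim=64, chunk_size=32, heads=4):
        super().__init__()
        self.chunk_size = chunk_size
        # Intra-chunk: bidirectional GRU over the frames of one chunk.
        self.intra_rnn = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.intra_proj = nn.Linear(2 * dim, dim)
        self.intra_norm = nn.LayerNorm(dim)
        # Inter-chunk: MHSA over the sequence of chunks at each position.
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, time, dim); pad time so it divides evenly into chunks.
        b, t, d = x.shape
        pad = (-t) % self.chunk_size
        if pad:
            x = nn.functional.pad(x, (0, 0, 0, pad))
        n_chunks = x.shape[1] // self.chunk_size

        # Intra-chunk RNN: process the frames within each chunk.
        xc = x.reshape(b * n_chunks, self.chunk_size, d)
        intra, _ = self.intra_rnn(xc)
        x = self.intra_norm(xc + self.intra_proj(intra))
        x = x.reshape(b, n_chunks, self.chunk_size, d)

        # Inter-chunk MHSA: attend across chunks at each intra-chunk position.
        xi = x.transpose(1, 2).reshape(b * self.chunk_size, n_chunks, d)
        inter, _ = self.inter_attn(xi, xi, xi)
        xi = self.inter_norm(xi + inter)
        x = xi.reshape(b, self.chunk_size, n_chunks, d).transpose(1, 2)

        # Restore (batch, time, dim) and drop the padding.
        return x.reshape(b, n_chunks * self.chunk_size, d)[:, :t]


# Usage: y = RATBlock()(torch.randn(2, 100, 64))  ->  shape (2, 100, 64)

Besides matching the RNN to local structure and MHSA to global structure, chunking keeps the attention cost manageable: self-attention operates over the number of chunks rather than the full frame count, while the RNN only ever sees short local windows.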
Pages: 463-467
Number of pages: 5