Light-weight speech separation based on dual-path attention and recurrent neural network

Authors
Yang Y. [1, 2]
Hu Q. [1, 2]
Zhang P. [1, 2]
Affiliations
[1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Source
Shengxue Xuebao/Acta Acustica | 2023, Vol. 48, No. 05
Keywords
Deep neural network; Dual-path network; Light-weight model; Self-attention network; Speech separation
DOI
10.12395/0371-0025.2022044
Abstract
A light-weight speech separation algorithm based on dual-path attention and a recurrent neural network is proposed. First, optional branch structures based on the dual-path attention mechanism and the dual-path recurrent network are used to model the speech signals, which facilitates the extraction of deep feature information and reduces the number of training parameters. Second, a sub-band processing approach is introduced to alleviate the computational burden. Experimental results on the LibriCSS dataset show that the proposed algorithm obtains an average word error rate of 8.6% with only 0.15 MiB of training parameters and a computation cost of 15.2 G/6s; these two figures are 3.3 to 391.3 times and 1.1 to 3.2 times smaller, respectively, than those of other mainstream approaches. The results demonstrate that the proposed algorithm can effectively reduce the training parameters and computation cost while achieving high speech separation performance. © 2023 Science Press. All rights reserved.
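The abstract does not give implementation details, so the following is only a minimal sketch of the general dual-path idea it refers to: a long frame sequence is cut into overlapping chunks, one model pass runs within each chunk (local context) and another runs across chunks (global context). The function names `segment` and `dual_path_views`, the 50% overlap, the zero padding, and the chunk length are all assumptions made for illustration and are not taken from the paper.

```python
def segment(frames, chunk_len):
    """Split a frame sequence into chunks with 50% overlap (assumed, not from the paper).

    The tail is zero-padded so that the last chunk is full.
    """
    hop = chunk_len // 2
    extra = max(0, len(frames) - chunk_len)
    n_chunks = -(-extra // hop) + 1          # ceil(extra / hop) + 1
    padded_len = (n_chunks - 1) * hop + chunk_len
    padded = list(frames) + [0.0] * (padded_len - len(frames))
    return [padded[i * hop : i * hop + chunk_len] for i in range(n_chunks)]


def dual_path_views(chunks):
    """Return the two views a dual-path model alternates between.

    intra: one sequence per chunk (local modelling inside a chunk)
    inter: one sequence per in-chunk position (global modelling across chunks)
    """
    intra = [list(c) for c in chunks]
    inter = [list(col) for col in zip(*chunks)]
    return intra, inter


# Hypothetical toy input: 10 scalar "frames", chunk length 4.
chunks = segment(list(range(10)), chunk_len=4)
intra, inter = dual_path_views(chunks)
```

In this toy case the intra-chunk pass sees sequences of length 4 and the inter-chunk pass sees sequences whose length equals the number of chunks, so neither pass has to process the full-length sequence at once. That shortening of the modelled sequences is the general motivation for dual-path designs; how the proposed algorithm combines its attention and recurrent branches is not specified in the abstract.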
Pages: 1060-1069
Page count: 9