Light-weight speech separation based on dual-path attention and recurrent neural network

Cited by: 0

Authors:

Yang Y. [1,2]

Hu Q. [1,2]

Zhang P. [1,2]

Affiliations:

[1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing

[2] University of Chinese Academy of Sciences, Beijing

Source:

Shengxue Xuebao/Acta Acustica | 2023 / Vol. 48 / No. 05

Keywords:

Deep neural network; Dual-path network; Light-weight model; Self-attention network; Speech separation

DOI:

10.12395/0371-0025.2022044

CLC number:

Subject classification code:

Abstract:

A light-weight speech separation algorithm based on dual-path attention and a recurrent neural network is proposed. First, selectable branch structures built on a dual-path attention mechanism and a dual-path recurrent network are used to model the speech signals. This design facilitates the extraction of deep feature information while reducing the number of trainable parameters. Second, a sub-band processing approach is introduced to reduce the computational burden. Experimental results on the LibriCSS dataset show that the proposed algorithm achieves an average word error rate of 8.6% with only 0.15 MiB of trainable parameters and a computational cost of 15.2 G/6 s. These figures correspond to 3.3 to 391.3 times fewer parameters and 1.1 to 3.2 times less computation than other mainstream approaches. The results demonstrate that the proposed algorithm effectively reduces the number of trainable parameters and the computational cost while maintaining high speech separation performance. © 2023 Science Press. All rights reserved.
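The dual-path modeling named in the abstract can be illustrated with a small sketch. It assumes the segmentation scheme of Dual-path RNN [9]. A long frame sequence is cut into 50%-overlapping chunks. One sequence model then runs inside each chunk (intra-chunk, local context), and a second runs across chunks at each in-chunk position (inter-chunk, global context). Finally, overlap-add merges the chunks back into a sequence. The names `segment`, `dual_path_block` and `overlap_add`, the chunk size, and the identity stand-ins are hypothetical. The paper's actual chunk length, its attention/RNN branches, residual connections, and its sub-band processing are not reproduced here.

```python
from typing import Callable, List

Frame = List[float]                           # one feature vector (N values) per time step
Chunk = List[Frame]                           # K consecutive frames
SeqFn = Callable[[List[Frame]], List[Frame]]  # hypothetical stand-in for an attention/RNN branch


def segment(frames: List[Frame], chunk: int) -> List[Chunk]:
    """Split an [L, N] frame sequence into 50%-overlapping chunks of length `chunk`."""
    if chunk < 2 or chunk % 2:
        raise ValueError("chunk must be an even integer >= 2")
    hop = chunk // 2
    zero = [0.0] * len(frames[0])
    # Pad `hop` zero frames on both ends so edge frames are covered by two chunks,
    # then pad the tail until the padded length fits a whole number of hops.
    padded = [zero] * hop + list(frames) + [zero] * hop
    rest = (len(padded) - chunk) % hop
    if rest:
        padded += [zero] * (hop - rest)
    return [padded[s:s + chunk] for s in range(0, len(padded) - chunk + 1, hop)]


def dual_path_block(chunks: List[Chunk], intra: SeqFn, inter: SeqFn) -> List[Chunk]:
    """One dual-path pass: intra-chunk modeling, then inter-chunk modeling."""
    # Intra-chunk: each chunk is a short sequence of K frames (local context).
    chunks = [intra(c) for c in chunks]
    # Inter-chunk: for each in-chunk position k, the S frames found at that
    # position across all chunks form a second short sequence (global context).
    n_chunks, k_len = len(chunks), len(chunks[0])
    out: List[Chunk] = [[[] for _ in range(k_len)] for _ in range(n_chunks)]
    for k in range(k_len):
        column = inter([chunks[s][k] for s in range(n_chunks)])
        for s in range(n_chunks):
            out[s][k] = column[s]
    return out


def overlap_add(chunks: List[Chunk], length: int) -> List[Frame]:
    """Merge 50%-overlapping chunks back into a sequence of `length` frames."""
    chunk = len(chunks[0])
    hop = chunk // 2
    n_feat = len(chunks[0][0])
    total = (len(chunks) - 1) * hop + chunk
    out = [[0.0] * n_feat for _ in range(total)]
    for s, c in enumerate(chunks):
        for k, frame in enumerate(c):
            row = out[s * hop + k]
            for i, v in enumerate(frame):
                row[i] += v
    # Every original frame is covered by exactly two chunks, so halve the sum,
    # then drop the front padding and anything past the original length.
    return [[v * 0.5 for v in row] for row in out[hop:hop + length]]


if __name__ == "__main__":
    # Toy input: 7 frames with 2 features each. Identity stand-ins replace the
    # attention / RNN branches, so the round trip should reproduce the input.
    x = [[float(t), float(-t)] for t in range(7)]
    chunks = segment(x, chunk=4)
    identity: SeqFn = lambda seq: seq
    y = overlap_add(dual_path_block(chunks, identity, identity), len(x))
    print(len(chunks), y == x)  # 5 True
```

Both passes operate on short sequences: each chunk has K frames, and each inter-chunk column has roughly 2L/K frames. DPRNN [9] picks K on the order of sqrt(2L) so that neither path sees a very long sequence. Replacing `intra` and `inter` with real attention or RNN modules (and averaging instead of summing in `overlap_add`) are choices made for this sketch only.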


Pages: 1060-1069

Number of pages: 9

References (34 in total):

[1] pp. 696-706
[2] pp. 775-784
[3] 42, (2016)
[4] 35, (2010)
[5] Benesty J, Chen J, Huang Y., Study of the widely linear wiener filter for noise reduction, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 205-208, (2010)
[6] pp. 475-484
[7] Cohen I, Benesty J, Gannot S., Speech processing in modern communication: Challenges and perspectives, (2009)
[8] Luo Y, Mesgarani N., TasNet: Time-domain audio separation network for real-time, single-channel speech separation, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 696-700, (2018)
[9] Luo Y, Chen Z, Yoshioka T., Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 46-50, (2020)
[10] Chen J J, Mao Q, Liu D., Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation, Proc. Interspeech, pp. 2642-2646, (2020)
