Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3

Authors:

Wijayakusuma, Alfian [1]

Gozali, Davin Reinaldo [1]

Widjaja, Anthony [1]

Ham, Hanry [1]

Affiliations:

[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia

Source:

5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179

Keywords:

Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time;

DOI:

10.1016/j.procs.2021.01.065

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). To implement TasNet and DPRNN, this research conducts experiments with several RNN architectures, batch sizes, and optimizers as hyperparameters. It also analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that is more accurate and has lower latency on real-time, speaker-independent, multi-talker speech separation than the models from previous research. (C) 2021 The Authors. Published by Elsevier B.V.

Citation

Pages: 762-772

Number of pages: 11

