Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Times cited: 3

Authors:

Wijayakusuma, Alfian [1]

Gozali, Davin Reinaldo [1]

Widjaja, Anthony [1]

Ham, Hanry [1]

Affiliation:

[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia

Source:

5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179

Keywords:

Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time

DOI:

10.1016/j.procs.2021.01.065

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). To implement TasNet and DPRNN, this research experiments with three hyperparameters: the RNN architecture, the batch size, and the optimizer. It also analyzes how each of these hyperparameter settings affects model performance. The expected result is a model that performs real-time speaker-independent multi-talker speech separation with higher accuracy and lower latency than the models of previous research. (C) 2021 The Authors. Published by Elsevier B.V.
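The abstract names DPRNN but does not describe it. As a rough illustration only (a NumPy sketch of the segmentation step in the original DPRNN design, not code from this paper), the "dual-path" idea rests on cutting the encoder's (N features x L frames) output into overlapping chunks, so that one RNN can scan the frames inside each chunk and a second RNN can scan the same position across chunks. The function names `segment` and `overlap_add`, the 50% overlap, zero padding on the right end only, and averaging of overlapping frames on reassembly are assumptions of this sketch.

```python
import numpy as np


def segment(x: np.ndarray, chunk_len: int) -> tuple[np.ndarray, int]:
    """Split an (N, L) feature sequence into 50%-overlapping chunks.

    Returns an (N, chunk_len, S) array of S chunks and the number of zero
    frames appended on the right so that the chunks tile the sequence.
    (Hypothetical helper; right-only padding is an assumption.)
    """
    assert chunk_len >= 2, "chunk_len must allow a hop of at least 1 frame"
    length = x.shape[1]
    hop = chunk_len // 2
    # Smallest number of chunks (stride = hop) that covers all L frames.
    n_chunks = max(1, int(np.ceil((length - chunk_len) / hop)) + 1)
    padded_len = (n_chunks - 1) * hop + chunk_len
    pad = padded_len - length
    x_pad = np.pad(x, ((0, 0), (0, pad)))  # zero-pad the time axis only
    chunks = np.stack(
        [x_pad[:, i * hop : i * hop + chunk_len] for i in range(n_chunks)],
        axis=-1,
    )
    return chunks, pad


def overlap_add(chunks: np.ndarray, pad: int) -> np.ndarray:
    """Reassemble (N, K, S) chunks into an (N, L) sequence.

    Overlapping frames are averaged (a choice of this sketch) so that
    overlap_add(*segment(x, K)) reproduces x exactly.
    """
    n_feat, chunk_len, n_chunks = chunks.shape
    hop = chunk_len // 2
    padded_len = (n_chunks - 1) * hop + chunk_len
    out = np.zeros((n_feat, padded_len))
    count = np.zeros(padded_len)
    for i in range(n_chunks):
        out[:, i * hop : i * hop + chunk_len] += chunks[:, :, i]
        count[i * hop : i * hop + chunk_len] += 1
    # Divide each frame by how many chunks covered it, then drop the padding.
    return (out / count)[:, : padded_len - pad]


# Example: 64 features x 1000 frames, chunks of 100 frames with hop 50.
x = np.random.default_rng(0).standard_normal((64, 1000))
chunks, pad = segment(x, chunk_len=100)
# Intra-chunk path: an RNN would scan axis 1 (the K frames inside each chunk).
# Inter-chunk path: an RNN would scan axis 2 (the same position across S chunks).
print(chunks.shape)  # (64, 100, 19)
assert np.allclose(overlap_add(chunks, pad), x)
```

Splitting a long sequence of length L into chunks of length K means each RNN only ever scans roughly K or L/K steps instead of L, which is the property DPRNN relies on for modeling long sequences; how this paper chose K for its real-time setting is not stated in the abstract.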

Pages: 762-772

Number of pages: 11