Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3
Authors
Wijayakusuma, Alfian [1]
Gozali, Davin Reinaldo [1]
Widjaja, Anthony [1]
Ham, Hanry [1]
Affiliation
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179
Keywords
Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time;
DOI
10.1016/j.procs.2021.01.065
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of this research is to develop a model that performs real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). The research conducts experiments on several RNN architectures, batch sizes, and optimizers as hyperparameters in order to implement TasNet and DPRNN, and analyzes the impact of these hyperparameter setups on model performance. The expected result is a model that completes speaker-independent multi-talker speech separation in real time with higher accuracy and lower latency than the models of previous research. (C) 2021 The Authors. Published by Elsevier B.V.
Pages: 762 - 772
Page count: 11
Related papers
50 in total
  • [31] SANDGLASSET: A LIGHT MULTI-GRANULARITY SELF-ATTENTIVE NETWORK FOR TIME-DOMAIN SPEECH SEPARATION
    Lam, Max W. Y.
    Wang, Jun
    Su, Dan
    Yu, Dong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5759 - 5763
  • [32] EFFECTIVE LOW-COST TIME-DOMAIN AUDIO SEPARATION USING GLOBALLY ATTENTIVE LOCALLY RECURRENT NETWORKS
    Lam, Max W. Y.
    Wang, Jun
    Su, Dan
    Yu, Dong
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 801 - 808
  • [33] Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement
    Song, Zhendong
    Ma, Yupeng
    Tan, Fang
    Feng, Xiaoyi
    APPLIED SCIENCES-BASEL, 2022, 12 (07):
  • [34] Real-time event detection using recurrent neural network in social sensors
    Van Quan Nguyen
    Tien Nguyen Anh
    Yang, Hyung-Jeong
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2019, 15 (06):
  • [35] REAL-TIME LOW-LATENCY MUSIC SOURCE SEPARATION USING HYBRID SPECTROGRAM-TASNET
    Venkatesh, Satvik
    Benilov, Arthur
    Coleman, Philip
    Roskam, Frederic
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 611 - 615
  • [36] Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
    Pandey, Ashutosh
    Xu, Buye
    Kumar, Anurag
    Donley, Jacob
    Calamia, Paul
    Wang, DeLiang
    INTERSPEECH 2022, 2022, : 729 - 733
  • [37] DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation
    Wang, Zhenyu
    Zhou, Yi
    Gan, Lu
    Chen, Rilin
    Tang, Xinyu
    Liu, Hongqing
    2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2022, : 180 - 184
  • [38] Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation
    Chen, Jingjing
    Mao, Qirong
    Liu, Dong
    INTERSPEECH 2020, 2020, : 2642 - 2646
  • [39] A residual convolutional neural network based approach for real-time path planning
    Liu, Yang
    Zheng, Zheng
    Qin, Fangyun
    Zhang, Xiaoyi
    Yao, Haonan
    KNOWLEDGE-BASED SYSTEMS, 2022, 242
  • [40] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
    Xue, Cheng
    Huang, Weilong
    Chen, Weiguang
    Feng, Jinwei
    INTERSPEECH 2021, 2021, : 1862 - 1866