Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3
Authors
Wijayakusuma, Alfian [1]
Gozali, Davin Reinaldo [1]
Widjaja, Anthony [1]
Ham, Hanry [1]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179
Keywords
Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time
DOI
10.1016/j.procs.2021.01.065
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). The research experiments with several RNN architectures, batch sizes, and optimizers as hyperparameters for implementing TasNet and DPRNN, and analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that completes speaker-independent multi-talker speech separation in real time with higher accuracy and lower latency than the models of previous research. (C) 2021 The Authors. Published by Elsevier B.V.
Pages: 762 - 772
Page count: 11
Related papers
50 in total
  • [21] Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation
    Liu, Debang
    Zhang, Tianqi
    Christensen, Mads Graesboll
    Wei, Ying
    An, Zeliang
    INTERSPEECH 2023, 2023, : 3694 - 3698
  • [22] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6875 - 6879
  • [23] TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS
    Wu, Yifei
    Li, Chenda
    Bai, Jinfeng
    Wu, Zhongqin
    Qian, Yanmin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 256 - 260
  • [24] Real-time implementation of the cerebellum neural network
    Hao, Xinyu
    Wang, Jiang
    Yang, Shuangming
    Deng, Bin
    Wei, Xile
    Yi, Guosheng
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 3595 - 3599
  • [25] TPARN: TRIPLE-PATH ATTENTIVE RECURRENT NETWORK FOR TIME-DOMAIN MULTICHANNEL SPEECH ENHANCEMENT
    Pandey, Ashutosh
    Xu, Buye
    Kumar, Anurag
    Donley, Jacob
    Calamia, Paul
    Wang, DeLiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6497 - 6501
  • [26] DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6629 - 6633
  • [27] Vocal Harmony Separation using Time-domain Neural Networks
    Sarkar, Saurjya
    Benetos, Emmanouil
    Sandler, Mark
    INTERSPEECH 2021, 2021, : 3515 - 3519
  • [28] ONLINE DEEP ATTRACTOR NETWORK FOR REAL-TIME SINGLE-CHANNEL SPEECH SEPARATION
    Han, Cong
    Luo, Yi
    Mesgarani, Nima
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 361 - 365
  • [29] YOLOv3-DPFIN: A Dual-Path Feature Fusion Neural Network for Robust Real-Time Sonar Target Detection
    Kong, Wanzeng
    Hong, Jichen
    Jia, Mingyang
    Yao, Jinliang
    Gong, Weihua
    Hu, Hua
    Zhang, Haigang
    IEEE SENSORS JOURNAL, 2020, 20 (07) : 3745 - 3756
  • [30] Improved Speech Separation via Dual-Domain Joint Encoder in Time-Domain Networks
    Wang, Lan
    Zhang, Haitao
    Qiu, Youli
    Jiang, Yanji
    Dong, Hao
    Guo, Pengfei
    2024 INTERNATIONAL CONFERENCE ON ELECTRONIC ENGINEERING AND INFORMATION SYSTEMS, EEISS 2024, 2024, : 233 - 239