REAL-TIME BINAURAL SPEECH SEPARATION WITH PRESERVED SPATIAL CUES

Cited by: 0
Authors
Han, Cong [1]
Luo, Yi [1]
Mesgarani, Nima [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
US National Science Foundation
Keywords
Binaural speech separation; interaural cues; deep learning; real-time
DOI
10.1109/icassp40776.2020.9053215
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of separated speech from mixed audio. Most previous methods focused on generating a single-channel output for each of the target speakers, hence discarding the spatial cues needed for the localization of sound sources in space. However, preserving the spatial information is important in many applications that aim to accurately render the acoustic scene such as in hearing aids and augmented reality (AR). Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, therefore enabling a real-time modification of the acoustic scene. Based on the time-domain audio separation network (TasNet), a single-channel time-domain speech separation system that can be implemented in real-time, we propose a multi-input-multi-output (MIMO) end-to-end extension of TasNet that takes binaural mixed audio as input and simultaneously separates target speakers in both channels. Experimental results show that the proposed end-to-end MIMO system is able to significantly improve the separation performance and keep the perceived location of the modified sources intact in various acoustic scenes.
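The interaural cues mentioned in the abstract are commonly summarized as the interaural level difference (ILD) and the interaural time difference (ITD). The following is a minimal sketch, not the authors' evaluation code, of how one might check whether a separated binaural signal keeps those cues relative to a reference. The function names `ild_db`, `itd_samples`, and `cue_errors` are hypothetical, and the ITD is estimated with a simple broadband cross-correlation, which is an assumption made here for illustration.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (left energy relative to right)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes the left/right cross-correlation.

    A positive result means the right channel lags the left channel.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def cue_errors(ref_left, ref_right, est_left, est_right, max_lag=16):
    """Absolute ILD error (dB) and ITD error (samples) of an estimate.

    Illustrative helper only; the metric definitions are assumptions.
    """
    d_ild = abs(ild_db(est_left, est_right) - ild_db(ref_left, ref_right))
    d_itd = abs(itd_samples(est_left, est_right, max_lag)
                - itd_samples(ref_left, ref_right, max_lag))
    return d_ild, d_itd
```

Small values from `cue_errors` would suggest that the estimate keeps the level and timing relationship between the two ears. A single-channel output per speaker carries no such relationship, which is the limitation the abstract points out.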
Pages: 6404-6408
Page count: 5