REAL-TIME BINAURAL SPEECH SEPARATION WITH PRESERVED SPATIAL CUES

Authors
Han, Cong [1]
Luo, Yi [1]
Mesgarani, Nima [1]
Affiliation
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2020
Funding
National Science Foundation (USA)
Keywords
Binaural speech separation; interaural cues; deep learning; real-time
DOI
10.1109/icassp40776.2020.9053215
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of separated speech from mixed audio. Most previous methods focused on generating a single-channel output for each of the target speakers, hence discarding the spatial cues needed for the localization of sound sources in space. However, preserving the spatial information is important in many applications that aim to accurately render the acoustic scene such as in hearing aids and augmented reality (AR). Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, therefore enabling a real-time modification of the acoustic scene. Based on the time-domain audio separation network (TasNet), a single-channel time-domain speech separation system that can be implemented in real-time, we propose a multi-input-multi-output (MIMO) end-to-end extension of TasNet that takes binaural mixed audio as input and simultaneously separates target speakers in both channels. Experimental results show that the proposed end-to-end MIMO system is able to significantly improve the separation performance and keep the perceived location of the modified sources intact in various acoustic scenes.
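The abstract's claim that the perceived location of each source stays intact can be checked by comparing interaural cues of the separated binaural output against the clean binaural reference. The sketch below is illustrative only: it assumes broadband interaural level difference (ILD, energy ratio in dB) and integer-sample interaural time difference (ITD, cross-correlation peak) as the cues, and the helper names `ild_db`, `itd_samples` and `cue_errors` are hypothetical. It is not the authors' code and not necessarily their evaluation protocol.

```python
import math
import random

# Hypothetical helpers for comparing interaural cues; not the paper's code.


def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (left energy over right energy)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def itd_samples(left, right, max_lag):
    """Integer lag in [-max_lag, max_lag] maximizing sum_n left[n] * right[n - lag].

    A positive result means the left channel is a delayed copy of the right,
    i.e. the source reached the right ear first.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for n, x in enumerate(left):
            m = n - lag
            if 0 <= m < len(right):
                val += x * right[m]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def cue_errors(est_l, est_r, ref_l, ref_r, fs, max_lag):
    """Absolute ITD error (microseconds) and ILD error (dB) of an estimate vs. a reference."""
    d_itd = abs(itd_samples(est_l, est_r, max_lag) - itd_samples(ref_l, ref_r, max_lag))
    d_ild = abs(ild_db(est_l, est_r) - ild_db(ref_l, ref_r))
    return d_itd / fs * 1e6, d_ild


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000  # assumed sample rate
    ref_r = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    # Reference: left ear hears the source 3 samples later and about 6 dB quieter.
    ref_l = [0.0] * 3 + [0.5 * x for x in ref_r[:-3]]
    # An estimate that keeps the level difference but loses the interaural delay.
    est_l = [0.5 * x for x in ref_r]
    print(cue_errors(ref_l, ref_r, ref_l, ref_r, fs, max_lag=16))
    print(cue_errors(est_l, ref_r, ref_l, ref_r, fs, max_lag=16))
```

A perfect estimate gives zero error on both cues, while the second estimate shows an ITD error of 3 samples (187.5 microseconds at the assumed 16 kHz) even though its ILD is nearly correct. The integer-lag, broadband estimates keep the sketch short; finer evaluations often use sub-sample interpolation or frequency-dependent cues.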
Pages: 6404-6408
Number of pages: 5