REAL-TIME BINAURAL SPEECH SEPARATION WITH PRESERVED SPATIAL CUES

Cited by: 0

Authors:

Han, Cong [1]

Luo, Yi [1]

Mesgarani, Nima [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

U.S. National Science Foundation;

Keywords:

Binaural speech separation; interaural cues; deep learning; real-time;

DOI:

10.1109/icassp40776.2020.9053215

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of separated speech from mixed audio. Most previous methods focused on generating a single-channel output for each of the target speakers, hence discarding the spatial cues needed for the localization of sound sources in space. However, preserving the spatial information is important in many applications that aim to accurately render the acoustic scene, such as hearing aids and augmented reality (AR). Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, therefore enabling real-time modification of the acoustic scene. Based on the time-domain audio separation network (TasNet), a single-channel time-domain speech separation system that can be implemented in real time, we propose a multi-input-multi-output (MIMO) end-to-end extension of TasNet that takes binaural mixed audio as input and simultaneously separates target speakers in both channels. Experimental results show that the proposed end-to-end MIMO system is able to significantly improve the separation performance and keep the perceived location of the modified sources intact in various acoustic scenes.
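As an illustration of the "interaural cues" the abstract refers to, the two most commonly used ones, the interaural level difference (ILD) and the interaural time difference (ITD), can be estimated from a left/right signal pair. The sketch below is not the authors' method or evaluation code; the function name `interaural_cues`, its parameters, and the sign conventions are assumptions made for this example, and it uses a simple broadband energy ratio and cross-correlation peak.

```python
import numpy as np


def interaural_cues(left, right, sample_rate):
    """Estimate (ILD in dB, ITD in seconds) for a binaural signal pair.

    Hypothetical helper for illustration only.
    Conventions assumed here:
      ILD > 0 means the left channel carries more energy.
      ITD > 0 means the left channel is delayed relative to the right.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    eps = 1e-12  # guards against log(0) and division by zero on silence

    # Broadband ILD: ratio of channel energies, expressed in decibels.
    ild_db = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))

    # ITD: lag (in samples) at which the cross-correlation peaks.
    # In "full" mode, output index i corresponds to lag i - (len(right) - 1).
    xcorr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(xcorr)) - (len(right) - 1)

    return ild_db, lag / sample_rate
```

A separation system that preserves spatial cues in the sense described above would yield separated left/right outputs whose ILD and ITD stay close to those of the corresponding clean binaural source.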

Pages: 6404-6408

Page count: 5
