REAL-TIME BINAURAL SPEECH SEPARATION WITH PRESERVED SPATIAL CUES

Cited by: 0
Authors
Han, Cong [1]
Luo, Yi [1]
Mesgarani, Nima [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Science Foundation (USA)
Keywords
Binaural speech separation; interaural cues; deep learning; real-time
DOI
10.1109/icassp40776.2020.9053215
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of separated speech from mixed audio. Most previous methods focused on generating a single-channel output for each of the target speakers, hence discarding the spatial cues needed for the localization of sound sources in space. However, preserving the spatial information is important in many applications that aim to accurately render the acoustic scene, such as hearing aids and augmented reality (AR). Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, therefore enabling real-time modification of the acoustic scene. Based on the time-domain audio separation network (TasNet), a single-channel time-domain speech separation system that can be implemented in real time, we propose a multi-input-multi-output (MIMO) end-to-end extension of TasNet that takes binaural mixed audio as input and simultaneously separates target speakers in both channels. Experimental results show that the proposed end-to-end MIMO system is able to significantly improve the separation performance and keep the perceived location of the modified sources intact in various acoustic scenes.
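The interaural cues mentioned in the abstract are commonly summarized as the interaural time difference (ITD) and the interaural level difference (ILD) between the left and right ear signals. The sketch below is a generic illustration of how these two quantities can be estimated from a binaural pair; it is not the authors' evaluation code, and the function names, the `max_lag` search window, and the 16 kHz sample rate in the demo are assumptions made for this example.

```python
import numpy as np


def interaural_level_difference(left, right, eps=1e-12):
    """ILD in dB: energy of the left channel relative to the right channel."""
    return 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))


def interaural_time_difference(left, right, sample_rate, max_lag):
    """ITD in seconds, taken from the cross-correlation peak within +/- max_lag samples.

    A positive value means the right channel lags the left channel.
    """
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    # For each lag k, correlate left[i] with right[i + k] over the overlapping region.
    xcorr = np.array([
        np.sum(left[max(0, -k): n - max(0, k)] * right[max(0, k): n - max(0, -k)])
        for k in lags
    ])
    return lags[np.argmax(xcorr)] / sample_rate


# Demo with synthetic data (assumed values): the right channel is the left
# channel delayed by 8 samples and attenuated by a factor of 0.5.
rng = np.random.default_rng(0)
source = rng.standard_normal(1608)
demo_left = source[8:]
demo_right = 0.5 * source[:-8]

demo_itd = interaural_time_difference(demo_left, demo_right, sample_rate=16000, max_lag=32)
demo_ild = interaural_level_difference(demo_left, demo_right)
print(f"ITD: {demo_itd * 1e3:.3f} ms, ILD: {demo_ild:.2f} dB")
```

Under these assumptions, one way to check whether a separation system keeps a source at its perceived location is to compute the ITD and ILD of the clean binaural source and of the separated output, then compare the two pairs of values; small differences suggest the spatial cues were preserved.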
Pages: 6404-6408
Page count: 5