LEARNING-BASED PERSONAL SPEECH ENHANCEMENT FOR TELECONFERENCING BY EXPLOITING SPATIAL-SPECTRAL FEATURES

Cited by: 6
Authors
Hsu, Yicheng [1]
Lee, Yonghan [1]
Bai, Mingsian R. [1, 2]
Affiliations
[1] National Tsing Hua University, Department of Power Mechanical Engineering, Hsinchu, Taiwan
[2] National Tsing Hua University, Department of Electrical Engineering, Hsinchu, Taiwan
Keywords
spatial coherence analysis; target speech enhancement; speaker embedding; convolutional recurrent neural network; speaker extraction; separation
DOI
10.1109/ICASSP43922.2022.9746859
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Teleconferencing has become essential during the COVID-19 pandemic. In real-world applications, however, speech quality can deteriorate because of background interference, noise, or reverberation. To address this problem, the target speech can be extracted from the mixture signals with the aid of the user's vocal features. The proposed system incorporates several features, including speaker embeddings derived from user enrollment and a novel long-short-term spatial coherence (LSTSC) feature that reflects target-speaker activity. In this learning-based approach, a target speech sifting network is employed to extract the target signal. Trained with LSTSC, the network is robust to the microphone array geometry and the number of microphones. The proposed enhancement system is further compared with a baseline system that uses speaker embeddings and interchannel phase difference (IPD). The results show that the proposed system outperforms the baseline in both enhancement performance and robustness.
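The abstract contrasts two kinds of spatial features: the interchannel phase difference (IPD) used by the baseline and a spatial coherence feature computed over long and short time spans. The abstract does not give the exact LSTSC definition, so the sketch below is only a generic illustration under stated assumptions: IPD is encoded as (cos, sin) pairs per STFT bin, and coherence is estimated from recursively smoothed auto- and cross-power spectral densities at two smoothing constants. The function names and the values `alpha_long=0.95` and `alpha_short=0.5` are hypothetical choices, not taken from the paper.

```python
import cmath
import math


def ipd_features(x_ref, x_m):
    """Interchannel phase difference (IPD) encoded as (cos, sin) pairs.

    x_ref, x_m: complex STFT coefficients of ONE frequency bin over frames,
    for the reference microphone and microphone m.
    """
    feats = []
    for a, b in zip(x_ref, x_m):
        # phase(b * conj(a)) is the wrapped difference angle(b) - angle(a)
        dphi = cmath.phase(b * a.conjugate())
        feats.append((math.cos(dphi), math.sin(dphi)))
    return feats


def recursive_coherence(x1, x2, alpha, eps=1e-12):
    """Frame-wise complex coherence from recursively smoothed (cross-)PSDs.

    alpha close to 1 gives a long memory; a smaller alpha gives a short memory.
    eps guards against division by zero on silent frames.
    """
    phi11 = 0.0
    phi22 = 0.0
    phi12 = 0j
    out = []
    for a, b in zip(x1, x2):
        # First-order recursive averaging of auto- and cross-spectra
        phi11 = alpha * phi11 + (1 - alpha) * abs(a) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * abs(b) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * a * b.conjugate()
        out.append(phi12 / (math.sqrt(phi11 * phi22) + eps))
    return out


def long_short_coherence(x1, x2, alpha_long=0.95, alpha_short=0.5):
    """HYPOTHETICAL pairing of long- and short-term coherence magnitudes.

    This is NOT the paper's LSTSC definition; both smoothing constants
    are illustrative assumptions.
    """
    long_c = recursive_coherence(x1, x2, alpha_long)
    short_c = recursive_coherence(x1, x2, alpha_short)
    return [(abs(lc), abs(sc)) for lc, sc in zip(long_c, short_c)]
```

For a fully coherent pair (one channel is a phase-shifted copy of the other), both coherence magnitudes stay near 1 and the IPD recovers the phase shift. For uncorrelated frames, the long-term estimate decays toward 0, while the short-term estimate reacts faster but is noisier; having both time scales available is one plausible way to expose speaker activity to a network.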
Pages: 8787–8791
Page count: 5