SPATIAL-TEMPORAL GRAPH CONVOLUTION NETWORK FOR MULTICHANNEL SPEECH ENHANCEMENT

Cited by: 4
Authors
Hao, Minghui [1]
Yu, Jingjing [1]
Zhang, Luyao [1]
Affiliations
[1] Beijing Jiaotong Univ, Elect & Informat Engn, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Graph convolution network; spatial dependency extraction; spatial-temporal convolution module; SII-weighted loss function; speech enhancement;
DOI
10.1109/ICASSP43922.2022.9746054
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Spatial dependency related to distributed microphone positions is essential for the multichannel speech enhancement task. Exploiting it remains challenging owing to the lack of accurate array positions and the complex spatial-temporal relations of multichannel noisy signals. This paper proposes a spatial-temporal graph convolutional network composed of cascaded spatial-temporal (ST) modules with channel fusion. Without any prior information about the array or the acoustic scene, a graph convolution block with a learnable adjacency matrix is designed to capture the spatial dependency between pairs of channels. It is then combined with a time-frequency convolution block to form the ST module, which fuses the multi-dimensional correlation features for target speech estimation. Furthermore, a novel weighted loss function based on the speech intelligibility index (SII) is proposed to place more weight, during network training, on the frequency bands that matter most for human speech understanding. The framework is shown to achieve over 11% improvement in PESQ and intelligibility over prior state-of-the-art approaches in multi-scene speech enhancement experiments.
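The two ideas named in the abstract, a graph convolution over microphone channels with a learnable adjacency matrix and a loss weighted by band importance, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the layer equations, so the row-softmax normalization, the form softmax(A) · X · W, the function names, and the band weights below are all assumptions chosen for illustration. In a real model the adjacency logits and the weight matrix would be trained parameters, and the band weights would come from SII band-importance values rather than the placeholders used here.

```python
import math


def row_softmax(matrix):
    """Normalize each row so its entries are positive and sum to 1."""
    out = []
    for row in matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def graph_conv(features, adjacency_logits, weight):
    """One graph convolution step over channels: softmax(A) @ X @ W.

    features:         C x F, one feature row per microphone channel
    adjacency_logits: C x C, unnormalized pairwise channel affinities
                      (hypothetical stand-in for a learned adjacency matrix)
    weight:           F x F_out feature transform
    """
    adjacency = row_softmax(adjacency_logits)
    mixed = matmul(adjacency, features)  # each channel aggregates the others
    return matmul(mixed, weight)


def band_weighted_mse(estimate, target, band_weights):
    """Squared error per frequency band, scaled by an importance weight
    and normalized by the sum of the weights."""
    total = sum(w * (e - t) ** 2 for e, t, w in zip(estimate, target, band_weights))
    return total / sum(band_weights)


# Three channels, two features each. All-zero logits give a uniform
# adjacency, so every output row is the average of the channel features.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
zero_logits = [[0.0] * 3 for _ in range(3)]
identity = [[1.0, 0.0], [0.0, 1.0]]
mixed_features = graph_conv(features, zero_logits, identity)

# Placeholder band weights: the second band counts three times as much.
loss = band_weighted_mse([1.0, 2.0], [0.0, 0.0], [1.0, 3.0])
```

With uniform adjacency the layer reduces to simple channel averaging; training the logits lets the network decide which channel pairs should exchange more information, which is how a learnable adjacency can stand in for unknown array geometry.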
Pages: 6512-6516
Number of pages: 5
Related papers
50 in total
  • [41] Multiphase Flow Modeling Using Process Knowledge Integrating Temporal Graph Convolution Network
    Deng, Hongying
    Zhu, Jialiang
    Yang, Qinmin
    Liu, Yi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
  • [42] Temporal-enhanced graph convolution network for skeleton-based action recognition
    Xie, Yulai
    Zhang, Yang
    Ren, Fang
    IET COMPUTER VISION, 2022, 16 (03) : 266 - 279
  • [43] DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement
    Lee D.
    Choi J.-W.
    IEEE Signal Processing Letters, 2023, 30 : 155 - 159
  • [44] DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
    Le, Xiaohuai
    Chen, Hongsheng
    Chen, Kai
    Lu, Jing
    INTERSPEECH 2021, 2021, : 2811 - 2815
  • [45] Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement
    Song, Zhendong
    Ma, Yupeng
    Tan, Fang
    Feng, Xiaoyi
    APPLIED SCIENCES-BASEL, 2022, 12 (07):
  • [46] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Hu, Yanxin
    Liu, Yun
    Lv, Shubo
    Xing, Mengtao
    Zhang, Shimin
    Fu, Yihui
    Wu, Jian
    Zhang, Bihong
    Xie, Lei
    INTERSPEECH 2020, 2020, : 2472 - 2476
  • [47] Three-Dimensional Point Cloud Semantic Segmentation Network Based on Spatial Graph Convolution Network
    Zhang Kun
    Zhu Yawei
    Wang Xiaohong
    Zhang Liting
    Zhong Ruofei
    LASER & OPTOELECTRONICS PROGRESS, 2023, 60 (02)
  • [48] A graph convolution network based latency prediction algorithm for convolution neural network
    Li Z.
    Zhang R.
    Tan W.
    Ren Y.
    Lei M.
    Wu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2022, 48 (12): : 2450 - 2459
  • [49] STGHTN: Spatial-temporal gated hybrid transformer network for traffic flow forecasting
    Liu, Jiansong
    Kang, Yan
    Li, Hao
    Wang, Haining
    Yang, Xuekun
    APPLIED INTELLIGENCE, 2023, 53 (10) : 12472 - 12488
  • [50] STGHTN: Spatial-temporal gated hybrid transformer network for traffic flow forecasting
    Jiansong Liu
    Yan Kang
    Hao Li
    Haining Wang
    Xuekun Yang
    Applied Intelligence, 2023, 53 : 12472 - 12488