SPATIAL-TEMPORAL GRAPH CONVOLUTION NETWORK FOR MULTICHANNEL SPEECH ENHANCEMENT

Cited by: 4

Authors:

Hao, Minghui [1]

Yu, Jingjing [1]

Zhang, Luyao [1]

Affiliations:

[1] Beijing Jiaotong Univ, Elect & Informat Engn, Beijing, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Graph convolution network; spatial dependency extraction; spatial-temporal convolution module; SII-weighted loss function; speech enhancement;

DOI:

10.1109/ICASSP43922.2022.9746054

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Spatial dependency related to distributed microphone positions is essential to the multichannel speech enhancement task. Exploiting it remains challenging because accurate array positions are often unavailable and the spatial-temporal relations among multichannel noisy signals are complex. This paper proposes a spatial-temporal graph convolutional network composed of cascaded spatial-temporal (ST) modules with channel fusion. Without any prior information about the array or the acoustic scene, a graph convolution block with a learnable adjacency matrix is designed to capture the spatial dependency between pairs of channels. This block is then combined with a time-frequency convolution block to form the ST module, which fuses the multi-dimensional correlation features for target speech estimation. Furthermore, a novel weighted loss function based on the speech intelligibility index (SII) is proposed so that network training pays more attention to the frequency bands that matter most for human speech understanding. In multi-scene speech enhancement experiments, the framework is reported to achieve over 11% improvement in PESQ and intelligibility compared with prior state-of-the-art approaches.
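The abstract does not give the network's equations, so the following is only a minimal sketch of the two ideas it names: mixing channel features through a learnable adjacency matrix, and weighting a loss by frequency band. The function names, the row-wise softmax normalization, and the example band weights are all illustrative assumptions, not the authors' formulation, and the weights are not actual SII band-importance values.

```python
import math


def softmax_rows(matrix):
    """Row-wise softmax so each channel's mixing weights sum to 1 (assumed normalization)."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def graph_mix(adj_logits, features):
    """Mix per-channel feature vectors with a normalized adjacency matrix.

    adj_logits: C x C unconstrained values (these would be trainable parameters).
    features:   C x F list of per-channel feature vectors.
    """
    adj = softmax_rows(adj_logits)
    n_channels = len(features)
    n_feat = len(features[0])
    return [
        [sum(adj[i][j] * features[j][f] for j in range(n_channels)) for f in range(n_feat)]
        for i in range(n_channels)
    ]


def band_weighted_mse(estimate, target, band_weights):
    """Squared error per frequency band, scaled by a per-band weight and normalized."""
    total_weight = sum(band_weights)
    weighted = sum(w * (e - t) ** 2 for w, e, t in zip(band_weights, estimate, target))
    return weighted / total_weight


# Hypothetical example: two microphones, two features each, equal adjacency logits.
mixed = graph_mix([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
loss = band_weighted_mse([1.0, 2.0], [0.0, 2.0], [1.0, 3.0])
```

With equal logits every channel simply receives the average of all channels; in training, the logits would move away from uniform so that strongly related channel pairs contribute more. In the loss, a larger weight makes an error in that band cost more, which is the general effect the abstract attributes to the SII-based weighting.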


Pages: 6512-6516

Page count: 5
