SPATIAL-TEMPORAL GRAPH CONVOLUTION NETWORK FOR MULTICHANNEL SPEECH ENHANCEMENT

Cited by: 4

Authors:

Hao, Minghui [1]

Yu, Jingjing [1]

Zhang, Luyao [1]

Affiliation:

[1] Beijing Jiaotong University, Electronic and Information Engineering, Beijing, People's Republic of China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Graph convolution network; spatial dependency extraction; spatial-temporal convolution module; SII-weighted loss function; speech enhancement;

DOI:

10.1109/ICASSP43922.2022.9746054

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Spatial dependency related to distributed microphone positions is essential for the multichannel speech enhancement task. Exploiting it remains challenging because accurate array positions are often unavailable and the spatial-temporal relations of multichannel noisy signals are complex. This paper proposes a spatial-temporal graph convolutional network composed of cascaded spatial-temporal (ST) modules with channel fusion. Without any prior information about the array or the acoustic scene, a graph convolution block with a learnable adjacency matrix is designed to capture the spatial dependency between pairs of channels. It is then combined with a time-frequency convolution block to form the ST module, which fuses the multi-dimensional correlation features for target speech estimation. Furthermore, a novel weighted loss function based on the speech intelligibility index (SII) is proposed to assign more attention during network training to the frequency bands that matter most for human understanding. In multi-scene speech enhancement experiments, the framework is shown to achieve over 11% performance improvement on PESQ and intelligibility against prior state-of-the-art approaches.
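The two ideas named in the abstract, a graph convolution over microphone channels with a learnable adjacency matrix and a band-weighted loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the row-wise softmax normalization of the adjacency matrix, the squared-error form of the loss, and all names (`graph_conv`, `band_weighted_mse`, `adj_logits`, `band_weights`) are assumptions made for illustration, and the weights used below are placeholders rather than SII band-importance values.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of numbers."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(features, adj_logits, weight):
    """One graph-convolution step: softmax(A) @ X @ W.

    features:   one feature row per microphone channel (the graph nodes)
    adj_logits: channel-by-channel matrix that would be learned in training
    weight:     feature projection matrix, also learned in training
    The softmax normalization is an assumption, not taken from the paper.
    """
    adjacency = [softmax(row) for row in adj_logits]
    return matmul(matmul(adjacency, features), weight)


def band_weighted_mse(estimate, target, band_weights):
    """Squared error per frequency band, averaged with per-band weights."""
    weighted = sum(w * (e - t) ** 2
                   for w, e, t in zip(band_weights, estimate, target))
    return weighted / sum(band_weights)


# Three channels with two features each; zero logits give a uniform adjacency,
# so every channel receives the mean of all channel features.
mixed = graph_conv(
    features=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    adj_logits=[[0.0] * 3 for _ in range(3)],
    weight=[[1.0, 0.0], [0.0, 1.0]],
)

# Two bands, with the first band weighted three times as heavily (placeholder).
loss = band_weighted_mse([1.0, 2.0], [0.0, 0.0], [3.0, 1.0])
```

In a trained network `adj_logits` and `weight` would be updated by gradient descent, so the adjacency would come to reflect learned pairwise channel dependencies instead of the uniform mixing used in this toy call.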


Pages: 6512-6516

Number of pages: 5

