AN INTRA- AND INTER-FRAME SEQUENCE MODEL WITH DISCRETE COSINE TRANSFORM FOR STREAMING SPEECH ENHANCEMENT

Times cited: 0

Authors:

Zhang, Yuewei [1]

Zhuo, Huanbin [2]

Zhu, Jie [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai, Peoples R China

[2] Tencent Video Cloud, Shanghai, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024 | 2024

Keywords:

Speech enhancement; dual sequence modeling; discrete cosine transform; causal convolution

DOI:

10.1109/ICMEW63481.2024.10645392

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Many recent speech enhancement methods try to improve performance by reconstructing the target magnitude and phase spectra simultaneously. They usually operate on the complex short-time Fourier transform (STFT) spectrum, which leads to high model complexity. In this paper, we use the short-time discrete cosine transform (STDCT) instead of the STFT. Because the STDCT is a lossless, real-valued transformation in which phase is represented implicitly, our method achieves excellent performance at lower complexity. In addition, we adopt a convolutional recurrent network (CRN) as the backbone and design a dual sequence modeling block that simultaneously captures the intra-frame correlation among frequency bins and the inter-frame context along the time dimension; we therefore name our model IICRN. Experimental results show that IICRN outperforms previous advanced methods.
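The abstract's central claim, that the STDCT is a lossless, real-valued transform, can be illustrated with a short NumPy sketch. This is not the authors' implementation. The frame length, the 50% overlap, the square-root periodic Hann window, the orthonormal DCT-II, and the helper names `dct_matrix`, `sqrt_hann`, `stdct` and `istdct` are all illustrative assumptions. Each windowed frame maps to exactly as many real coefficients as it has samples. Applying the inverse DCT and then overlap-add recovers the input exactly, because the squared window sums to one at this hop size. The last lines show the two axes the dual sequence modeling block is described as working along: a row is the intra-frame sequence over frequency bins, and a column is the inter-frame sequence over time.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C (n x n); its inverse is C.T (the DCT-III)."""
    k = np.arange(n)[:, None]          # coefficient (frequency bin) index
    m = np.arange(n)[None, :]          # sample index within the frame
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)            # DC row scaled to sqrt(1/n)
    return c


def sqrt_hann(n: int) -> np.ndarray:
    """Square root of a periodic Hann window (assumed choice); its square sums to 1 at hop n/2."""
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))


def stdct(x: np.ndarray, frame_len: int) -> np.ndarray:
    """Hypothetical STDCT: frame, window, DCT-II. frame_len is assumed even.

    Returns a real array of shape (num_frames, frame_len).
    """
    hop = frame_len // 2
    num_frames = 1 + -(-len(x) // hop)         # 1 + ceil(len(x) / hop)
    padded_len = (num_frames - 1) * hop + frame_len
    xp = np.zeros(padded_len)
    xp[hop:hop + len(x)] = x                   # front padding so every sample lies in two frames
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = xp[idx] * sqrt_hann(frame_len)    # analysis window
    return frames @ dct_matrix(frame_len).T    # DCT-II of each row (frame)


def istdct(coeffs: np.ndarray, length: int) -> np.ndarray:
    """Inverse DCT per frame, synthesis window, overlap-add, then trim the padding."""
    num_frames, frame_len = coeffs.shape
    hop = frame_len // 2
    frames = (coeffs @ dct_matrix(frame_len)) * sqrt_hann(frame_len)
    out = np.zeros((num_frames - 1) * hop + frame_len)
    for t in range(num_frames):
        out[t * hop:t * hop + frame_len] += frames[t]
    return out[hop:hop + length]


# Demo: 1000 random samples, 32-sample frames (hop 16).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
X = stdct(x, frame_len=32)
print(X.shape, X.dtype)                     # (64, 32) float64
print(np.allclose(istdct(X, len(x)), x))    # True
intra = X[10, :]   # frame 10 as a sequence over 32 DCT bins (intra-frame axis)
inter = X[:, 5]    # bin 5 as a sequence over 64 frames (inter-frame axis)
```

For comparison, a real signal frame of N samples gives N/2 + 1 complex STFT bins (N + 2 real numbers), which must be modeled as magnitude and phase or as real and imaginary parts. In this sketch, each STDCT frame is just N real numbers. That illustrates the kind of simplification the abstract refers to; the actual model sizes and complexity figures are reported in the paper.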

Pages: 4

Cited references: 21 in total
