AN INTRA- AND INTER-FRAME SEQUENCE MODEL WITH DISCRETE COSINE TRANSFORM FOR STREAMING SPEECH ENHANCEMENT

Cited by: 0
Authors
Zhang, Yuewei [1]
Zhuo, Huanbin [2]
Zhu, Jie [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai, Peoples R China
[2] Tencent Video Cloud, Shanghai, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024 | 2024
Keywords
Speech enhancement; dual sequence modeling; discrete cosine transform; causal convolution
DOI
10.1109/ICMEW63481.2024.10645392
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To improve speech enhancement performance, many recent methods attempt to reconstruct the target magnitude and phase spectra simultaneously. They usually process the complex short-time Fourier transform (STFT) spectrum, which leads to high model complexity. In this paper, we use the short-time discrete cosine transform (STDCT) instead of the STFT. Because the STDCT is a lossless, real-valued transform that carries phase implicitly, our method achieves strong performance with lower complexity. In addition, we take a convolutional recurrent network (CRN) as the backbone and design a dual sequence modeling block that simultaneously captures the intra-frame correlation among frequency bins and the inter-frame context along the time dimension; we therefore name our model IICRN. Experimental results indicate that IICRN outperforms previous advanced methods.
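The abstract's claim that the STDCT is lossless and real-valued can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length (512), hop size (256), square-root Hann window, and the function names `dct_matrix`, `stdct`, and `istdct` are all assumptions chosen for illustration, and the record does not state the paper's actual settings.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n); its inverse is its transpose."""
    k = np.arange(n)[:, None]   # frequency index
    m = np.arange(n)[None, :]   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # scale the DC row so that C @ C.T == I
    return c

def sqrt_hann(n):
    """Square root of a periodic Hann window (assumed analysis/synthesis window)."""
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))

def stdct(x, frame_len=512, hop=256):
    """Frame, window, and DCT a real signal; returns a real (frames, bins) array."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return (frames * sqrt_hann(frame_len)) @ dct_matrix(frame_len).T

def istdct(spec, frame_len=512, hop=256):
    """Inverse DCT each frame, window again, and overlap-add."""
    frames = (spec @ dct_matrix(frame_len)) * sqrt_hann(frame_len)
    out = np.zeros(hop * (len(spec) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

# Hypothetical usage: a random signal stands in for a speech waveform.
signal = np.random.default_rng(0).standard_normal(4096)
spectrum = stdct(signal)        # real-valued, no separate phase channel
restored = istdct(spectrum)     # matches the input away from the two edges
```

With 50% overlap the squared window sums to one, so every sample covered by two frames is reconstructed up to floating-point error; only the first and last half-frames, which are covered once, are attenuated. A network operating on `spectrum` therefore handles a single real-valued tensor rather than separate real and imaginary STFT channels.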
Pages: 4