SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Cited by: 1
Authors
Lv R. [1]
Chen N. [1]
Cheng S. [1]
Fan G. [1]
Rao L. [1]
Song X. [1]
Lv W. [2]
Yang D. [3]
Affiliations
[1] School of Electronic Information Engineering, Shanghai Dianji University, Shanghai
[2] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai
[3] Alibaba Group, Shanghai
Funding
National Natural Science Foundation of China
Keywords
autoencoder; deep learning; generative adversarial network; speech enhancement
DOI
10.3934/mbe.2024172
Abstract
Traditional unsupervised speech enhancement models often suffer from problems such as non-aggregation of input feature information, which introduces additional noise during training and thereby reduces the quality of the speech signal. To address these problems, this paper analyzes how non-aggregation of input speech feature information affects model performance. It then introduces a temporal convolutional network (TCN) and proposes the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieves a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. In addition, the enhanced speech data were fed to an acoustic model to verify recognition accuracy. The speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model. © 2024 the Author(s).
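The abstract does not give the TCN configuration, so the following is only a minimal sketch of the general idea behind a temporal convolutional network: stacked causal convolutions with growing dilation see a short local window in the first layer and a much longer context in later layers. The function names (`causal_dilated_conv1d`, `receptive_field`, `tcn_stack`), the kernel values, and the dilation schedule are illustrative assumptions, not the authors' architecture.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1-D convolution with dilation.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ...
    Samples before the start of the signal are treated as zeros.
    kernel[0] weights the current sample; kernel[j] weights the
    sample j * dilation steps in the past.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_stack(
    x: Sequence[float], kernel: Sequence[float], dilations: Sequence[int]
) -> List[float]:
    """Apply residual dilated layers in sequence (ReLU, no normalization).

    Assumption: every layer shares one kernel, purely to keep the
    sketch short; a trained TCN learns separate weights per layer.
    """
    h = list(x)
    for d in dilations:
        conv = causal_dilated_conv1d(h, kernel, d)
        # Residual connection: keep the input and add the activated features.
        h = [hi + max(0.0, ci) for hi, ci in zip(h, conv)]
    return h


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 15
    response = tcn_stack(impulse, kernel=[0.5, 0.5], dilations=[1, 2, 4])
    print(receptive_field(2, [1, 2, 4]), response)
```

With kernel size 2 and dilations 1, 2, 4, the receptive field is 1 + (1 + 2 + 4) = 8 samples, so a single input sample influences exactly the next eight output positions. Doubling the dilation per layer is a common way to widen the context without adding many parameters.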
Pages: 3860-3875
Number of pages: 15