SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Cited by: 1
Authors
Lv R. [1]
Chen N. [1]
Cheng S. [1]
Fan G. [1]
Rao L. [1]
Song X. [1]
Lv W. [2]
Yang D. [3]
Affiliations
[1] School of Electronic Information Engineering, Shanghai Dianji University, Shanghai
[2] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai
[3] Alibaba Group, Shanghai
Funding
National Natural Science Foundation of China
Keywords
autoencoder; deep learning; generative adversarial network; speech enhancement
DOI
10.3934/mbe.2024172
Abstract
Traditional unsupervised speech enhancement models often suffer from problems such as the non-aggregation of input feature information, which introduces additional noise during training and thereby reduces the quality of the speech signal. To address these problems, this paper analyzes how the non-aggregation of input speech feature information affects model performance. It then introduces a temporal convolutional network (TCN) and proposes the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieves a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. In addition, the enhanced speech data were fed to an acoustic model to verify recognition accuracy: the speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model. © 2024 the Author(s).
Pages: 3860-3875
Page count: 15