SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Cited by: 1

Authors:

Lv R. [1]

Chen N. [1]

Cheng S. [1]

Fan G. [1]

Rao L. [1]

Song X. [1]

Lv W. [2]

Yang D. [3]

Affiliations:

[1] School of Electronic Information Engineering, Shanghai Dianji University, Shanghai

[2] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai

[3] Alibaba Group, Shanghai

Source:

Mathematical Biosciences and Engineering | 2024 / Vol. 21 / Issue 3

Funding:

National Natural Science Foundation of China

Keywords:

autoencoder; deep learning; generative adversarial network; speech enhancement

DOI:

10.3934/mbe.2024172

Abstract:

Traditional unsupervised speech enhancement models often fail to aggregate input feature information, which introduces additional noise during training and reduces the quality of the speech signal. To address this, the paper analyzes how problems such as non-aggregation of input speech feature information affect model performance. It then introduces a temporal convolutional network (TCN) and proposes the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieves a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. The enhanced speech was also fed to an acoustic model to verify recognition accuracy: the speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model. © 2024 the Author(s).

Pages: 3860-3875

Page count: 15
