A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

Times cited: 0

Authors:

Hu, Guoqiang [1]

Tan, Huaning [1]

Li, Ruilai [1]

Affiliation:

[1] Jinan University, International School, Guangzhou, People's Republic of China

Source:

2024 International Conference on Asian Language Processing (IALP 2024), 2024

Keywords:

Mel spectrogram; Speech Synthesis; Fine Grainedness; Continuous Wavelet Transform

DOI:

10.1109/IALP63756.2024.10661192

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Acoustic features play an important role in the quality of synthesised speech. The Mel spectrogram is currently the acoustic feature used by most acoustic models. However, the Fourier transform used to compute it loses fine-grained detail. As a result, speech synthesised from the Mel spectrogram is less clear where the signal changes abruptly (mutant signals). To obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). The paradigm introduces an additional task: predicting a more detailed wavelet spectrogram. Like the post-processing network, this task takes the decoder's output Mel spectrogram as its input. We validate the paradigm on Tacotron2 and FastSpeech2 to test autoregressive (AR) and non-autoregressive (NAR) speech synthesis systems, respectively. The experimental results show that models using the enhancement paradigm synthesise speech with a higher MOS than their baselines, by 0.14 and 0.09, respectively. Because the paradigm succeeds in both architectures, these results provide some evidence for its universality.
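The abstract does not state which wavelet, scales, or loss the paper uses. What follows is therefore only a minimal NumPy sketch under stated assumptions.

For the wavelet-spectrogram target, it uses an analytic Morlet wavelet with `w0 = 6`, computed via the FFT, with one row per analysis frequency. For training the extra task, it adds an auxiliary L1 wavelet term to a decoder-plus-postnet Mel loss, as one plausible setup. The function names `morlet_scalogram` and `enhanced_loss` and the `wavelet_weight` parameter are hypothetical.

A fixed-window STFT uses the same time support at every frequency. The Morlet's support, by contrast, shrinks as the analysis frequency rises, which is the usual reason to prefer a CWT when fine temporal detail matters.

```python
import numpy as np


def morlet_scalogram(x, fs, freqs, w0=6.0):
    """Magnitude CWT of a 1-D signal with an analytic Morlet wavelet.

    Assumption: Morlet wavelet, w0 = 6, FFT-based computation (one inverse FFT
    per analysis frequency). The wavelet is scaled so that a unit-amplitude
    tone centred on an analysis frequency gives magnitude 1.0 in that row.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Angular frequency of each FFT bin, in radians per sample.
    omega = 2.0 * np.pi * np.fft.fftfreq(n)
    rows = []
    for f in freqs:
        # Scale (in samples) whose Morlet centre frequency equals f Hz;
        # higher f -> smaller scale -> shorter time support.
        s = w0 * fs / (2.0 * np.pi * f)
        # Fourier transform of the scaled Morlet, zeroed on negative bins (analytic).
        psi_hat = 2.0 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        rows.append(np.abs(np.fft.ifft(spectrum * psi_hat)))
    return np.stack(rows)  # shape: (len(freqs), n)


def l1(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def enhanced_loss(mel_decoder, mel_postnet, mel_target,
                  wavelet_pred, wavelet_target, wavelet_weight=1.0):
    """Hypothetical objective: Mel loss before and after the postnet, plus an
    auxiliary wavelet-spectrogram term weighted by wavelet_weight (assumed)."""
    mel_loss = l1(mel_decoder, mel_target) + l1(mel_postnet, mel_target)
    return mel_loss + wavelet_weight * l1(wavelet_pred, wavelet_target)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.cos(2 * np.pi * 440 * t)  # 1 s, 440 Hz test tone
    freqs = np.array([110.0, 220.0, 440.0, 880.0, 1760.0, 3520.0])
    S = morlet_scalogram(tone, fs, freqs)
    print(freqs[S.mean(axis=1).argmax()])  # 440.0
```

Running the script prints 440.0, because the 440 Hz row is the only one centred on the test tone. In a real pipeline, the target would be computed from the ground-truth waveform. The prediction would come from an extra network head fed with the decoder's Mel output. Neither detail is specified in the abstract.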


Pages: 401-405

Number of pages: 5
