A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

Times cited: 0
Authors
Hu, Guoqiang [1]
Tan, Huaning [1]
Li, Ruilai [1]
Affiliation
[1] Jinan Univ, Int Sch, Guangzhou, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024
Keywords
Mel spectrogram; Speech Synthesis; Fine Grainedness; Continuous Wavelet Transform;
DOI
10.1109/IALP63756.2024.10661192
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic features play an important role in the quality of synthesised speech. Currently, the Mel spectrogram is the acoustic feature used by most acoustic models. However, because the Fourier transform used to compute it loses fine-grained detail, speech synthesised from the Mel spectrogram is less clear for abruptly changing (mutant) signals. To obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). The paradigm introduces an additional task, predicting a more detailed wavelet spectrogram; like the post-processing network, this task takes the Mel spectrogram output by the decoder as its input. We use Tacotron2 and FastSpeech2 for experimental validation, to test an autoregressive (AR) and a non-autoregressive (NAR) speech synthesis system, respectively. The experimental results show that speech synthesised by the models with the Mel spectrogram enhancement paradigm achieves a higher MOS, improving on the respective baseline models by 0.14 and 0.09. These findings provide some evidence for the generality of the enhancement paradigm, since it succeeds in two different architectures.
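The abstract does not state which wavelet, scales, or normalisation are used to build the wavelet spectrogram target. As a minimal sketch, assuming a complex Morlet wavelet (w0 = 6) evaluated by direct convolution, the Python below computes a CWT magnitude scalogram. Unlike a short-time Fourier transform, whose window length is fixed, each scale here uses a window whose width shrinks at higher frequencies, which is what lets the CWT localise an abrupt onset. The names `morlet`, `cwt_scalogram`, and `scale_for_frequency` are hypothetical illustrations, not the authors' code.

```python
import cmath
import math


def morlet(t, scale, w0=6.0):
    """Scaled complex Morlet wavelet psi(t / scale) / sqrt(scale).

    psi(x) = pi**-0.25 * exp(i * w0 * x) * exp(-x**2 / 2); w0 = 6 is a common
    choice (an assumption here, not taken from the paper).
    """
    x = t / scale
    return (math.pi ** -0.25) * cmath.exp(1j * w0 * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)


def cwt_scalogram(signal, sample_rate, scales, w0=6.0):
    """Return |W(s, b)|: one row per scale s, one column per sample index b.

    W(s, b) ~= sum_k x[k] * conj(morlet((k - b) * dt, s)) * dt, a Riemann-sum
    approximation of the CWT integral, truncated to +-4 scale widths where
    the Gaussian envelope is negligible.
    """
    n = len(signal)
    dt = 1.0 / sample_rate
    rows = []
    for s in scales:
        half = int(math.ceil(4.0 * s / dt))
        # Precompute the conjugated wavelet kernel for offsets -half..half.
        kernel = [morlet(j * dt, s, w0).conjugate() for j in range(-half, half + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for k in range(max(0, b - half), min(n, b + half + 1)):
                acc += signal[k] * kernel[k - b + half]
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


def scale_for_frequency(freq_hz, w0=6.0):
    """Morlet scale whose centre frequency is approximately freq_hz."""
    return w0 / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    sr = 8000
    # 400 samples of silence, then an abrupt 1 kHz tone starting at sample 400.
    sig = [0.0] * 400 + [math.sin(2 * math.pi * 1000 * k / sr) for k in range(400)]
    freqs = [500.0, 1000.0, 2000.0]
    S = cwt_scalogram(sig, sr, [scale_for_frequency(f) for f in freqs])
    for f, row in zip(freqs, S):
        print(f"{f:6.0f} Hz  before onset: {row[200]:.5f}  inside tone: {row[600]:.5f}")
```

Well before the onset the 1 kHz row is exactly zero, and inside the tone it is more than ten times larger than the mismatched 500 Hz and 2 kHz rows. The direct convolution costs O(N·K) per scale and is chosen only for readability; a practical implementation would more likely use FFT-based convolution.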
Pages: 401-405
Number of pages: 5