A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

Authors
Hu, Guoqiang [1]
Tan, Huaning [1]
Li, Ruilai [1]
Affiliations
[1] Jinan Univ, Int Sch, Guangzhou, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024
Keywords
Mel spectrogram; Speech Synthesis; Fine Grainedness; Continuous Wavelet Transform
DOI
10.1109/IALP63756.2024.10661192
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Acoustic features play an important role in the quality of synthesised speech. The Mel spectrogram is currently the acoustic feature used by most acoustic models. However, the Fourier transform used to compute it loses fine-grained detail, which compromises the clarity of speech synthesised from the Mel spectrogram in abruptly changing signals. To obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). The paradigm introduces an additional task, predicting a more detailed wavelet spectrogram, which, like the post-processing network, takes the Mel spectrogram output by the decoder as input. We choose Tacotron2 and FastSpeech2 for experimental validation in order to test autoregressive (AR) and non-autoregressive (NAR) speech systems, respectively. The experimental results show that speech synthesised by the models with the enhancement paradigm achieves a higher mean opinion score (MOS), improving on the respective baseline models by 0.14 and 0.09. Because the paradigm succeeds in both architectures, these findings provide some validation of its generality.
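The abstract does not say which wavelet, scales, or loss the authors use, so the following is only a minimal sketch of how a wavelet spectrogram (scalogram) of the kind mentioned above could be computed. It assumes a complex Morlet wavelet with centre frequency w0 = 6 and direct convolution in NumPy; the function name morlet_scalogram and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0=6.0):
    """Return |CWT| of a 1-D signal, one row per scale (assumed Morlet wavelet)."""
    signal = np.asarray(signal, dtype=float)
    scalogram = np.empty((len(scales), signal.size))
    for i, s in enumerate(scales):
        # Sample the wavelet over +/- 4 widths of its Gaussian envelope.
        half = int(np.ceil(4 * s))
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * t - 0.5 * t ** 2) / np.sqrt(s)
        # Correlation with the wavelet = convolution with its reversed conjugate.
        kernel = np.conj(wavelet)[::-1]
        full = np.convolve(signal, kernel, mode="full")
        # Keep the centred part so each row is aligned with the input samples.
        scalogram[i] = np.abs(full[half:half + signal.size])
    return scalogram


# Usage: a pure tone at 0.05 cycles/sample should respond most strongly
# near scale w0 / (2 * pi * 0.05), i.e. roughly 19 samples.
n = np.arange(2000)
tone = np.sin(2 * np.pi * 0.05 * n)
scales = np.arange(2, 41)
scalogram = morlet_scalogram(tone, scales)
best_scale = scales[np.argmax(scalogram.mean(axis=1))]
```

Unlike a fixed-window Fourier analysis, the analysis window here shrinks at small scales, which is why a scalogram can localise abrupt changes more sharply in time. How such a target would be aligned with Mel frames and weighted in the training loss is not described in the abstract and is left out of this sketch.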
Pages: 401-405
Number of pages: 5