A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus

Cited by: 5
Authors
Al-Radhi, Mohammed Salah [1]
Abdo, Omnia [2]
Csapo, Tamas Gabor [1, 4]
Abdou, Sherif [3]
Nemeth, Geza [1]
Fashal, Mervat [2]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] Alexandria Univ, Dept Phonet & Linguist, Alexandria, Egypt
[3] Cairo Univ, Fac Comp & Informat, Giza, Egypt
[4] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Keywords
Speech synthesis; Continuous vocoder; Envelope; Arabic; PLUS NOISE MODEL; ENVELOPE; INTELLIGIBILITY; EXTRACTION; HMM;
DOI
10.1016/j.csl.2019.101025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present an extension of a novel continuous residual-based vocoder for statistical parametric speech synthesis that addresses two objectives. First, because the noise component is often not modelled accurately in modern vocoders (e.g. STRAIGHT), we propose a new technique for modelling unvoiced sounds that applies a time-domain envelope to the unvoiced segments to avoid any residual buzziness. Four time-domain envelopes (Amplitude, Hilbert, Triangular and True) are investigated, enhanced, and then applied to the noise component of the excitation in our continuous vocoder, i.e. a vocoder in which all parameters are continuous. Second, with the future aim of producing high-quality Arabic speech synthesis, we apply this vocoder to a Modern Standard Arabic audio-visual corpus that is annotated both phonetically and visually and is dedicated to emotional speech processing studies. In an objective experiment, we investigated the Phase Distortion Deviation (PDD), and in a MUSHRA-type subjective listening test we compared natural and vocoded speech samples. Both experiments showed satisfactory results for the proposed noise modelling in terms of naturalness and intelligibility, outperforming STRAIGHT and other earlier residual-based approaches. (C) 2019 Elsevier Ltd. All rights reserved.
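The abstract names the Hilbert envelope as one of the four time-domain envelopes applied to the noise component of the excitation. As a rough illustration of that general idea only, the Python sketch below computes a Hilbert envelope (the magnitude of the analytic signal, built via the FFT), smooths it, and multiplies white Gaussian noise by it. The moving-average smoothing, the `win_len` default, the RMS normalization, and the function names `hilbert_envelope`, `smooth` and `modulate_noise` are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Return |analytic signal| of x, built in the frequency domain."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Weighting that turns the spectrum into that of the analytic signal:
    # keep DC (and Nyquist when n is even), double positive frequencies,
    # zero out negative frequencies.
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    return np.abs(analytic)


def smooth(env: np.ndarray, win_len: int) -> np.ndarray:
    """Moving-average smoothing (hypothetical choice; zero-padded at the edges)."""
    win_len = max(1, min(win_len, len(env)))  # keep output length == input length
    kernel = np.ones(win_len) / win_len
    return np.convolve(env, kernel, mode="same")


def modulate_noise(reference: np.ndarray, rng: np.random.Generator,
                   win_len: int = 32) -> np.ndarray:
    """Shape white Gaussian noise with the smoothed Hilbert envelope of `reference`.

    Assumption: the envelope is taken from a reference frame (e.g. an unvoiced
    residual segment) and the output is rescaled to that frame's RMS energy.
    """
    env = smooth(hilbert_envelope(reference), win_len)
    noise = rng.standard_normal(len(reference))
    shaped = noise * env
    ref_rms = np.sqrt(np.mean(reference ** 2))
    out_rms = np.sqrt(np.mean(shaped ** 2))
    # Rescale to the reference RMS; leave an all-zero result untouched.
    return shaped * (ref_rms / out_rms) if out_rms > 0 else shaped
```

For example, `modulate_noise(frame, np.random.default_rng(0))` returns a noise frame with the same length and RMS energy as `frame`, whose amplitude roughly follows the temporal envelope of `frame`. Building the analytic signal directly from the FFT keeps the sketch dependent on NumPy alone; `scipy.signal.hilbert` uses the same frequency-domain construction.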
Pages: 15