Speech waveform reconstruction from speech parameters for an effective text to speech synthesis system using minimum phase harmonic sinusoidal model for Punjabi

Times cited: 3
Authors
Kaur, Navdeep [1, 2]
Singh, Parminder [3]
Affiliations
[1] Govt Polytech Coll Girls, Comp Sci & Engn, Amritsar, Punjab, India
[2] IK Gujral Punjab Tech Univ, Kapurthala, India
[3] Guru Nanak Dev Engn Coll, Comp Sci & Engn, Ludhiana, Punjab, India
Keywords
Frame overlapping; Spectral subtraction; Speech parameters extraction; Speech synthesis; Minimum phase harmonic sinusoidal model; GENERATION;
DOI
10.1007/s11042-022-12850-y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Speech processing plays a vital role in current speech communication applications. A major objective of digital speech is the transmission of messages between humans and computer systems, and a text-to-speech (TTS) synthesizer is used for this transmission. Although many significant works have been carried out on speech synthesis, existing frameworks still have problems generating high-quality speech. This work presents an effective text-to-speech synthesis system for the Punjabi language based on minimum phase harmonic sinusoidal modeling. To develop the system, the input phoneme speech is first divided into overlapping frames. Next, spectral subtraction is applied to remove noise from the overlapped frames for speech enhancement. Then, spectral parameters such as Mel frequency cepstral coefficients (MFCC), excitation parameters such as fundamental frequency and energy, and their first-order (delta) and second-order (delta-delta) time derivatives are extracted. Finally, the speech signal is synthesized from the extracted parameters using the minimum phase harmonic sinusoidal model, which produces synthetic speech from these parameters together with the amplitudes and phases of the sine waves. The presented waveform reconstruction method for speech synthesis is implemented in Python. The experimental results show that the presented work performs significantly better on several performance measures: execution time (0.125 s), SNR (16.4 dB), RMSE (0.074 dB), BSD (3.21%) and accuracy (97.6%).
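The abstract describes the front end only as "overlapping frames" followed by "spectral subtraction". A minimal sketch of that step is given below, assuming classic magnitude spectral subtraction with the noise spectrum averaged over the first few frames (assumed to be speech-free) and a small spectral floor. The frame length (400 samples), hop (160 samples), sampling rate (16 kHz), noise-frame count and floor are illustrative assumptions, and the function names are hypothetical; the paper's actual settings are not given in the abstract.

```python
import numpy as np

# Hypothetical helper names and parameter values; not taken from the paper.


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Defaults (25 ms frames, 10 ms hop at 16 kHz) are assumptions.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:  # zero-pad very short inputs to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def spectral_subtraction(frames, noise_frames=5, floor=0.01):
    """Magnitude spectral subtraction on Hann-windowed frames.

    Assumes the first `noise_frames` frames contain noise only; their mean
    magnitude spectrum is the noise estimate. Each frame keeps its noisy phase.
    """
    window = np.hanning(frames.shape[1])
    spec = np.fft.rfft(frames * window, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise estimate, but never drop below a small spectral floor.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=frames.shape[1], axis=1)


# Usage: 1 s of synthetic noise with a 200 Hz tone starting at 0.25 s.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
noisy = 0.05 * rng.standard_normal(fs)
noisy[4000:] += np.sin(2 * np.pi * 200 * t[4000:])
frames = frame_signal(noisy)
enhanced = spectral_subtraction(frames)
```

Estimating the noise from leading frames is the simplest choice; a voice-activity detector or a minimum-statistics tracker would be a more robust alternative, and which one the authors used is not stated.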
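The first-order (delta) and second-order (delta-delta) time derivatives mentioned in the abstract are commonly computed with the regression formula d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²). The sketch below assumes that formula with N = 2 and edge-frame replication; the abstract does not specify the window, and the 15-dimensional static vector (13 MFCCs plus log F0 plus energy) used in the usage lines is a hypothetical layout filled with random stand-in values.

```python
import numpy as np

# Hypothetical helper names; the regression window N = 2 is an assumption.


def delta(features, N=2):
    """Regression-based time derivative of a (T, D) feature matrix.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames replicated at the edges.
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(features)
    for n in range(1, N + 1):
        # padded[N + t] is frame t, so these slices give c_{t+n} and c_{t-n}.
        d += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return d / denom


def add_dynamic_features(static):
    """Stack static, delta and delta-delta features: (T, D) -> (T, 3D)."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])


# Usage with random stand-in values for 13 MFCCs + log F0 + energy per frame.
rng = np.random.default_rng(0)
static = rng.standard_normal((100, 15))
obs = add_dynamic_features(static)
```

For a linearly increasing feature the interior deltas are exactly the slope and the interior delta-deltas are zero, which is a quick sanity check of the formula.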
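The abstract says the waveform is rebuilt as a sum of sine waves whose amplitudes and phases come from the extracted parameters through a minimum phase model. A generic sketch of that idea is shown below, not the authors' implementation: it assumes the per-frame spectral envelope is already available as a natural-log magnitude on an FFT grid (how the MFCCs are inverted to such an envelope is not shown), derives the minimum phase by folding the real cepstrum, samples amplitude and phase at the harmonics k·F0, and overlap-adds Hann-windowed frames while carrying a running fundamental phase across frames. Unvoiced frames are simply left silent here, which is an assumption; the frame hop, sampling rate and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a generic minimum-phase harmonic model sketch.


def min_phase_log_spectrum(log_mag):
    """Complex log spectrum of the minimum-phase system with this log magnitude.

    log_mag: natural-log magnitude on an rfft grid of n_fft // 2 + 1 bins.
    Returns log H_min: real part = log_mag, imaginary part = minimum phase.
    """
    n_fft = 2 * (len(log_mag) - 1)
    cep = np.fft.irfft(log_mag, n=n_fft)  # real cepstrum (even sequence)
    folded = np.zeros(n_fft)
    folded[0] = cep[0]
    folded[1:n_fft // 2] = 2 * cep[1:n_fft // 2]  # fold negative quefrencies
    folded[n_fft // 2] = cep[n_fft // 2]
    return np.fft.rfft(folded)


def harmonic_synthesis(f0, log_env, fs=16000, hop=80):
    """Overlap-add harmonic synthesis with minimum-phase harmonic phases.

    f0: (T,) fundamental frequency per frame in Hz, 0 marks an unvoiced frame.
    log_env: (T, n_fft // 2 + 1) natural-log magnitude envelope per frame.
    Unvoiced frames stay silent (no noise component is modelled here).
    """
    T, n_bins = log_env.shape
    n_fft = 2 * (n_bins - 1)
    win = np.hanning(2 * hop + 1)[:-1]  # periodic Hann: sums to 1 at 50 % overlap
    offsets = np.arange(2 * hop) - hop  # sample offsets from each frame centre
    out = np.zeros(hop * (T + 1))
    theta = 0.0  # running phase of the fundamental at the frame centre
    for t in range(T):
        if f0[t] <= 0:
            continue
        log_h = min_phase_log_spectrum(log_env[t])
        k = np.arange(1, int((fs / 2) // f0[t]) + 1)  # harmonics up to Nyquist
        fk = k * f0[t]
        bins = np.round(fk * n_fft / fs).astype(int)
        amp = np.exp(log_h.real[bins])  # harmonic amplitudes from the envelope
        phi = log_h.imag[bins]          # minimum-phase harmonic phases
        phases = (2 * np.pi * fk[:, None] * offsets[None, :] / fs
                  + (k * theta + phi)[:, None])
        out[t * hop:t * hop + 2 * hop] += win * (amp[:, None] * np.cos(phases)).sum(axis=0)
        theta += 2 * np.pi * f0[t] * hop / fs  # advance fundamental phase by one hop
    return out


# Usage: 50 voiced frames at a constant 200 Hz with a flat envelope (n_fft = 512).
T = 50
f0 = np.full(T, 200.0)
log_env = np.zeros((T, 257))
y = harmonic_synthesis(f0, log_env)
```

Advancing the fundamental phase by 2π·F0·hop/fs between frames keeps the harmonics phase-coherent across the overlap, so a constant F0 yields a strictly periodic waveform; the minimum phase term only shapes each pitch period, which is why such models need no explicit phase parameters beyond the envelope.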
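The abstract reports SNR and RMSE but does not define them. The sketch below uses the common waveform-domain definitions, SNR = 10·log10(Σ x² / Σ (x − y)²) and RMSE = sqrt(mean((x − y)²)), for a reference x and a reconstruction y of equal length. Since the paper reports RMSE in dB, its RMSE may instead be computed on a log (dB) spectral representation; that cannot be determined from the abstract, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; common textbook definitions, assumed here.


def snr_db(reference, estimate):
    """Signal-to-noise ratio (dB) of an estimate against a reference."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(estimate, dtype=float)
    return float(10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2)))


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length sequences."""
    error = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean(error ** 2)))


ref = [1.0, 1.0, 1.0, 1.0]
est = [1.1, 0.9, 1.1, 0.9]
print(round(snr_db(ref, est), 6))  # 20.0
print(round(rmse(ref, est), 6))    # 0.1
```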
Pages: 26101-26120
Number of pages: 20