Signal processing: New stochastic feature of unvoiced pronunciation for whisper speech modeling and synthesis

Cited by: 0
Authors
Zhuang X.D. [1 ,3 ]
Zhu H. [2 ]
Mastorakis N.E. [3 ]
Affiliations
[1] Electronic Information College, Qingdao University
[2] Qingdao University of Science and Technology
[3] Technical University of Sofia, Industrial Engineering Department, Kliment Ohridski 8, Sofia
Source
International Journal of Circuits, Systems and Signal Processing | 2020 / Vol. 14
Keywords
Short-time spectrum; Signal Processing; Speech synthesis; Standard deviation coefficient; Unvoiced pronunciation; Whisper;
DOI
10.46300/9106.2020.14.144
Abstract
Whispering is an indispensable mode of speech communication, especially for private conversation or human-machine interaction in public places such as libraries and hospitals. Whisper is unvoiced pronunciation, and voiceless sound is usually treated as a noise-like signal. However, unvoiced sound has unique acoustic features and carries enough information for effective communication. Although it is a significant form of communication, whisper signals have so far received much less research attention than common speech and voiced pronunciation. Our work extends research on unvoiced pronunciation signals by introducing a novel signal feature, which is further applied to unvoiced signal modeling and whisper sound synthesis. The amplitude statistics of each frequency component are studied individually, revealing a new feature of the amplitude spectrum of unvoiced pronunciation: a consistent standard deviation coefficient. Based on this feature, a synthesis method for unvoiced pronunciation is proposed, implemented via the STFT with artificially generated short-time spectra of random amplitude and phase. The synthesized results have auditory quality identical to the original pronunciation and autocorrelation similar to that of the original signal, which demonstrates the effectiveness of the proposed stochastic short-time spectrum model for unvoiced pronunciation. © 2020, North Atlantic University Union NAUN. All rights reserved.
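The stochastic synthesis scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses SciPy's STFT/ISTFT, stands a white-noise segment in for a recorded whispered phone, and assumes a Gaussian per-bin amplitude model and a 256-sample window, none of which are specified here in the record.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)

# Toy "unvoiced" signal: 1 s of white noise standing in for a whispered phone.
fs = 16000
x = rng.standard_normal(fs)

# Short-time amplitude spectra of the original signal.
f, t, Z = stft(x, fs=fs, nperseg=256)
amp = np.abs(Z)

# Per-frequency amplitude statistics: the abstract models each frequency
# component individually, and the "standard deviation coefficient" is the
# ratio of the amplitude's standard deviation to its mean in each bin.
mu = amp.mean(axis=1)
sigma = amp.std(axis=1)
cv = sigma / np.maximum(mu, 1e-12)  # standard deviation coefficient per bin

# Stochastic resynthesis: draw random amplitudes matching each bin's
# statistics, pair them with uniformly random phases, and invert the STFT
# (overlap-add) to obtain a noise-like synthetic waveform.
n_frames = Z.shape[1]
rand_amp = np.maximum(
    mu[:, None] + sigma[:, None] * rng.standard_normal((len(f), n_frames)),
    0.0,
)
rand_phase = rng.uniform(-np.pi, np.pi, size=(len(f), n_frames))
Z_syn = rand_amp * np.exp(1j * rand_phase)
_, x_syn = istft(Z_syn, fs=fs, nperseg=256)
```

With a real whispered recording in place of the noise segment, `x_syn` would be the kind of synthetic signal whose auditory quality and autocorrelation the paper compares against the original.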
Pages: 1162-1175
Page count: 13