Signal processing: New stochastic feature of unvoiced pronunciation for whisper speech modeling and synthesis

Cited by: 0

Authors:

Zhuang X.D. ^{[1,3]}

Zhu H. ^{[2]}

Mastorakis N.E. ^{[3]}

Affiliations:

[1] Electronic Information College, Qingdao University

[2] Qingdao University of Science and Technology

[3] Technical University of Sofia, Industrial Engineering Department, Kliment Ohridski 8, Sofia

Source:

International Journal of Circuits, Systems and Signal Processing | 2020 / Vol. 14

Keywords:

Short-time spectrum; Signal processing; Speech synthesis; Standard deviation coefficient; Unvoiced pronunciation; Whisper

DOI:

10.46300/9106.2020.14.144

Abstract:

Whispering is an indispensable mode of speech communication, especially for private conversation or human-machine interaction in public places such as libraries and hospitals. Whisper is unvoiced pronunciation, and voiceless sound is usually regarded as a noise-like signal. However, unvoiced sound has unique acoustic features and can carry enough information for effective communication. Although whisper is a significant form of communication, there is currently much less research on whisper signals than on common speech and voiced pronunciation. Our work extends the research on unvoiced pronunciation signals by introducing a novel signal feature, which is then applied to unvoiced signal modeling and whisper sound synthesis. The amplitude statistics of each frequency component are studied individually, and on this basis a new feature, the “consistent standard deviation coefficient”, is revealed for the amplitude spectrum of unvoiced pronunciation. A synthesis method for unvoiced pronunciation is proposed based on the new feature; it is implemented by STFT using an artificially generated short-time spectrum with random amplitude and phase. The synthesis results have the same perceived auditory quality as the original pronunciation and an autocorrelation similar to that of the original signal, which demonstrates the effectiveness of the proposed stochastic model of the short-time spectrum for unvoiced pronunciation. © 2020, North Atlantic University Union NAUN. All rights reserved.
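The feature named in the abstract can be illustrated with a minimal sketch. It assumes that the "standard deviation coefficient" of a frequency bin is the standard deviation of its amplitude across frames divided by the mean amplitude. The abstract does not say which amplitude distribution the authors use, so the Rayleigh amplitudes, the uniform phases, and the `mean_amplitudes` envelope below are assumptions chosen only for illustration. The conversion of the generated frames back to a waveform is omitted.

```python
import math
import random


def std_coefficient(amplitudes):
    """Standard deviation divided by mean for one frequency bin's amplitudes."""
    n = len(amplitudes)
    mean = sum(amplitudes) / n
    variance = sum((a - mean) ** 2 for a in amplitudes) / n
    return math.sqrt(variance) / mean


def random_spectrum_frame(mean_amplitudes, rng):
    """Draw one short-time spectrum frame with random amplitude and phase.

    Assumption (not from the source): amplitudes are Rayleigh distributed,
    scaled so each bin has the requested mean, and phases are uniform.
    """
    frame = []
    for mean_amp in mean_amplitudes:
        # A Rayleigh variable with scale sigma has mean sigma * sqrt(pi / 2).
        sigma = mean_amp / math.sqrt(math.pi / 2.0)
        # Inverse-CDF sampling; 1 - random() lies in (0, 1], so log is defined.
        amp = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        phase = rng.uniform(0.0, 2.0 * math.pi)
        frame.append(amp * complex(math.cos(phase), math.sin(phase)))
    return frame


rng = random.Random(0)
mean_amplitudes = [1.0, 0.2, 5.0, 0.05]  # hypothetical spectral envelope
frames = [random_spectrum_frame(mean_amplitudes, rng) for _ in range(20000)]

# One coefficient per frequency bin, estimated across all frames.
coefficients = [
    std_coefficient([abs(frame[k]) for frame in frames])
    for k in range(len(mean_amplitudes))
]
print(coefficients)
```

Under these assumptions the estimated coefficients come out nearly equal across bins even though the mean amplitudes differ by two orders of magnitude, because the ratio of standard deviation to mean of a Rayleigh variable does not depend on its scale. Whether measured whisper spectra follow this particular distribution is a question for the paper itself.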

Pages: 1162-1175

Number of pages: 13
