Detrending the Waveforms of Steady-State Vowels

Citations: 0
Authors
Van Soom, Marnix [1]
de Boer, Bart [1]
Affiliations
[1] Vrije Univ Brussel, Artificial Intelligence Lab, Pl Laan 2, B-1050 Brussels, Belgium
Keywords
formant; steady-state; vowel; detrending; acoustic phonetics; source-filter theory; probability theory; uncertainty quantification; model averaging; nested sampling; frequencies
DOI
10.3390/e22030331
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
Steady-state vowels are vowels uttered with a momentarily fixed vocal tract configuration and with steady vibration of the vocal folds. In this steady state, the vowel waveform appears as a quasi-periodic string of elementary units called pitch periods. Humans perceive this quasi-periodic regularity as a definite pitch. Likewise, so-called pitch-synchronous methods exploit this regularity by using the duration of the pitch periods as a natural time scale for their analysis. In this work, we present a simple pitch-synchronous method that uses a Bayesian approach to estimate formants. It slightly generalizes the basic approach of modeling the pitch periods as a superposition of decaying sinusoids, one for each vowel formant, by explicitly taking into account the additional low-frequency content in the waveform, which arises not from formants but from the glottal pulse. We model this low-frequency content in the time domain as a polynomial trend function that is added to the decaying sinusoids. The problem then reduces to a rather familiar one in macroeconomics: estimate the cycles (our decaying sinusoids) independently from the trend (our polynomial trend function); in other words, detrend the waveforms of steady-state vowels. We show how to do this efficiently.
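The trend-plus-cycles model described in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian method (which, per the keywords, involves model averaging and nested sampling); it only shows the linear part of the problem, under the assumption that the formant frequencies and bandwidths are already known, so that the amplitudes of the polynomial trend and of the decaying sinusoids can be found by ordinary least squares. The function names, the decay rate `exp(-pi * bandwidth * t)`, and the rescaling of time to [0, 1] for the polynomial are all assumptions made for illustration.

```python
import numpy as np


def design_matrix(t, freqs_hz, bandwidths_hz, poly_order):
    """Build the basis: polynomial trend columns, then one decaying
    cosine/sine pair per formant (hypothetical helper, not from the paper)."""
    # Rescale time to [0, 1] so the polynomial columns are well conditioned.
    x = (t - t[0]) / (t[-1] - t[0])
    cols = [x**k for k in range(poly_order + 1)]
    for f, b in zip(freqs_hz, bandwidths_hz):
        # Assumption: a formant of bandwidth b (Hz) decays as exp(-pi * b * t).
        decay = np.exp(-np.pi * b * t)
        cols.append(decay * np.cos(2 * np.pi * f * t))
        cols.append(decay * np.sin(2 * np.pi * f * t))
    return np.column_stack(cols)


def detrend_pitch_period(t, y, freqs_hz, bandwidths_hz, poly_order):
    """Split one pitch period y(t) into a polynomial trend and a sum of
    decaying sinusoids by least squares, for fixed frequencies/bandwidths."""
    G = design_matrix(t, freqs_hz, bandwidths_hz, poly_order)
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    n_trend = poly_order + 1
    trend = G[:, :n_trend] @ coef[:n_trend]    # low-frequency content
    cycles = G[:, n_trend:] @ coef[n_trend:]   # formant contribution
    return trend, cycles
```

Calling `detrend_pitch_period(t, y, [700.0, 1200.0], [80.0, 100.0], 2)` on a single pitch period sampled at times `t` would return the fitted quadratic trend and the two-formant cycles as separate arrays. In a full treatment the frequencies, bandwidths, and polynomial order would themselves be inferred rather than fixed.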
Pages: 21