Detrending the Waveforms of Steady-State Vowels

Cited by: 0

Authors:

Van Soom, Marnix ^{[1]}

de Boer, Bart ^{[1]}

Affiliations:

[1] Vrije Univ Brussel, Artificial Intelligence Lab, Pl Laan 2, B-1050 Brussels, Belgium

Source:

ENTROPY | 2020 / Vol. 22 / Issue 03

Keywords:

formant; steady-state; vowel; detrending; acoustic phonetics; source-filter theory; probability theory; uncertainty quantification; model averaging; nested sampling; FREQUENCIES;

DOI:

10.3390/e22030331

Chinese Library Classification (CLC) number:

O4 [Physics];

Discipline classification code:

0702 ;

Abstract:

Steady-state vowels are vowels uttered with a momentarily fixed vocal tract configuration and with steady vibration of the vocal folds. In this steady state, the vowel waveform appears as a quasi-periodic string of elementary units called pitch periods. Humans perceive this quasi-periodic regularity as a definite pitch. Likewise, so-called pitch-synchronous methods exploit this regularity by using the duration of the pitch periods as a natural time scale for their analysis. In this work, we present a simple pitch-synchronous method for estimating formants using a Bayesian approach. The method slightly generalizes the basic approach of modeling the pitch periods as a superposition of decaying sinusoids, one for each vowel formant, by explicitly taking into account the additional low-frequency content in the waveform, which arises not from formants but from the glottal pulse. We model this low-frequency content in the time domain as a polynomial trend function that is added to the decaying sinusoids. The problem then reduces to a rather familiar one in macroeconomics: estimate the cycles (our decaying sinusoids) independently from the trend (our polynomial trend function); in other words, detrend the waveforms of steady-state vowels. We show how to do this efficiently.
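
The signal model described in the abstract (a polynomial trend added to one decaying sinusoid per formant) can be illustrated with a minimal sketch. This is not the Bayesian, nested-sampling procedure of the paper: it assumes the formant frequencies and bandwidths are already known, so that only the linear amplitudes and trend coefficients remain, and fits those by ordinary least squares. The function names, the sampling rate, the formant values, the trend order, and the envelope convention exp(-pi * bandwidth * t) are all assumptions chosen for illustration.

```python
import math

def design_matrix(u, period_s, freqs_hz, bandwidths_hz, trend_order):
    """Columns: 1, u, ..., u^order (trend), then a decaying cos and sin per formant.

    u is time normalized to [0, 1) within one pitch period (assumed convention).
    """
    rows = []
    for ui in u:
        t = ui * period_s  # time in seconds
        row = [ui ** k for k in range(trend_order + 1)]
        for f, b in zip(freqs_hz, bandwidths_hz):
            env = math.exp(-math.pi * b * t)  # decay envelope set by bandwidth b
            row.append(env * math.cos(2.0 * math.pi * f * t))
            row.append(env * math.sin(2.0 * math.pi * f * t))
        rows.append(row)
    return rows

def solve_least_squares(A, y):
    """Solve the normal equations (A^T A) x = A^T y by Gaussian elimination."""
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Hypothetical, noise-free pitch period: 128 samples at 16 kHz (8 ms).
fs, n = 16000.0, 128
period_s = n / fs
u = [i / n for i in range(n)]
freqs, bws, order = [500.0, 1500.0], [60.0, 90.0], 2
# Three trend coefficients, then (cos, sin) amplitudes for each formant.
true_coeffs = [0.3, -0.5, 0.2, 1.0, 0.5, 0.4, -0.2]

A = design_matrix(u, period_s, freqs, bws, order)
waveform = [sum(a * c for a, c in zip(row, true_coeffs)) for row in A]

coeffs = solve_least_squares(A, waveform)
trend = [sum(coeffs[k] * ui ** k for k in range(order + 1)) for ui in u]
detrended = [y - tr for y, tr in zip(waveform, trend)]  # the "cycles" only
```

Fitting the trend and the sinusoids jointly, rather than subtracting a polynomial first, keeps the trend from absorbing part of the formant oscillation. In the setting of the abstract, the frequencies and bandwidths are themselves unknown and would have to be inferred as well.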


Pages: 21

