Physical task stress and speaker variability in voice quality

Cited by: 20
Authors
Godin, Keith W. [1]
Hansen, John H. L. [1, 2]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, CRSS, 800 W Campbell Rd, Richardson, TX 75080 USA
[2] Univ Texas Dallas, Dept Elect Engn, Erik Jonsson Sch Engn & Comp Sci, CRSS, Richardson, TX 75080 USA
Keywords
Physical task stress; Glottal waveform analysis; Speech variability; Speaker variability; WAVE-FORM; TALK TEST; SPEECH; CLASSIFICATION; RECOGNITION; INTENSITY; EXERCISE; DYSPNEA; FLOW
DOI
10.1186/s13636-015-0072-7
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to formant center frequencies, breath pause placement, and fundamental frequency. Many of these changes are due to the subject's internal competition between speaking and breathing during the performance of the physical task, which has a corresponding impact on muscle control and airflow within the glottal excitation structure as well as vocal tract articulatory structure. This study considers the effect of physical task stress on voice quality. Three signal processing-based values which include (i) the normalized amplitude quotient (NAQ), (ii) the harmonic richness factor (HRF), and (iii) the fundamental frequency are used to measure voice quality. The effects of physical stress on voice quality depend on the speaker as well as the specific task. While some speakers do not exhibit changes in voice quality, a subset exhibits changes in NAQ and HRF measures of similar magnitude to those observed in studies of soft, loud, and pressed speech. For those speakers demonstrating voice quality changes, the observed changes tend toward breathy or soft voicing as observed in other studies. The effect of physical stress on the fundamental frequency is correlated with the effect of physical stress on the HRF (r = -0.34) and the NAQ (r = -0.53). Also, the inter-speaker variation in baseline NAQ is significantly higher than the variation in NAQ induced by physical task stress. The results illustrate systematic changes in speech production under physical task stress, which in theory will impact subsequent speech technology such as speech recognition, speaker recognition, and voice diarization systems.
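The abstract names the NAQ without defining it. As a minimal sketch, assuming the common definition from the glottal-flow literature (Alku et al., 2002), the NAQ of one glottal cycle is the peak-to-peak flow amplitude divided by the product of the magnitude of the negative peak of the flow derivative and the fundamental period. The function names, the synthetic triangular pulse, and the 16 kHz sampling rate below are illustrative assumptions and are not taken from the paper, which works from glottal flow estimated from recorded speech.

```python
def normalized_amplitude_quotient(flow, fs):
    """NAQ of one glottal cycle (assumed definition: f_ac / (d_peak * T)).

    flow: glottal flow samples covering exactly one fundamental period.
    fs:   sampling rate in Hz.
    """
    if len(flow) < 3:
        raise ValueError("need at least three samples in the cycle")
    period = len(flow) / fs                      # T, in seconds
    f_ac = max(flow) - min(flow)                 # peak-to-peak flow amplitude
    # First-difference approximation of the flow derivative.
    derivative = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    d_peak = -min(derivative)                    # magnitude of the negative peak
    if d_peak <= 0:
        raise ValueError("flow never decreases; no closing phase found")
    return f_ac / (d_peak * period)


def triangular_pulse(n_open, n_close, n_closed):
    """Hypothetical test signal: linear opening, linear closing, closed phase."""
    rising = [i / n_open for i in range(n_open)]
    falling = [1.0 - i / n_close for i in range(n_close)]
    closed = [0.0] * n_closed
    return rising + falling + closed


# For a triangular pulse the NAQ reduces to closing duration / period.
slow_closure = normalized_amplitude_quotient(triangular_pulse(40, 20, 40), 16000)
fast_closure = normalized_amplitude_quotient(triangular_pulse(50, 10, 40), 16000)
print(round(slow_closure, 3), round(fast_closure, 3))
```

In this idealized case the slower closing phase gives the larger NAQ. That matches the usual reading of the measure, in which higher values are associated with breathier phonation and lower values with pressed phonation.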
Pages: 13