Measurement of signal-to-noise ratio in dysphonic voices by image processing of spectrograms

Times cited: 3
Authors
Vieira, Maurilio N. [1]
Sansao, Joao Pedro H. [2,3]
Yehia, Hani C. [1]
Affiliations
[1] Univ Fed Minas Gerais, Dept Elect Engn, BR-31270010 Belo Horizonte, MG, Brazil
[2] Univ Fed Minas Gerais, Programa Posgrad Engn Eletr, BR-31270901 Belo Horizonte, MG, Brazil
[3] Univ Fed Sao Joao del Rei, Dept Engn Telecomunicacoes & Mecatron, BR-36420000 Ouro Branco, MG, Brazil
Keywords
Signal-to-noise ratio; Breathiness; Dysphonic voice; 2D speech processing; PATHOLOGICAL VOICE; ADDITIVE NOISE; VOCAL QUALITY; PERTURBATION; SPEECH; COMPUTATION; PREDICTION; FREQUENCY; JITTER; INDEX
DOI
10.1016/j.specom.2014.04.001
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The measurement of glottal noise was investigated in human and synthesized dysphonic voices by means of two-dimensional (2D) speech processing. A prime objective was to reduce the sensitivity of the measurements to fundamental frequency (fₒ) tracking errors and to phonatory aperiodicities. An available fingerprint image enhancement algorithm was used to measure the signal-to-noise ratio in narrow-band spectrographic images. This spectrographic signal-to-noise ratio estimation method (S²NR) creates binary masks, mainly based on the orientation field of the partials, to separate the energy in regions with strong harmonics from the energy in noisy areas. Synthesized vowels with additive noise were used to calibrate the algorithm, validate the calibration, and systematically evaluate its dependence on fₒ, shimmer (cycle-to-cycle amplitude perturbation), and jitter (cycle-to-cycle fₒ perturbation). In synthesized voices with known signal-to-noise ratios in the 5–40 dB range, S²NR estimates were, on average, accurate to within ±3.2 dB and robust to variations in fₒ (120 Hz or 220 Hz), jitter (0–3%), and shimmer (0–30%). For the vowel /a/ produced by human dysphonic speakers, comparing S²NR values with perceptual ratings of breathiness revealed a non-linear but monotonic decrease of S²NR with increasing breathiness. Comparison between S²NR and related acoustic measurements indicated similar behavior regarding the relationship with breathiness and immunity to shimmer, but the other methods were markedly influenced by jitter. Overall, the S²NR method did not rely on accurate fₒ estimation, was robust to vocal perturbations, and was largely independent of vowel type; it also shows potential for application to running speech. © 2014 Elsevier B.V. All rights reserved.
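The core idea described in the abstract, splitting spectrogram energy with a binary mask into a harmonic part and a noise part and expressing their ratio in dB, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' S²NR implementation: the mask is assumed to be given (the paper derives it from the orientation field of the partials via a fingerprint enhancement algorithm, which is not reproduced here), and `masked_snr_db` and `local_perturbation_percent` are hypothetical helper names. The perturbation helper assumes the common "local" definition of jitter and shimmer (mean absolute cycle-to-cycle difference divided by the mean), which may differ from the definition used in the paper's synthesis.

```python
import math
from typing import Sequence


def masked_snr_db(power: Sequence[Sequence[float]],
                  harmonic_mask: Sequence[Sequence[bool]]) -> float:
    """Energy ratio (dB) between masked 'harmonic' cells and the remaining 'noise' cells.

    power: time-frequency grid of non-negative power values (e.g. |STFT|^2).
    harmonic_mask: same-shaped grid, True where a cell is treated as harmonic.
    Assumption: the caller supplies the mask; building it (orientation-field /
    fingerprint-enhancement step in the paper) is out of scope for this sketch.
    """
    if len(power) != len(harmonic_mask):
        raise ValueError("power and mask must have the same number of rows")
    harmonic = noise = 0.0
    for row_p, row_m in zip(power, harmonic_mask):
        if len(row_p) != len(row_m):
            raise ValueError("power and mask rows must have the same length")
        for p, is_harmonic in zip(row_p, row_m):
            if is_harmonic:
                harmonic += p
            else:
                noise += p
    if harmonic <= 0.0 or noise <= 0.0:
        raise ValueError("both regions need positive energy")
    return 10.0 * math.log10(harmonic / noise)


def local_perturbation_percent(values: Sequence[float]) -> float:
    """Mean absolute cycle-to-cycle difference relative to the mean value, in percent.

    Fed with glottal periods this gives a 'local jitter'; fed with per-cycle peak
    amplitudes it gives a 'local shimmer' (assumed definitions, see lead-in).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


if __name__ == "__main__":
    # Toy 2-frame x 3-bin spectrogram: bins 0 and 2 are 'harmonic', bin 1 is 'noise'.
    power = [[4.0, 1.0, 4.0],
             [4.0, 1.0, 4.0]]
    mask = [[True, False, True],
            [True, False, True]]
    # Harmonic energy 16, noise energy 2, ratio 8 -> about 9.03 dB.
    print(f"SNR: {masked_snr_db(power, mask):.2f} dB")
    periods_ms = [5.0, 5.1, 5.0, 5.1]  # hypothetical glottal periods
    print(f"jitter: {local_perturbation_percent(periods_ms):.2f} %")
```

Because the ratio is taken over mask regions of the spectrogram rather than over cycle-by-cycle period estimates, this computation itself needs no fₒ track, which is the property the abstract emphasizes; in such a scheme the quality of the estimate rests almost entirely on how well the mask separates harmonics from noise.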
Pages: 17–32
Number of pages: 16