Automatic voice onset time estimation from reassignment spectra

Cited by: 22
Authors
Stouten, Veronique [1]
Van Hamme, Hugo [1]
Affiliations
[1] Katholieke Univ Leuven, ESAT Dept, B-3001 Louvain, Belgium
Keywords
Voice Onset Time; Speech attributes; Estimation; Reassignment spectrum; Lattice rescoring; FREQUENCY; PLOSIVES
DOI
10.1016/j.specom.2009.06.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We describe an algorithm to automatically estimate the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset and the start of periodicity when it is followed by a voiced sound. Since the VOT is affected by factors like place of articulation and voicing, it can be used for inference of these factors. The algorithm uses the reassignment spectrum of the speech signal, a high resolution time-frequency representation which simplifies the detection of the acoustic events in a plosive. The performance of our algorithm is evaluated on a subset of the TIMIT database by comparison with manual VOT measurements. On average, the difference is smaller than 10 ms for 76.1% and smaller than 20 ms for 91.4% of the plosive segments. We also provide analysis statistics of the VOT of /b/, /d/, /g/, /p/, /t/ and /k/ and experimentally verify some sources of variability. Finally, to illustrate possible applications, we integrate the automatic VOT estimates as an additional feature in an HMM-based speech recognition system and show a small but statistically significant improvement in phone recognition rate. © 2009 Elsevier B.V. All rights reserved.
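The VOT definition and the tolerance-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it does not compute a reassignment spectrum or detect events, and it assumes the burst onset and voicing onset times are already available. The function names `vot_ms` and `fraction_within` and all sample values are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def vot_ms(burst_onset_s: float, voicing_onset_s: float) -> float:
    """VOT in milliseconds: delay from burst onset to start of periodicity.

    Both inputs are times in seconds; how they are detected is outside
    the scope of this sketch.
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0


def fraction_within(
    auto_ms: Sequence[float], manual_ms: Sequence[float], tol_ms: float
) -> float:
    """Fraction of segments whose |automatic - manual| VOT is below tol_ms."""
    if len(auto_ms) != len(manual_ms):
        raise ValueError("auto_ms and manual_ms must have the same length")
    if not auto_ms:
        raise ValueError("at least one segment is required")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) < tol_ms)
    return hits / len(auto_ms)


# Hypothetical VOT values (ms) for four plosive segments, not data from the paper.
auto = [12.0, 55.0, 70.0, 18.0]
manual = [15.0, 60.0, 95.0, 30.0]

within_10 = fraction_within(auto, manual, 10.0)
within_20 = fraction_within(auto, manual, 20.0)
print(f"within 10 ms: {within_10:.1%}, within 20 ms: {within_20:.1%}")
```

Applied to the evaluation set used in the paper, the same two ratios would correspond to the reported 76.1% (10 ms) and 91.4% (20 ms) figures.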
Pages: 1194-1205
Number of pages: 12