Tracking formants in spectrograms and its application in speaker verification

Cited: 0
Authors
Leu, Jia-Guu [1 ]
Geeng, Liang-tsair [2 ]
Pu, Chang En [2 ]
Shiau, Jyh-Bin [2 ]
Affiliations
[1] Natl Taipei Univ, Dept Comp Sci, 151 Univ Rd, New Taipei City, Taiwan
[2] Minist Justice, Invest Bur, Dept Forens Sci, 151 Univ Rd, New Taipei City, Taiwan
Source
46TH ANNUAL 2012 IEEE INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY | 2012
Keywords
spectrogram; formant; tracking; speaker verification; text sensitive;
DOI: not available
CLC number: TP301 [Theory, Methods]
Discipline code: 081202
Abstract
Formants are the most visible features in spectrograms, and they also carry the most valuable speech information. Traditionally, formant tracks are found by first locating formant points in individual frames and then joining the formant points of neighboring frames into tracks. In this paper we present a formant tracking approach based on image processing techniques. We first find the running directions of the formants in a spectrogram, then smooth the spectrogram along those directions to produce formants that are more continuous and stable. Next, we perform ridge detection to find formant track candidates in the spectrogram. After removing tracks that are too short or too weak, we fit the remaining tracks with 2nd-degree polynomial curves to extract formants that are both smooth and continuous. Besides thin formant tracks, we also extract formant tracks with width; these thick formants indicate not only the locations of the formants but also their widths. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin formants and the thick formants in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin formants and the thick formants achieve speaker verification accuracies of 88.3% and 93.8%, respectively. When the number of sentences for comparison is increased to seven, the accuracy rates improve to 93.8% and 98.7%, respectively.
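The pipeline summarized in the abstract (smoothing along the formant direction, ridge detection, 2nd-degree polynomial fitting) can be sketched roughly as follows. This is a minimal illustration on a synthetic spectrogram, not the authors' implementation: the smoothing here runs along the time axis only (the paper smooths along the estimated running direction of each formant), ridge detection is reduced to a per-frame maximum, and all names and parameter values are illustrative.

```python
import numpy as np

# Synthetic "spectrogram": one quadratic formant ridge plus weak noise,
# standing in for an STFT magnitude image (frames x frequency bins).
n_frames, n_bins = 100, 128
t = np.arange(n_frames)
true_track = 40 + 0.3 * t - 0.002 * t**2          # ridge centre, in bin units
spec = np.random.default_rng(0).normal(0.0, 0.1, (n_frames, n_bins))
bins = np.arange(n_bins)
for i in range(n_frames):
    spec[i] += np.exp(-0.5 * ((bins - true_track[i]) / 2.0) ** 2)

# Step 1: smooth along the time direction to make the ridge more
# continuous and stable (a 5-frame moving average per frequency bin).
kernel = np.ones(5) / 5.0
smoothed = np.apply_along_axis(
    lambda col: np.convolve(col, kernel, mode="same"), 0, spec)

# Step 2: ridge detection -- here simply the strongest frequency bin in
# each frame, giving one track candidate through the whole utterance.
ridge = smoothed.argmax(axis=1).astype(float)

# Step 3: fit a 2nd-degree polynomial to the candidate track to obtain a
# formant estimate that is both smooth and continuous.
coeffs = np.polyfit(t, ridge, deg=2)
fitted = np.polyval(coeffs, t)
```

In this toy setting the fitted curve stays within a couple of frequency bins of the true ridge; a real implementation would also have to propose multiple candidates per frame, discard tracks that are too short or too weak, and handle crossing or merging formants.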
Pages: 83-89 (7 pages)