Improved chirp group delay based algorithms with applications to vocal tract estimation and speech recognition

Cited by: 0
Authors
Jayesh, M. K. [1 ]
Ramalingam, C. S. [1 ]
Affiliations
[1] IIT, Dept Elect Engn, Madras 600036, Tamil Nadu, India
Keywords
Vocal tract estimation; Phase processing; Group delay; ASR; MODEL ESTIMATION; REPRESENTATION; SHAPES; WAVE
DOI
10.1016/j.specom.2016.02.003
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper we propose two algorithms for estimating the vocal tract response from the Fourier transform phase of a given speech segment. In the first approach, we find the zeros of the z-transform, reflect all zeros that lie outside the unit circle to the inside, and then compute the chirp group delay spectrum. This method eliminates many of the drawbacks of Bozkurt's CGDGCI method and models the spectral valleys well. For high-pitched sounds, however, the vocal tract estimate obtained with this method is corrupted by source oscillations. In the second approach, we cast the problem within the framework of Independent Component Analysis and propose a method in which these effects are considerably suppressed. ASR results on the TIMIT database using features derived from the first method are comparable to those obtained using MFCC features. A further improvement in recognition accuracy over the MFCC baseline was obtained with a lattice-combining technique, giving a Phone Error Rate of 17%. In addition, exploiting the method's ability to model spectral valleys well, we propose additional features that distinguish the nasals /m/, /n/, and /ng/, which in turn leads to an increase in their recognition accuracy. © 2016 Elsevier B.V. All rights reserved.
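The first approach in the abstract (find the zeros, reflect the outside-unit-circle zeros inside, compute the chirp group delay) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the analysis-circle radius `rho = 1.12`, the frequency grid, and the synthetic test frame are all assumptions chosen for illustration, and the sketch omits framing, windowing choices, and any post-processing the paper may use. The group delay is taken as the negative derivative of the phase, evaluated on the circle |z| = rho, and is summed over the zeros in closed form.

```python
import numpy as np


def reflect_zeros_inside(frame):
    """Zeros of the frame's z-transform, with outside zeros moved to 1/conj(z)."""
    zeros = np.roots(frame).astype(complex)
    outside = np.abs(zeros) > 1.0
    # Reflection keeps the angle of each zero and inverts its radius.
    zeros[outside] = 1.0 / np.conj(zeros[outside])
    return zeros


def chirp_group_delay(zeros, rho=1.12, n_freq=257):
    """Group delay of prod_k (1 - z_k / z) evaluated on the circle |z| = rho.

    rho > 1 is an assumed analysis radius; with every zero inside the unit
    circle, the denominator below stays strictly positive.
    """
    omega = np.linspace(0.0, np.pi, n_freq)
    alpha = np.abs(zeros)[:, None] / rho           # zero radius relative to circle
    u = omega[None, :] - np.angle(zeros)[:, None]  # angular distance to each zero
    num = alpha**2 - alpha * np.cos(u)
    den = 1.0 + alpha**2 - 2.0 * alpha * np.cos(u)
    # Each zero contributes num/den; the total group delay is the sum.
    return omega, np.sum(num / den, axis=0)


# Hypothetical usage on a synthetic frame (not speech data).
rng = np.random.default_rng(0)
frame = rng.standard_normal(64) * np.hamming(64)
zeros = reflect_zeros_inside(frame)
omega, cgd = chirp_group_delay(zeros)
```

Summing the per-zero terms avoids rebuilding a high-order polynomial from its roots, which tends to be numerically fragile for long frames.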
Pages: 72-89
Page count: 18
Related papers
50 in total
  • [1] AN IMPROVED CHIRP GROUP DELAY BASED ALGORITHM FOR ESTIMATING THE VOCAL TRACT RESPONSE
    Jayesh, M. K.
    Ramalingam, C. S.
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 2295 - 2299
  • [2] Chirp group delay analysis of speech signals
    Bozkurt, Baris
    Couvreur, Laurent
    Dutoit, Thierry
    SPEECH COMMUNICATION, 2007, 49 (03) : 159 - 176
  • [3] Vocal Tract Representation in the Recognition of Cerebral Palsied Speech
    Rudzicz, Frank
    Hirst, Graeme
    van Lieshout, Pascal
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2012, 55 (04): : 1190 - 1207
  • [4] SPEECH BANDWIDTH EXTENSION BASED ON SPEECH PHONETIC CONTENT AND SPEAKER VOCAL TRACT SHAPE ESTIMATION
    Katsir, Itai
    Cohen, Israel
    Malah, David
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 461 - 465
  • [5] Spectral analysis of speech signals using chirp group delay
    Bozkurt, Baris
    Dutoit, Thierry
    Couvreur, Laurent
    PROGRESS IN NONLINEAR SPEECH PROCESSING, 2007, 4391 : 41 - +
  • [6] Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition
    Emori, Tadashi
    Shinoda, Koichi
    Systems and Computers in Japan, 2002, 33 (05): : 30 - 40
  • [7] Improved Emotional Speech Recognition Algorithms
    Rajeswari, A.
    Sowmbika, P.
    Kalaimagal, P.
    Ramya, M.
    Ranjitha, M.
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 2362 - 2366
  • [8] Group Delay based Methods for Detection and Recognition of Whispered Speech
    Vedvyasan, Kishore
    Nathwani, Karan
    Hegde, Rajesh M.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 499 - 505
  • [9] Group Delay Based Methods for Recognition of Distant talking Speech
    Mandala, Rohan
    Shukla, Mrityunjaya
    Hegde, Rajesh
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 1702 - 1706
  • [10] Projection-based group delay scheme for speech recognition
    Tung, SL
    Lei, IS
    Juang, YT
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (02): : 138 - 140