Group delay functions and its applications in speech technology

Cited by: 0

Authors:

HEMA A MURTHY

B YEGNANARAYANA

Affiliations:

[1] Indian Institute of Technology Madras, Department of Computer Science and Engineering

[2] International Institute of Information Technology

Source:

Sadhana | 2011 / Vol. 36

Keywords:

Fourier transform phase; group delay functions; feature extraction from phase; feature switching; mutual information; K-L divergence

DOI:

Not available

Abstract:

Traditionally, the information in speech signals is represented in terms of features derived from short-time Fourier analysis. In this analysis, only features extracted from the magnitude of the Fourier transform (FT) are considered, and the phase component is ignored. Although the significance of the FT phase has been highlighted in several studies over the past three decades, features of the FT phase have not been fully exploited, owing to the difficulty of computing the phase and of processing the phase function. The information in the short-time FT phase function can be extracted by processing the negative derivative of the FT phase with respect to frequency, i.e., the group delay function. In this paper, the properties of group delay functions are reviewed, highlighting the importance of the FT phase for representing information in the speech signal. Methods of processing the group delay function are discussed that capture the characteristics of the vocal-tract system, either in the form of formants or through a modified group delay function. Applications of group delay functions in speech processing are discussed in some detail. They include segmentation of speech at syllable boundaries, which exploits the additive and high-resolution properties of group delay functions. The effectiveness of this segmentation, and of features derived from the modified group delay function, is demonstrated in applications such as language identification, speech recognition and speaker recognition. The paper thus demonstrates the need to exploit the potential of group delay functions in the development of speech systems.
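As an illustration of the group delay function described in the abstract (a minimal sketch, not code from the paper), the block below computes τ(ω) = -dθ(ω)/dω on a DFT grid without unwrapping the phase, using the standard identity τ = (X_R·Y_R + X_I·Y_I)/|X|², where Y is the DFT of n·x[n]. It also checks the additive property: the group delay of a cascade (convolution) of two sequences equals the sum of their individual group delays. The helper names `dft`, `group_delay` and `convolve`, the 64-point default DFT size, and the example sequences are hypothetical choices for illustration; the modified group delay function discussed in the paper is not implemented here.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive DFT of x, zero-padded to n_fft points: X[k] = sum_n x[n] e^{-j 2 pi k n / n_fft}."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft)
    ]


def group_delay(x, n_fft=64):
    """Group delay tau(w_k) = -d theta(w)/dw, in samples, at w_k = 2*pi*k/n_fft.

    Uses tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n],
    so no phase unwrapping is needed. Assumes X has no zeros on the DFT grid.
    """
    X = dft(x, n_fft)
    Y = dft([n * v for n, v in enumerate(x)], n_fft)
    return [(xk.real * yk.real + xk.imag * yk.imag) / abs(xk) ** 2 for xk, yk in zip(X, Y)]


def convolve(a, b):
    """Linear convolution of two finite sequences (a cascade of two systems)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


if __name__ == "__main__":
    # A pure delay of 3 samples has a constant group delay of 3 samples.
    print([round(t, 6) for t in group_delay([0.0, 0.0, 0.0, 1.0], n_fft=8)])

    # Additive property: the group delay of a cascade is the sum of the parts.
    h1, h2 = [1.0, 0.5], [1.0, -0.3]
    total = group_delay(convolve(h1, h2))
    parts = [a + b for a, b in zip(group_delay(h1), group_delay(h2))]
    print(max(abs(t - p) for t, p in zip(total, parts)))
```

Computing the phase derivative through Y sidesteps explicit phase unwrapping, one of the practical difficulties the abstract mentions. The naive O(N²) DFT keeps the sketch dependency-free; a practical implementation would use an FFT and, for speech, short windowed frames.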

Pages: 745-782

Page count: 37

References: 44 entries