Epoch Extraction From Speech Signals

Cited by: 455
Authors
Murty, K. Sri Rama [1]
Yegnanarayana, B. [2]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Madras 600036, Tamil Nadu, India
[2] Int Inst Informat Technol, Hyderabad 500032, Andhra Pradesh, India
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008 / Vol. 16 / Issue 08
Keywords
Epoch extraction; glottal closure instant; group-delay; Hilbert envelope; instantaneous frequency
DOI
10.1109/TASL.2008.2004526
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403
Abstract
An epoch is the instant of significant excitation of the vocal-tract system during production of speech. For most voiced speech, the most significant excitation takes place around the instant of glottal closure. Extraction of epochs from speech is a challenging task due to the time-varying characteristics of the source and the system. Most epoch extraction methods attempt to remove the characteristics of the vocal-tract system in order to emphasize the excitation characteristics in the residual. The performance of such methods depends critically on our ability to model the system. In this paper, we propose a method for epoch extraction which does not depend critically on the characteristics of the time-varying vocal-tract system. The method exploits the nature of impulse-like excitation. The proposed zero resonance frequency filter output brings out the epoch locations with high accuracy and reliability. The performance of the method is demonstrated on the CMU-Arctic database, using the epoch information from the electro-glottograph as reference. The proposed method performs significantly better than the other methods currently available for epoch extraction. The interesting part of the results is that epoch extraction by the proposed method seems to be robust against degradations like white noise, babble, high-frequency channel, and vehicle noise.
Pages: 1602-1613
Number of pages: 12
Related papers
33 in total
[1]   EPOCH EXTRACTION OF VOICED SPEECH [J].
ANANTHAPADMANABHA, TV ;
YEGNANARAYANA, B .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, 23 (06) :562-570
[2]   EPOCH EXTRACTION FROM LINEAR PREDICTION RESIDUAL FOR IDENTIFICATION OF CLOSED GLOTTIS INTERVAL [J].
ANANTHAPADMANABHA, TV ;
YEGNANARAYANA, B .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (04) :309-319
[3]  
[Anonymous], NOISEX 92
[4]  
[Anonymous], 2004, DISCRETE TIME SPEECH
[5]  
[Anonymous], CMU ARCTIC SPEECH SY
[6]  
[Anonymous], 5 ISCA WORKSH SPEECH
[7]   SPEECH ANALYSIS AND SYNTHESIS BY LINEAR PREDICTION OF SPEECH WAVE [J].
ATAL, BS ;
HANAUER, SL .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1971, 50 (02) :637-+
[8]   ESTIMATING AND INTERPRETING THE INSTANTANEOUS FREQUENCY OF A SIGNAL .1. FUNDAMENTALS [J].
BOASHASH, B .
PROCEEDINGS OF THE IEEE, 1992, 80 (04) :520-538
[9]   A quantitative assessment of group delay methods for identifying glottal closures in voiced speech [J].
Brookes, M ;
Naylor, PA ;
Gudnason, J .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (02) :456-466
[10]  
Brookes M., 2006, VOICEBOX SPEECH PROC