Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Authors
Chazan, D [1]
Hoory, R [1]
Cohen, G [1]
Zibulski, M [1]
Affiliation
[1] IBM Corp, Res Lab, MATAM, IL-31905 Haifa, Israel
Source
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Proceedings, Vols. I-VI | 2000
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a novel, low-complexity, frequency-domain algorithm for reconstructing speech from the mel-frequency cepstral coefficients (MFCCs) commonly used by speech recognition systems, together with the pitch frequency values. The reconstruction technique is based on the sinusoidal speech representation. A set of sine-wave frequencies is derived from the pitch frequency and the voicing decisions, and a synthetic phase is then assigned to each sine wave. The sine-wave amplitudes are generated by sampling a linear combination of frequency-domain basis functions. The basis-function gains are chosen so that the mel-frequency binned spectrum of the reconstructed speech closely matches the mel-frequency binned spectrum obtained from the original MFCC vector by IDCT and antilog operations. This procedure yields natural-sounding, good-quality, intelligible speech.
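The per-frame pipeline described in the abstract can be illustrated with the NumPy sketch below. It is not the authors' implementation, and several details are assumptions not taken from the paper: an HTK-style mel scale with triangular filters, natural-log MFCCs computed with an orthonormal DCT-II, the mel triangles themselves reused as the frequency-domain basis functions, squared sine-wave amplitudes modeled as the linear combination (so the gain fit is an ordinary least-squares problem), zero phases for voiced frames, and random phases on a hypothetical 100 Hz grid for unvoiced frames. Window-dependent scaling, frame-to-frame phase continuity and overlap-add are omitted. All function names are hypothetical.

```python
import numpy as np


def mel(f):
    """HTK-style mel scale (assumed; the paper's exact warping may differ)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_inv(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filters(freqs, n_bins, fs):
    """Triangular mel filter weights at arbitrary frequencies: shape (n_bins, len(freqs))."""
    edges = mel_inv(np.linspace(0.0, mel(fs / 2.0), n_bins + 2))
    lo, ctr, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    f = np.asarray(freqs, dtype=float)[None, :]
    rise = (f - lo) / (ctr - lo)
    fall = (hi - f) / (hi - ctr)
    return np.clip(np.minimum(rise, fall), 0.0, None)


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform (IDCT)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c


def mfcc_to_binned(mfcc, n_bins):
    """Target mel-binned spectrum: IDCT (missing high-order terms = 0), then antilog."""
    c = dct_matrix(n_bins)[: len(mfcc)]
    return np.exp(c.T @ np.asarray(mfcc, dtype=float))  # natural-log MFCCs assumed


def sine_frequencies(f0, voiced, fs, unvoiced_spacing=100.0):
    """Harmonics of f0 when voiced; a fixed (hypothetical) grid when unvoiced."""
    step = f0 if voiced else unvoiced_spacing
    return np.arange(step, fs / 2.0, step)


def synthetic_phases(n, voiced, rng):
    """Assumed phase model: zero phase if voiced, uniform random phase if unvoiced."""
    return np.zeros(n) if voiced else rng.uniform(-np.pi, np.pi, n)


def fit_amplitudes(freqs, target, fs):
    """Choose basis gains so the sinusoids' mel-binned energies match `target`.

    Assumptions: the basis functions are the mel triangles themselves, the
    squared amplitudes are the linear combination, and a sinusoid j adds
    w[i, j] * A_j**2 to bin i (window scaling ignored)."""
    w = mel_filters(freqs, len(target), fs)  # (n_bins, n_sines)
    design = w @ w.T                         # maps gains to predicted bin energies
    gains, *_ = np.linalg.lstsq(design, target, rcond=None)
    power = np.clip(gains @ w, 0.0, None)    # sampled basis combination, kept >= 0
    return np.sqrt(power)


def synthesize_frame(freqs, amps, phases, fs, n):
    """Sum of sinusoids over n samples at sampling rate fs."""
    t = np.arange(n) / fs
    waves = amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])
    return waves.sum(axis=0)


def reconstruct_frame(mfcc, f0, voiced, fs=8000, n_bins=20, n=200, seed=0):
    """One frame only: no overlap-add or phase continuity across frames."""
    target = mfcc_to_binned(mfcc, n_bins)
    freqs = sine_frequencies(f0, voiced, fs)
    amps = fit_amplitudes(freqs, target, fs)
    phases = synthetic_phases(len(freqs), voiced, np.random.default_rng(seed))
    return synthesize_frame(freqs, amps, phases, fs, n)
```

For example, `reconstruct_frame(np.zeros(13), 120.0, True)` returns 200 samples (25 ms at 8 kHz) for a voiced frame whose target binned spectrum is flat. Modeling the squared amplitudes, rather than the amplitudes, as the basis combination keeps the gain fit linear; the paper's actual basis functions and fitting criterion may differ.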
Pages: 1299-1302
Number of pages: 4