Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Times cited: 0

Authors:

Chazan, D [1]

Hoory, R [1]

Cohen, G [1]

Zibulski, M [1]

Affiliation:

[1] IBM Corp, Res Lab, MATAM, IL-31905 Haifa, Israel

Source:

2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI | 2000

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics]

Discipline codes:

070206; 082403

Abstract:

This paper presents a novel, low-complexity, frequency-domain algorithm for reconstructing speech from the mel-frequency cepstral coefficients (MFCC) commonly used by speech recognition systems, together with pitch frequency values. The reconstruction technique is based on the sinusoidal speech representation. A set of sine-wave frequencies is derived from the pitch frequency and voicing decisions, and a synthetic phase is then assigned to each sine wave. The sine-wave amplitudes are generated by sampling a linear combination of frequency-domain basis functions. The basis-function gains are chosen so that the mel-frequency binned spectrum of the reconstructed speech is similar to the mel-frequency binned spectrum obtained from the original MFCC vector by IDCT and antilog operations. This procedure yields natural-sounding, good-quality, intelligible speech.
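The steps in the abstract can be illustrated with a minimal single-frame sketch in Python/NumPy. It is not the authors' implementation, and every detail below is an assumption rather than something taken from the paper: the triangular mel filters are evaluated directly at the sine-wave frequencies, the cepstrum uses an orthonormal DCT-II with natural-log compression, the basis functions are taken to be the mel triangles themselves, the linear combination is assumed to model the *squared* amplitudes (which keeps the gain fit linear, up to a constant scale factor), voiced frames get zero phases and unvoiced frames get random phases on a fixed 100 Hz grid. In addition, clipping an ordinary least-squares solution stands in for a proper nonnegative solver. All function names (`mel_filters_at`, `dct_matrix`, `mfcc_to_mel_spectrum`, `reconstruct_frame`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filters_at(freqs, n_filters, fs):
    """Evaluate triangular mel filters at arbitrary frequencies (Hz).
    Returns an (n_filters, len(freqs)) weight matrix (assumed filter shape)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    freqs = np.asarray(freqs, dtype=float)
    W = np.zeros((n_filters, freqs.size))
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        W[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return W

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform (IDCT)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def mfcc_to_mel_spectrum(mfcc, n_filters):
    """Zero-pad the (possibly truncated) MFCC vector, apply the IDCT, then antilog."""
    c = np.zeros(n_filters)
    c[: len(mfcc)] = mfcc
    log_mel = dct_matrix(n_filters).T @ c
    return np.exp(log_mel)  # assumes natural-log compression at analysis time

def reconstruct_frame(mfcc, f0, voiced, fs=8000, n_filters=24, n=256, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    target = mfcc_to_mel_spectrum(mfcc, n_filters)  # mel-binned spectrum from MFCC

    # Sine-wave frequencies: pitch harmonics if voiced, a fixed 100 Hz grid otherwise (assumption).
    step = f0 if voiced else 100.0
    freqs = np.arange(step, fs / 2, step)

    # Synthetic phases: zero for voiced frames, random for unvoiced frames (assumption).
    if voiced:
        phases = np.zeros(freqs.size)
    else:
        phases = rng.uniform(-np.pi, np.pi, freqs.size)

    W = mel_filters_at(freqs, n_filters, fs)  # mel binning of each sine wave
    B = W                                     # basis functions sampled at freqs (assumption)

    # Squared amplitudes = B.T @ g, so the binned power is W @ B.T @ g; fit g to the target.
    M = W @ B.T
    g, *_ = np.linalg.lstsq(M, target, rcond=None)
    g = np.clip(g, 0.0, None)  # crude stand-in for nonnegative least squares

    amps = np.sqrt(B.T @ g)  # nonnegative because both B and g are nonnegative
    t = np.arange(n) / fs
    frame = (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    return frame, freqs, amps, target, W
```

For example, `reconstruct_frame(c, 120.0, True)` would synthesize one 256-sample frame at 8 kHz from a 13-coefficient MFCC vector `c`. A complete synthesizer would also overlap-add consecutive frames and keep phases continuous across them, which this sketch omits.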

Pages: 1299-1302

Number of pages: 4
