TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Cited by: 242

Authors:

Kawahara, H. [1]

Morise, M. [1]

Takahashi, T. [1]

Nisimura, R. [1]

Irino, T. [1]

Banno, H. [2]

Affiliations:

[1] Wakayama Univ, Fac Syst Engn, 930 Sakaedani, Wakayama 6408510, Japan

[2] Meijo Univ, Nagoya, Aichi 4688502, Japan

Source:

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008

Keywords:

periodic signal; power spectrum; consistent sampling; periodicity; speech processing

DOI:

10.1109/ICASSP.2008.4518514

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), and aperiodicity. F0-adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from the temporally stable power spectrum and the interference-free spectrum, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates the ad hoc parameter tuning and the heavy computational demand from which STRAIGHT has suffered in the past.
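The abstract does not spell out how the temporally stable spectrum is built. The following is a minimal sketch, assuming the usual description of TANDEM: two power spectra are computed from frames half a fundamental period apart and then averaged, which cancels the interference between adjacent harmonics. The function names, the test signal, and the two-period periodic Hann window are illustrative assumptions, not the authors' implementation, and F0 is taken as known.

```python
import numpy as np

def power_spectrum(x, start, window):
    """Power spectrum of one windowed frame beginning at sample `start`."""
    frame = x[start:start + len(window)] * window
    return np.abs(np.fft.rfft(frame)) ** 2

def tandem_spectrum(x, start, window, half_period):
    """Average of two power spectra whose frames are half a period apart."""
    return 0.5 * (power_spectrum(x, start, window)
                  + power_spectrum(x, start + half_period, window))

def fluctuation(spectra, bins):
    """Largest peak-to-peak change over time, relative to the time average."""
    s = spectra[:, bins]
    return np.max((s.max(axis=0) - s.min(axis=0)) / s.mean(axis=0))

# Hypothetical test signal: 20 harmonics of a 100 Hz fundamental.
fs, f0 = 16000, 100.0
period = int(round(fs / f0))          # samples per fundamental period
n = np.arange(4 * period)
harmonics = range(1, 21)
x = sum(np.cos(2 * np.pi * k * f0 * n / fs + 0.3 * k) / k for k in harmonics)

# Periodic Hann window spanning two periods (an assumption of this sketch).
length = 2 * period
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(length) / length)

# Slide the frame start across one full period and collect both spectra.
starts = range(0, period, 8)
single = np.array([power_spectrum(x, s, window) for s in starts])
tandem = np.array([tandem_spectrum(x, s, window, period // 2) for s in starts])

# Compare temporal stability over the bins that contain harmonic energy.
bins = slice(1, 2 * len(harmonics) + 2)
print("single-frame fluctuation:", fluctuation(single, bins))
print("tandem fluctuation:      ", fluctuation(tandem, bins))
```

With this window each FFT bin sees at most two neighbouring harmonics, so the single-frame spectrum depends strongly on where the frame starts, while the averaged spectrum stays constant up to rounding error. A different window would leave a small residual from non-adjacent harmonics.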


Pages: 3933 / +

Page count: 2

References (11 in total):

[1] ABE T, 1997, P ASVA 97, P423

[2] Assmann, PF; Katz, WF. Synthesis fidelity and time-varying spectral change in vowels [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 117 (02): 886-895

[3] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. SPEECH COMMUNICATION, 1999, 27 (3-4): 187-207

[4] KAWAHARA H, 1999, P EUROSPEECH 99, V6, P2781

[5] Kawahara, Hideki. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds [J]. ACOUSTICAL SCIENCE AND TECHNOLOGY, 2006, 27 (06): 349-353

[6] Liu, C; Kewley-Port, D. Vowel formant discrimination for high-fidelity speech [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2004, 116 (02): 1224-1233

[7] Morise M., 2007, T IEICE D, V90, P3265

[8] Saitou, T; Unoki, M; Akagi, M. Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis [J]. SPEECH COMMUNICATION, 2005, 46 (3-4): 405-417

[9] UNSER, M; ALDROUBI, A. A GENERAL SAMPLING THEORY FOR NONIDEAL ACQUISITION DEVICES [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1994, 42 (11): 2915-2925

[10] Unser, M. Sampling - 50 years after Shannon [J]. PROCEEDINGS OF THE IEEE, 2000, 88 (04): 569-587
