TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Cited by: 242
Authors
Kawahara, H. [1]
Morise, M. [1]
Takahashi, T. [1]
Nisimura, R. [1]
Irino, T. [1]
Banno, H. [2]
Affiliations
[1] Wakayama Univ, Fac Syst Engn, 930 Sakaedani, Wakayama 6408510, Japan
[2] Meijo Univ, Nagoya, Aichi 4688502, Japan
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
periodic signal; power spectrum; consistent sampling; periodicity; speech processing
DOI
10.1109/ICASSP.2008.4518514
Chinese Library Classification (CLC) code
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from temporally stable power and interference-free spectra, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates ad-hoc parameter tuning and the heavy demand on computational power, from which STRAIGHT has suffered in the past.
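The abstract does not spell out how the temporally stable power spectrum is built. As a minimal sketch, assuming the TANDEM construction of averaging two windowed power spectra taken half a fundamental period apart, the following Python snippet compares the temporal variation of a single power spectrum with that of the averaged one. The test signal, the sampling rate, the known F0, the Hann window of two periods, and all function names are illustrative assumptions, not details taken from this record; the F0-adaptive smoothing and liftering stages are not shown.

```python
import numpy as np

# All values below are illustrative assumptions, not taken from the paper.
fs = 8000                    # sampling rate in Hz
f0 = 100.0                   # fundamental frequency in Hz (assumed known)
T0 = int(round(fs / f0))     # fundamental period in samples
half = T0 // 2               # half a period in samples
n_harm = 30                  # number of harmonics in the test signal
nfft = 1024

# Synthetic periodic signal: equal-amplitude harmonics with random phases.
rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, n_harm)
n = np.arange(4 * T0 + half)
k = np.arange(1, n_harm + 1)
x = np.cos(2.0 * np.pi * np.outer(n, k) * f0 / fs + phases).sum(axis=1)

# Hann window spanning two periods (an assumed choice for this sketch).
win = np.hanning(2 * T0)

def power_spectrum(start):
    """Windowed power spectrum of the frame beginning at sample `start`."""
    frame = x[start:start + len(win)] * win
    return np.abs(np.fft.rfft(frame, nfft)) ** 2

def tandem_spectrum(start):
    """Average of two power spectra taken half a period apart."""
    return 0.5 * (power_spectrum(start) + power_spectrum(start + half))

# Slide the analysis position across one full period.
starts = np.arange(T0)
single = np.array([power_spectrum(s) for s in starts])
tandem = np.array([tandem_spectrum(s) for s in starts])

# Restrict the comparison to the band occupied by the harmonics.
freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
band = (freqs >= f0) & (freqs <= n_harm * f0)

def temporal_variation(spec):
    """Summed temporal standard deviation relative to summed mean power."""
    return spec[:, band].std(axis=0).sum() / spec[:, band].mean(axis=0).sum()

var_single = temporal_variation(single)
var_tandem = temporal_variation(tandem)
print(f"single: {var_single:.4f}  tandem: {var_tandem:.4f}")
```

In this setup the interference between adjacent harmonics changes sign after half a period, so it largely cancels in the average and the second printed figure should come out well below the first.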
Pages: 3933 / +
Page count: 2
Related papers
11 items in total
[1]  
ABE T, 1997, P ASVA 97, P423
[2]   Synthesis fidelity and time-varying spectral change in vowels [J].
Assmann, PF ;
Katz, WF .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 117 (02) :886-895
[3]   Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J].
Kawahara, H ;
Masuda-Katsuse, I ;
de Cheveigné, A .
SPEECH COMMUNICATION, 1999, 27 (3-4) :187-207
[4]  
KAWAHARA H, 1999, P EUROSPEECH 99, V6, P2781
[5]   STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds [J].
Kawahara, Hideki .
ACOUSTICAL SCIENCE AND TECHNOLOGY, 2006, 27 (06) :349-353
[6]   Vowel formant discrimination for high-fidelity speech [J].
Liu, C ;
Kewley-Port, D .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2004, 116 (02) :1224-1233
[7]  
Morise M., 2007, T IEICE D, V90, P3265
[8]   Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis [J].
Saitou, T ;
Unoki, M ;
Akagi, M .
SPEECH COMMUNICATION, 2005, 46 (3-4) :405-417
[9]   A GENERAL SAMPLING THEORY FOR NONIDEAL ACQUISITION DEVICES [J].
UNSER, M ;
ALDROUBI, A .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1994, 42 (11) :2915-2925
[10]   Sampling - 50 years after Shannon [J].
Unser, M .
PROCEEDINGS OF THE IEEE, 2000, 88 (04) :569-587