Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework

Cited by: 94
Authors
Kawahara, Hideki [1]
Morise, Masanori [2]
Affiliations
[1] Wakayama Univ, Fac Syst Engn, Wakayama 6408510, Japan
[2] Ritsumeikan Univ, Coll Informat Sci & Engn, Kusatsu 5258577, Japan
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2011 / Vol. 36 / Issue 05
Funding
Japan Science and Technology Agency; Japan Society for the Promotion of Science
Keywords
Speech analysis; fundamental frequency; speech synthesis; consistent sampling; periodic signals; WINDOWS
DOI
10.1007/s12046-011-0043-3
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
This article presents comprehensive technical information about STRAIGHT and TANDEM-STRAIGHT, a widely used speech modification tool and its successor. They share the same concept: the periodic excitation found in voiced sounds is an efficient mechanism for transmitting the underlying smooth time-frequency representation. The tools are also based on the perceptual equivalence of two sets of independent Gaussian random signals. This equivalence makes it possible to discard input phase information intentionally and enables flexible manipulation of parameters.
Pages: 713-727
Page count: 15