Directional dependency of cepstrum on vocal tract length

Cited by: 2
Authors
Saito, Daisuke [1]
Matsuura, Ryo [1]
Asakawa, Satoshi [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [2]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Tokyo 1138654, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
frequency warping; cepstrum; rotation; matrix; vocal tract length;
DOI
10.1109/ICASSP.2008.4518652
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as a rotation in the n-dimensional cepstrum space. In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age and gender differences. In VTLN, a frequency warping is often carried out, and it can be implemented as a linear transformation in cepstrum space: $\hat{c} = Ac$. However, the geometric properties of this transformation matrix A have not been well discussed. In this study, its properties are made clear using n-dimensional geometry, and it is shown that the matrix rotates any cepstrum vector in a similar and evident manner. Experimental results using resynthesized speech demonstrate that cepstrum vectors extracted from a speaker 180 cm in height and those from another speaker 120 cm in height are reasonably orthogonal. This result makes clear one of the reasons why children's speech is very difficult for conventional speech recognizers to deal with adequately.
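The linear map from the abstract can be explored numerically. The sketch below is only an illustration under stated assumptions, not the authors' method: it assumes the frequency warping is a first-order all-pass (bilinear) warp with a hypothetical parameter `alpha`, builds an approximate matrix A for the coefficients c_1..c_n (c_0 is ignored) by numerical projection onto cosines, and measures the angle between a cepstrum vector c and Ac. The function names, the grid size, and the sample vector are all assumptions made for this example; the abstract gives no mapping from `alpha` to body height.

```python
import numpy as np

def warp(omega, alpha):
    """Assumed warping: first-order all-pass (bilinear) map of omega in [0, pi]."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def warping_matrix(n_dim, alpha, n_grid=4096):
    """Approximate A for c_1..c_n.

    A[m-1, n-1] = (2/pi) * integral over [0, pi] of cos(n*warp(w)) * cos(m*w) dw,
    evaluated with a midpoint rule. c_0 is left out of the model.
    """
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid      # midpoint grid
    orders = np.arange(1, n_dim + 1)
    basis = np.cos(np.outer(orders, omega))                 # cos(m * w)
    warped = np.cos(np.outer(orders, warp(omega, alpha)))   # cos(n * warp(w))
    return (2.0 / n_grid) * basis @ warped.T

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical cepstrum vector with decaying magnitude (not real speech data).
    c = rng.standard_normal(12) / np.arange(1, 13)
    for alpha in (0.0, 0.1, 0.2, 0.3, 0.4):
        A = warping_matrix(12, alpha)
        print(f"alpha={alpha:.1f}  angle(c, Ac) = {angle_deg(c, A @ c):.2f} deg")
```

With `alpha = 0` the warp is the identity and A reduces to the identity matrix, so the angle is zero; nonzero values of `alpha` change the direction of c, which is the kind of dependency the abstract describes. Because A is truncated to n dimensions here, it is only an approximation of the full transformation.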
Pages: 4485 / +
Page count: 2