Directional dependency of cepstrum on vocal tract length

Cited by: 2

Authors:

Saito, Daisuke ^{[1]}

Matsuura, Ryo ^{[1]}

Asakawa, Satoshi ^{[1]}

Minematsu, Nobuaki ^{[1]}

Hirose, Keikichi ^{[2]}

Affiliations:

[1] Univ Tokyo, Grad Sch Frontier Sci, Tokyo 1138654, Japan

[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan

Source:

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008

Keywords:

frequency warping; cepstrum; rotation; matrix; vocal tract length;

DOI:

10.1109/ICASSP.2008.4518652

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as rotation in the n-dimensional cepstrum space. In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age- and gender-differences. In VTLN, a frequency warping is often carried out and it can be implemented as a linear transformation in a cepstrum space: $\hat{c} = Ac$. However, the geometric properties of this transformation matrix A have not been well discussed. In this study, its properties are made clear using n-dimensional geometry and it is shown that the matrix rotates any cepstrum vector similarly and apparently. Experimental results using resynthesized speech demonstrate that cepstrum vectors extracted from a speaker of 180 [cm] in height and those from another speaker of 120 [cm] in height are reasonably orthogonal. This result makes clear one of the reasons why children's speech is very difficult for conventional speech recognizers to deal with adequately.
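To make the idea of a warping matrix acting on cepstrum vectors more concrete, here is a minimal numerical sketch. It is not the paper's derivation. It assumes the frequency warping is a first-order all-pass (bilinear) transform with a hypothetical warping parameter `alpha`, ignores the zeroth cepstral coefficient, truncates the cepstrum to `dim` coefficients, and approximates each matrix entry by a midpoint-rule integral. The function names `warp_matrix` and `angle_deg` are illustrative only.

```python
import numpy as np


def warp_matrix(alpha, dim, n_grid=4096):
    """Approximate the matrix A that maps cepstrum c (orders 1..dim) to the
    cepstrum of the frequency-warped log spectrum.

    Assumption: the warp is the first-order all-pass map
        w(omega) = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))),
    and A[m-1, n-1] = (2 / pi) * integral_0^pi cos(n * w(omega)) * cos(m * omega) d omega.
    """
    # Midpoint grid on (0, pi).
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    warped = omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                      1.0 - alpha * np.cos(omega))
    orders = np.arange(1, dim + 1)
    basis_out = np.cos(np.outer(orders, omega))   # rows: cos(m * omega)
    basis_in = np.cos(np.outer(orders, warped))   # rows: cos(n * w(omega))
    # Midpoint rule: (2 / pi) * (pi / n_grid) * sum over grid points.
    return (2.0 / n_grid) * basis_out @ basis_in.T


def angle_deg(u, v):
    """Angle in degrees between two vectors."""
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))


# Usage: see how far a cepstrum vector turns as the warp gets stronger.
rng = np.random.default_rng(0)       # arbitrary test vector, not real speech data
c = rng.standard_normal(12)
for alpha in (0.0, 0.1, 0.2, 0.3):
    A = warp_matrix(alpha, dim=12)
    print(f"alpha={alpha:.1f}  angle(c, Ac) = {angle_deg(c, A @ c):.2f} deg")
```

With `alpha = 0` the warp is the identity and the matrix reduces to the identity matrix, so the angle is zero. For nonzero `alpha` the direction of `A @ c` moves away from `c`, which is the kind of direction change the abstract attributes to differences in vocal tract length. The mapping between `alpha` and speaker height is not modeled here.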

Citation:

Pages: 4485 / +

Page count: 2
