AUTOMATIC PHONETIC SEGMENTATION IN MANDARIN CHINESE: BOUNDARY MODELS, GLOTTAL FEATURES AND TONE

Cited by: 0

Authors:

Yuan, Jiahong [1]

Ryant, Neville [1]

Liberman, Mark [1]

Affiliation:

[1] Univ Penn, Philadelphia, PA 19104 USA

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Forced alignment; boundary model; glottal features; tone; Mandarin Chinese; phonation

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We conducted experiments on forced alignment in Mandarin Chinese. A corpus of 7,849 utterances was created for the study. Systems differing in their use of explicit phone boundary models, glottal features, and tone information were trained and evaluated on the corpus. Results showed that employing special one-state phone boundary HMMs significantly improved forced alignment accuracy, even when no manual phonetic segmentation was available for training. Spectral features extracted from glottal waveforms (obtained by glottal inverse filtering of the speech waveforms) also improved forced alignment accuracy. Tone-dependent models only slightly outperformed tone-independent models. Without boundary correction, the best system achieved 93.1% agreement with manual segmentation, measured as the proportion of phone boundaries within 20 ms.
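The agreement figure quoted in the abstract is a tolerance-based measure: the share of phone boundaries that the aligner places within 20 ms of the manual boundary. A minimal sketch of that calculation follows. The function name `boundary_agreement` and the boundary times are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' actual evaluation script may differ.

```python
def boundary_agreement(predicted_ms, reference_ms, tolerance_ms=20):
    """Return the fraction of boundaries placed within tolerance_ms of the reference.

    Assumes forced alignment, so both lists describe the same phone sequence
    and contain one boundary per phone transition, in the same order.
    Times are integers in milliseconds.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("expected one predicted boundary per reference boundary")
    if not reference_ms:
        raise ValueError("no boundaries to compare")
    # Count boundaries whose absolute offset does not exceed the tolerance.
    hits = sum(
        1 for p, r in zip(predicted_ms, reference_ms) if abs(p - r) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (illustrative, not data from the paper).
manual = [100, 250, 400, 610]
forced = [105, 275, 395, 600]

agreement = boundary_agreement(forced, manual)
print(f"{agreement:.1%} of boundaries within 20 ms")
```

In this toy example the second boundary is off by 25 ms, so three of the four boundaries count as agreeing.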

Pages: 5
