PHONETIC SEGMENTATION USING STATISTICAL CORRECTION AND MULTI-RESOLUTION FUSION

Cited by: 0

Authors:

Zhao, Sixuan [1]

Soon, Ing Yann [1]

Koh, Soo Ngee [1]

Luke, Kang Kwong [2]

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

[2] Nanyang Technol Univ, Sch Humanities & Social Sci, Singapore, Singapore

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

phonetic segmentation; statistical correction; state selection; multi-resolution; SELECTION;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper focuses on generating accurate phonetic segmentations. Statistical methods based on absolute and relative correction are discussed and evaluated on both monophone and biphone models to improve the segmentation results. The influence of the search range on the statistical correction process is studied, and a state selection technique is used to enhance the correction results. The paper also explores the influence of the resolution (step size) of HMMs and proposes a multi-resolution fusion process to further refine the statistically corrected results. Applying the proposed refinement methods improves segmentation accuracy, mean absolute error (MAE), and root mean square error (RMSE).
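The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `boundary_metrics`, the sample boundary times, and the 20 ms tolerance used for segmentation accuracy are all assumptions chosen for illustration, and the paper may define accuracy differently.

```python
import math


def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Compare predicted phone boundaries with reference boundaries.

    Both inputs are equal-length lists of boundary times in milliseconds.
    tolerance_ms is an assumed threshold (not stated in the abstract):
    a boundary counts as correct if its absolute error is within it.
    Returns (accuracy, mae, rmse).
    """
    if len(predicted_ms) != len(reference_ms) or not reference_ms:
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error of each predicted boundary against its reference.
    errors = [p - r for p, r in zip(predicted_ms, reference_ms)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean square error
    accuracy = sum(1 for e in errors if abs(e) <= tolerance_ms) / n
    return accuracy, mae, rmse


# Hypothetical boundary times in milliseconds, invented for this example.
predicted = [100.0, 250.0, 400.0, 610.0]
reference = [110.0, 245.0, 430.0, 600.0]
accuracy, mae, rmse = boundary_metrics(predicted, reference)
print(f"accuracy={accuracy:.2f}  MAE={mae:.2f} ms  RMSE={rmse:.2f} ms")
```

RMSE penalises large boundary errors more heavily than MAE, so reporting both alongside a tolerance-based accuracy gives a fuller picture of segmentation quality.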

Pages: 6694-6698

Page count: 5

References (12 in total):

[1] Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations
Almpanidis, George
Kotti, Margarita
Kotropoulos, Constantine
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02): 287-298
[2] LIBSVM: A Library for Support Vector Machines
Chang, Chih-Chung
Lin, Chih-Jen
[J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[3] Foreign accent conversion in computer assisted pronunciation training
Felps, Daniel
Bortfeld, Heather
Gutierrez-Osuna, Ricardo
[J]. SPEECH COMMUNICATION, 2009, 51 (10): 920-932
[4] Hunt AJ, 1996, INT CONF ACOUST SPEE, P373, DOI 10.1109/ICASSP.1996.541110
[5] A large margin algorithm for speech-to-phoneme and music-to-score alignment
Keshet, Joseph
Shalev-Shwartz, Shai
Singer, Yoram
Chazan, Dan
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08): 2373-2382
[6] Khanagha V, 2011, INT CONF ACOUST SPEE, P4484
[7] SPEAKER-INDEPENDENT PHONE RECOGNITION USING HIDDEN MARKOV-MODELS
LEE, KF
HON, HW
[J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (11): 1641-1648
[8] Speech segmentation using regression fusion of boundary predictions
Mporas, Iosif
Ganchev, Todor
Fakotakis, Nikos
[J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): 273-288
[9] On using multiple models for automatic speech segmentation
Park, Seung Seop
Kim, Nam Soo
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08): 2202-2212
[10] Automatic speech segmentation based on boundary-type candidate selection
Park, Seung Seop
Kim, Nam Soo
[J]. IEEE SIGNAL PROCESSING LETTERS, 2006, 13 (10): 640-643
