PHONETIC SEGMENTATION USING STATISTICAL CORRECTION AND MULTI-RESOLUTION FUSION

Cited by: 0
Authors
Zhao, Sixuan [1 ]
Soon, Ing Yann [1 ]
Koh, Soo Ngee [1 ]
Luke, Kang Kwong [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[2] Nanyang Technol Univ, Sch Humanities & Social Sci, Singapore, Singapore
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013
Keywords
phonetic segmentation; statistical correction; state selection; multi-resolution; SELECTION;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper focuses on generating accurate phonetic segmentations. Statistical methods based on absolute and relative correction are discussed and evaluated on both monophone and biphone models to improve the segmentation results. The influence of the search range on the statistical correction process is studied, and a state selection technique is used to enhance the correction results. The paper also explores the influence of the resolution (step size) of the HMMs and proposes a multi-resolution fusion process to further refine the statistically corrected results. Applying the proposed refinement methods improves the segmentation results in terms of segmentation accuracy, mean absolute error (MAE), and root mean square error (RMSE).
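The three evaluation measures named in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name boundary_metrics, the use of milliseconds, and the 20 ms tolerance for segmentation accuracy are assumptions chosen for illustration, since the abstract does not state a tolerance.

```python
import math


def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20):
    """Compare automatic phone boundaries with reference boundaries.

    predicted_ms, reference_ms: equal-length sequences of boundary times
    in milliseconds, paired one to one (hypothetical input format).
    tolerance_ms: a boundary counts as correct when its absolute error
    does not exceed this value (assumed here to be 20 ms).
    Returns (accuracy, mae, rmse).
    """
    if len(predicted_ms) != len(reference_ms) or not reference_ms:
        raise ValueError("boundary lists must be non-empty and equal in length")

    errors = [p - r for p, r in zip(predicted_ms, reference_ms)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)

    # Fraction of boundaries whose error lies within the tolerance.
    accuracy = sum(1 for e in abs_errors if e <= tolerance_ms) / n
    # Mean absolute error.
    mae = sum(abs_errors) / n
    # Root mean square error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return accuracy, mae, rmse


if __name__ == "__main__":
    accuracy, mae, rmse = boundary_metrics(
        [100, 250, 400, 620], [110, 250, 450, 600]
    )
    print(f"accuracy={accuracy:.2f} MAE={mae:.1f} ms RMSE={rmse:.1f} ms")
```

RMSE penalizes large boundary errors more heavily than MAE does, so the two are typically reported together alongside the tolerance-based accuracy.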
Pages: 6694-6698
Page count: 5
Related papers
12 items in total
  • [1] Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations
    Almpanidis, George
    Kotti, Margarita
    Kotropoulos, Constantine
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02) : 287 - 298
  • [2] LIBSVM: A Library for Support Vector Machines
    Chang, Chih-Chung
    Lin, Chih-Jen
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
  • [3] Foreign accent conversion in computer assisted pronunciation training
    Felps, Daniel
    Bortfeld, Heather
    Gutierrez-Osuna, Ricardo
    [J]. SPEECH COMMUNICATION, 2009, 51 (10) : 920 - 932
  • [4] Hunt AJ, 1996, INT CONF ACOUST SPEE, P373, DOI 10.1109/ICASSP.1996.541110
  • [5] A large margin algorithm for speech-to-phoneme and music-to-score alignment
    Keshet, Joseph
    Shalev-Shwartz, Shai
    Singer, Yoram
    Chazan, Dan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08) : 2373 - 2382
  • [6] Khanagha V, 2011, INT CONF ACOUST SPEE, P4484
  • [7] SPEAKER-INDEPENDENT PHONE RECOGNITION USING HIDDEN MARKOV-MODELS
    LEE, KF
    HON, HW
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (11) : 1641 - 1648
  • [8] Speech segmentation using regression fusion of boundary predictions
    Mporas, Iosif
    Ganchev, Todor
    Fakotakis, Nikos
    [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02) : 273 - 288
  • [9] On using multiple models for automatic speech segmentation
    Park, Seung Seop
    Kim, Nam Soo
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08) : 2202 - 2212
  • [10] Automatic speech segmentation based on boundary-type candidate selection
    Park, Seung Seop
    Kim, Nam Soo
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2006, 13 (10) : 640 - 643