Acoustic Features for Hidden Conditional Random Fields-Based Thai Tone Classification

Cited by: 2
Authors
Kertkeidkachorn, Natthawut [1]
Punyabukkana, Proadpran [1]
Suchato, Atiwong [1]
Affiliations
[1] Chulalongkorn Univ, Dept Comp Engn, Fac Engn, Bangkok, Thailand
Keywords
Design; Algorithms; Experimentation; Performance; Thai tone classification; hidden conditional random fields; acoustic features; tone features; energy; spectral information; RECOGNITION
DOI
10.1145/2833088
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the Thai language, tone information is necessary for speech recognition systems. Previous studies show that many acoustic cues are associated with the shapes of tones. Nevertheless, most Thai tone classification studies have relied mainly on F0 values and their derivatives, without considering other acoustic features. This article investigates other acoustic features for Thai tone classification. In the experiments, energy values and spectral information, represented by three spectral-based features (the LPC-based, PLP-based, and MFCC-based features), are applied to HCRF-based Thai tone classification, which was previously reported as the best approach for Thai tone classification. The energy values provide an error rate reduction of 22.40% in the isolated-word scenario, but only slight improvements in the continuous-speech scenario. In contrast, spectral-based features contribute greatly to Thai tone classification in the continuous-speech scenario, whereas they slightly degrade performance in the isolated-word scenario. The best result in the continuous-speech scenario is obtained with the PLP-based feature, which yields an error rate reduction of 13.90%. The findings of this article are therefore that energy values are the main contributor to improved Thai tone classification performance in the isolated-word scenario, and that spectral-based features, especially the PLP-based feature, are the main contributor in the continuous-speech scenario.
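The abstract reports its results as error rate reductions. The sketch below shows how such a figure is commonly computed, assuming the reductions are relative to a baseline error rate; the abstract does not state this explicitly. The function name and the error rates used are hypothetical and chosen only to illustrate the arithmetic. They are not results from the article.

```python
def relative_error_rate_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error rate reduction, in percent.

    Assumption: the reduction is measured relative to the baseline error rate,
    i.e. 100 * (baseline - new) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent (NOT taken from the article).
baseline = 10.0       # e.g. a classifier using F0-based features only
with_feature = 8.0    # e.g. the same classifier with an extra feature added

print(relative_error_rate_reduction(baseline, with_feature))
```

Under this definition, a drop from a 10% to an 8% error rate is a 20% relative reduction, although the absolute difference is only 2 percentage points.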
Pages: 26