EIGENTRIPHONES: A BASIS FOR CONTEXT-DEPENDENT ACOUSTIC MODELING

Cited by: 0

Authors:

Ko, Tom [1]

Mak, Brian [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China

Source:

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011

Keywords:

Eigenvoices; eigentriphones; context-dependent acoustic modeling; adaptation; HIDDEN MARKOV-MODELS; SPEECH RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In context-dependent acoustic modeling, it is important to strike a balance between detailed modeling and data sufficiency for robust estimation of model parameters. In the past, parameter sharing or tying has been one of the most common techniques for solving the problem. In recent years, another technique, which may be loosely and collectively called the subspace approach, tries to express a phonetic or sub-phonetic unit in terms of a small set of canonical vectors or units. In this paper, we investigate the development of an eigenbasis over the triphones and model each triphone as a point in the basis. We call the eigenvectors in the basis eigentriphones. From another perspective, we investigate the use of the eigenvoice adaptation method as a general acoustic modeling method for training triphones, especially the less frequent triphones, without tying their states, so that all the triphones are really distinct from each other and thus may be more discriminative. Experimental evaluation on the 5K-vocabulary HUB2 recognition task shows that a triphone HMM system trained using only eigentriphones without state tying may achieve slightly better performance than the common tied-state triphones.
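
The idea of modeling each triphone as a point in an eigenbasis can be illustrated with a minimal sketch. This record does not give the paper's estimation procedure, so the example below is only an assumption-laden illustration: it uses plain PCA (via SVD) and a least-squares projection on synthetic data, and the function names, dimensions, and "supervector" layout are all hypothetical rather than taken from the paper.

```python
import numpy as np


def eigenbasis(supervectors, k):
    """Return (mean, basis); basis holds the top-k principal directions as rows."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal directions ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project(vector, mean, basis):
    """Coordinates of one vector in the basis (least-squares projection)."""
    return basis @ (vector - mean)


def reconstruct(coords, mean, basis):
    """Map basis coordinates back to the original vector space."""
    return mean + coords @ basis


# Synthetic stand-in data (hypothetical): 50 "triphones", each a 12-dimensional
# parameter vector lying close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 12))
supervectors = latent @ mixing + 0.01 * rng.normal(size=(50, 12))

mean, basis = eigenbasis(supervectors, k=3)
coords = project(supervectors[0], mean, basis)   # the "point in the basis"
approx = reconstruct(coords, mean, basis)        # its 12-dimensional model
error = np.linalg.norm(approx - supervectors[0])
```

Here each vector is described by 3 coordinates instead of 12 parameters, which is the kind of parameter reduction a subspace approach relies on when training data for a unit are scarce.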

Pages: 4892-4895

Page count: 4
