EIGENTRIPHONES: A BASIS FOR CONTEXT-DEPENDENT ACOUSTIC MODELING

Cited by: 0
Authors
Ko, Tom [1]
Mak, Brian [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
Source
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011
Keywords
Eigenvoices; eigentriphones; context-dependent acoustic modeling; adaptation; HIDDEN MARKOV-MODELS; SPEECH RECOGNITION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In context-dependent acoustic modeling, it is important to strike a balance between detailed modeling and data sufficiency for robust estimation of model parameters. Parameter sharing, or tying, has long been one of the most common techniques for solving this problem. More recently, another family of techniques, which may be loosely and collectively called the subspace approach, expresses a phonetic or sub-phonetic unit in terms of a small set of canonical vectors or units. In this paper, we investigate the development of an eigenbasis over the triphones and model each triphone as a point in the space spanned by that basis. We call the eigenvectors in the basis eigentriphones. From another perspective, we investigate the use of the eigenvoice adaptation method as a general acoustic modeling method for training triphones, especially the less frequent ones, without tying their states, so that all the triphones are truly distinct from each other and thus may be more discriminative. Experimental evaluation on the 5K-vocabulary HUB2 recognition task shows that a triphone HMM system trained using only eigentriphones, without state tying, may achieve slightly better performance than the common tied-state triphones.
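The idea of representing each triphone as a point in an eigenbasis can be illustrated with a rough sketch. The code below is not the authors' procedure, which this record does not describe. It assumes, purely for illustration, that each triphone is summarized by a fixed-length "supervector" (for example, concatenated Gaussian means), that the basis comes from plain PCA via SVD, and that a triphone's coordinates come from a simple projection. The function names, dimensions, and toy data are all hypothetical.

```python
import numpy as np

def fit_eigenbasis(supervectors, num_eigen):
    """Return (mean, basis); rows of basis are the top principal directions."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # SVD of the centered data: rows of vt are orthonormal principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_eigen]

def project(supervector, mean, basis):
    """Coordinates of one triphone in the eigenbasis (orthogonal projection)."""
    return basis @ (supervector - mean)

def reconstruct(coords, mean, basis):
    """Map eigenbasis coordinates back to a full supervector."""
    return mean + coords @ basis

# Hypothetical toy data: 50 "triphones", each a 12-dim supervector that
# lies in a 3-dim subspace plus a constant offset.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 12))
supervectors = latent @ mixing + 5.0

mean, basis = fit_eigenbasis(supervectors, num_eigen=3)
coords = project(supervectors[0], mean, basis)   # 3 numbers instead of 12
approx = reconstruct(coords, mean, basis)
```

In this toy setting each triphone is described by three coordinates rather than twelve free parameters, which is the sense in which a subspace model can reduce the amount of data needed per unit. How the coordinates of a rarely seen triphone would be estimated from its own limited data is left out here and would depend on the actual method in the paper.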
Pages: 4892-4895
Page count: 4