UNSUPERVISED TRAINING OF SUBSPACE GAUSSIAN MIXTURE MODELS FOR CONVERSATIONAL TELEPHONE SPEECH RECOGNITION

Cited by: 0
Authors
Ma, Zejun [1 ]
Wang, Xiaorui [1 ]
Xu, Bo [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Digital Content Technol Res Ctr, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
Speech recognition with low resources; unsupervised learning; subspace acoustic model; BROADCAST NEWS;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents our preliminary work on unsupervised training of subspace Gaussian mixture models (SGMMs) for an under-resourced conversational telephone speech (CTS) recognition task. The subspace model yields better performance than the conventional GMM, particularly with small or medium-sized training sets. As an effective way to save human effort, unsupervised learning is often applied to automatically transcribe large speech archives, and the additional automatically transcribed data may help to improve model accuracy. Experiments are carried out on two publicly available English conversational telephone speech corpora, and both GMM and SGMM models in combination with unsupervised learning are examined and compared.
Pages: 4829 - 4832
Number of pages: 4
Related papers
50 in total
  • [1] UNSUPERVISED TRAINING OF SUBSPACE GAUSSIAN MIXTURE MODELS FOR CONVERSATIONAL TELEPHONE SPEECH RECOGNITION
    Ma, Zejun
    Wang, Xiaorui
    Xu, Bo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4829 - 4832
  • [2] SUBSPACE GAUSSIAN MIXTURE MODELS FOR SPEECH RECOGNITION
    Povey, Daniel
    Burget, Lukas
    Agarwal, Mohit
    Akyazi, Pinar
    Feng, Kai
    Ghoshal, Arnab
    Glembek, Ondrej
    Goel, Nagendra Kumar
    Karafiat, Martin
    Rastrow, Ariya
    Rose, Richard C.
    Schwarz, Petr
    Thomas, Samuel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4330 - 4333
  • [3] Regularized Subspace Gaussian Mixture Models for Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    IEEE SIGNAL PROCESSING LETTERS, 2011, 18 (07) : 419 - 422
  • [4] Subspace constrained Gaussian mixture models for speech recognition
    Axelrod, S
    Goel, V
    Gopinath, RA
    Olsen, PA
    Visweswariah, K
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (06): : 1144 - 1160
  • [5] DEALING WITH ACOUSTIC MISMATCH FOR TRAINING MULTILINGUAL SUBSPACE GAUSSIAN MIXTURE MODELS FOR SPEECH RECOGNITION
    Mohan, Aanchan
    Ghalehjegh, Sina Hamidi
    Rose, Richard C.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4893 - 4896
  • [6] Comparison of Subspace Methods for Gaussian Mixture Models in Speech Recognition
    Varjokallio, Matti
    Kurimo, Mikko
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 181 - 184
  • [7] Discriminative estimation of subspace constrained Gaussian mixture models for speech recognition
    Axelrod, Scott
    Goel, Vaibhava
    Gopinath, Ramesh
    Olsen, Peder
    Visweswariah, Karthik
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01): : 172 - 189
  • [8] Noise Compensation for Speech Recognition Using Subspace Gaussian Mixture Models
    Bouallegue, Mohamed
    Rouvier, Mickael
    Matrouf, Driss
    Linares, Georges
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 318 - 321
  • [9] A SIMPLIFIED SUBSPACE GAUSSIAN MIXTURE TO COMPACT ACOUSTIC MODELS FOR SPEECH RECOGNITION
    Bouallegue, Mohamed
    Matrouf, Driss
    Linares, Georges
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4896 - 4899
  • [10] MULTILINGUAL ACOUSTIC MODELING FOR SPEECH RECOGNITION BASED ON SUBSPACE GAUSSIAN MIXTURE MODELS
    Burget, Lukas
    Schwarz, Petr
    Agarwal, Mohit
    Akyazi, Pinar
    Feng, Kai
    Ghoshal, Arnab
    Glembek, Ondrej
    Goel, Nagendra
    Karafiat, Martin
    Povey, Daniel
    Rastrow, Ariya
    Rose, Richard C.
    Thomas, Samuel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4334 - 4337