Speaker selection training for large vocabulary continuous speech recognition

Cited by: 0
Authors
Huang, C [1]
Chen, T [1]
Chang, E [1]
Affiliation
[1] Microsoft Res Asia, Beijing 100080, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Acoustic variability across speakers is one of the challenges for speaker-independent (SI) speech recognition systems. Dominant speaker adaptation techniques such as MLLR and MAP are powerful solutions, but they may become inefficient when too little enrollment data is available. In this paper, we propose an adaptation method based on speaker selection training, which makes full use of the statistics of the training corpus. A relative error rate reduction of 5.31% is achieved when only one utterance is available. We compare different speaker selection strategies, namely PCA-, HMM-, and GMM-based methods. In addition, we investigate the impact of the number of selected cohort speakers and of the number of utterances from the target speaker. Furthermore, a comparison and integration with MLLR adaptation are presented. Finally, we discuss ongoing work, such as dynamically varying the number of selected speakers, measuring the relative contributions of the selected speakers, and using model synthesis to speed up the computationally expensive re-estimation procedure.
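The cohort selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each training speaker is modeled by a single diagonal-covariance Gaussian (a simplified stand-in for the GMM-based strategy), and the function names, speaker labels, and synthetic 13-dimensional features are all hypothetical. The sketch scores the target speaker's frames against every training-speaker model and keeps the k best-scoring speakers as the cohort.

```python
import numpy as np


def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian to (n_frames, dim) features.

    Assumption: a single Gaussian stands in for a full per-speaker GMM.
    """
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    mean, var = model
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)
    return per_dim.sum(axis=1).mean()


def select_cohort(target_frames, speaker_models, k):
    """Return the k training speakers whose models best explain the target."""
    scores = {
        spk: avg_log_likelihood(target_frames, model)
        for spk, model in speaker_models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic stand-in data (hypothetical): four training speakers whose
# feature means differ, and one short "utterance" from a target speaker.
rng = np.random.default_rng(0)
speaker_means = {"spk_a": 0.0, "spk_b": 1.0, "spk_c": 5.0, "spk_d": -4.0}
models = {
    spk: fit_diag_gaussian(rng.normal(mu, 1.0, size=(500, 13)))
    for spk, mu in speaker_means.items()
}
target = rng.normal(0.9, 1.0, size=(40, 13))  # acoustically closest to spk_b

cohort = select_cohort(target, models, k=2)
print(cohort)
```

In a speaker selection training setup, the statistics of the selected cohort would then be used to re-estimate the acoustic model for the target speaker; that step is outside the scope of this sketch.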
Pages: 609-612
Number of pages: 4