EM-based Phoneme Confusion Matrix Generation for Low-resource Spoken Term Detection

Times cited: 0
Authors
Xu, Di [1 ]
Wang, Yun [1 ]
Metze, Florian [1 ]
Affiliation
[1] Carnegie Mellon Univ, Sch Comp Sci, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014
Keywords
Expectation-maximization algorithm; machine learning; information retrieval; spoken term detection; out-of-vocabulary words
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show varying degrees of improvement from introducing a PCM, the data-driven procedures described in most papers are rather ad hoc and lack rigorous statistical justification. In this paper, we focus on the statistical aspects of PCM generation, and we propose and justify a novel expectation-maximization (EM) based algorithm for data-driven PCM generation. We evaluate the performance of the generated PCMs in the context of low-resource spoken term detection, with a primary focus on out-of-vocabulary (OOV) keywords.
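The abstract does not spell out the algorithm itself. As a rough illustration of how EM can turn unaligned (reference, recognized) phone-string pairs into a confusion matrix, the sketch below uses a substitute technique: the stochastic edit-distance model of Ristad and Yianilos, trained with forward-backward EM. It is not the authors' method. It assumes training data arrives as pairs of reference pronunciations and ASR phone outputs, that `<eps>` marks deletions and insertions, and that `init_params`, `em_step`, `confusion_matrix`, `match_bonus`, and the toy phone set and pairs are hypothetical names and data chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical sketch: Ristad-Yianilos style stochastic edit model trained by EM.
EPS = "<eps>"              # empty side of a deletion or insertion (assumed symbol)
END = ("<end>", "<end>")   # termination event of the edit process


def init_params(phones, match_bonus=4.0):
    """Initial joint edit-operation probabilities, biased toward exact matches."""
    w = {END: 1.0}
    for a in phones:
        w[(a, EPS)] = 1.0  # reference phone a deleted
        w[(EPS, a)] = 1.0  # recognized phone a inserted
        for b in phones:
            w[(a, b)] = match_bonus if a == b else 1.0  # match or substitution
    z = sum(w.values())
    return {op: v / z for op, v in w.items()}


def forward_backward(ref, hyp, d):
    """Forward/backward sums over all monotone edit alignments of ref -> hyp."""
    n, m = len(ref), len(hyp)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                alpha[i][j] += alpha[i - 1][j - 1] * d[(ref[i - 1], hyp[j - 1])]
            if i > 0:
                alpha[i][j] += alpha[i - 1][j] * d[(ref[i - 1], EPS)]
            if j > 0:
                alpha[i][j] += alpha[i][j - 1] * d[(EPS, hyp[j - 1])]
    beta[n][m] = d[END]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                beta[i][j] += d[(ref[i], hyp[j])] * beta[i + 1][j + 1]
            if i < n:
                beta[i][j] += d[(ref[i], EPS)] * beta[i + 1][j]
            if j < m:
                beta[i][j] += d[(EPS, hyp[j])] * beta[i][j + 1]
    return alpha, beta, alpha[n][m] * d[END]


def em_step(pairs, d):
    """One EM iteration; returns new parameters and log-likelihood under the old ones."""
    counts = defaultdict(float)
    loglik = 0.0
    for ref, hyp in pairs:
        alpha, beta, p = forward_backward(ref, hyp, d)
        loglik += math.log(p)
        # E-step: posterior expected count of every edit edge in the alignment lattice
        for i in range(len(ref) + 1):
            for j in range(len(hyp) + 1):
                if i > 0 and j > 0:
                    op = (ref[i - 1], hyp[j - 1])
                    counts[op] += alpha[i - 1][j - 1] * d[op] * beta[i][j] / p
                if i > 0:
                    op = (ref[i - 1], EPS)
                    counts[op] += alpha[i - 1][j] * d[op] * beta[i][j] / p
                if j > 0:
                    op = (EPS, hyp[j - 1])
                    counts[op] += alpha[i][j - 1] * d[op] * beta[i][j] / p
        counts[END] += 1.0
    # M-step: renormalise expected counts into a joint distribution over operations
    z = sum(counts.values())
    return {op: counts[op] / z for op in d}, loglik


def confusion_matrix(d, phones):
    """Row-normalise into P(recognized phone or <eps> | reference phone)."""
    cols = list(phones) + [EPS]
    pcm = {}
    for a in phones:
        row = {b: d[(a, b)] for b in cols}
        z = sum(row.values())
        if z > 0:  # skip phones never seen on the reference side
            pcm[a] = {b: v / z for b, v in row.items()}
    return pcm


# Toy usage with a made-up four-phone inventory and made-up recognition output
phones = ["a", "b", "k", "t"]
pairs = [
    (["k", "a", "t"], ["k", "a", "t"]),
    (["k", "a", "t"], ["t", "a", "t"]),  # k recognized as t
    (["b", "a", "t"], ["b", "a"]),       # final t deleted
    (["t", "a", "b"], ["t", "a", "b"]),
    (["k", "a", "b"], ["t", "a", "b"]),  # k recognized as t
]
d = init_params(phones)
history = []
for _ in range(20):
    d, ll = em_step(pairs, d)
    history.append(ll)
pcm = confusion_matrix(d, phones)
print(round(pcm["k"]["t"], 3), round(pcm["k"]["k"], 3))
```

Because each EM iteration cannot decrease the training log-likelihood, `history` should be non-decreasing. A practical system would likely add smoothing of unseen confusions and a held-out stopping criterion, both of which this sketch omits.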
Pages: 424-429
Number of pages: 6
References
22 entries in total
  • [1] [Anonymous], 2006, PATTERN RECOGN, DOI 10.1117/1.2819119
  • [2] [Anonymous], 2010, P 14 C COMPUTATIONAL
  • [3] Joint-sequence models for grapheme-to-phoneme conversion
    Bisani, Maximilian
    Ney, Hermann
    [J]. SPEECH COMMUNICATION, 2008, 50 (05): 434 - 451
  • [4] Matching Criteria for Vocabulary-Independent Search
    Chaudhari, Upendra V.
    Picheny, Michael
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05): 1633 - 1643
  • [5] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): 1 - 38
  • [6] Harper M., 2011, IARPA BAA
  • [7] Hofleitner A, 2011, IEEE INT C INTELL TR, P815, DOI 10.1109/ITSC.2011.6083050
  • [8] Finding consensus in speech recognition: word error minimization and other applications of confusion networks
    Mangu, L
    Brill, E
    Stolcke, A
    [J]. COMPUTER SPEECH AND LANGUAGE, 2000, 14 (04): 373 - 400
  • [9] Miller D. R., 2007, P INTERSPEECH, P314
  • [10] The expectation-maximization algorithm
    Moon, TK
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 1996, 13 (06): 47 - 60