Exploiting Discriminative Point Process Models for Spoken Term Detection

Times cited: 0
Authors
Norouzian, Atta [1]
Jansen, Aren
Rose, Richard [1]
Thomas, Samuel
Affiliation
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
spoken term detection; point process model; discriminative training; whole word model
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art spoken term detection (STD) systems are built on top of large vocabulary speech recognition engines, which generate lattices that encode candidate occurrences of each in-vocabulary query. These lattices specify the start and stop times of hypothesized term occurrences, providing a clear opportunity to return to the acoustics and incorporate novel confidence measures for verification. In this paper, we introduce a novel exemplar distance metric to the recently proposed discriminative point process modeling (DPPM) framework and use the resulting whole word models to generate STD confidence scores. In doing so, we introduce STD to a completely distinct acoustic modeling pipeline, trading Gaussian mixture models (GMMs) for multi-layer perceptrons and replacing dictionary-derived hidden Markov models (HMMs) with exemplar-based point process models. We find that whole word DPPM scores both perform comparably to, and are complementary to, the lattice posterior scores produced by a state-of-the-art speech recognition engine.
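This record does not say how the exemplar distance or the confidence scores are actually computed. As a rough illustration only, the following sketch assumes that each lattice hit is converted into a point pattern (for each phone, the times at which that phone is detected, normalized to the duration of the hit). It uses a van Rossum spike-train distance summed over phones as a stand-in exemplar metric, scores a hit by its distance to the nearest training exemplar, and linearly fuses that score with the lattice posterior. None of these choices, nor the hypothetical names `pattern_distance`, `exemplar_score`, `fuse`, `tau` and `weight`, are taken from the paper.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: phone label -> event times, normalized to [0, 1]
# by the duration of the hypothesized term occurrence.
PointPattern = Dict[str, List[float]]


def _kernel_sum(a: Sequence[float], b: Sequence[float], tau: float) -> float:
    """Sum of exp(-|t - u| / tau) over every pair of events t in a, u in b."""
    return sum(math.exp(-abs(t - u) / tau) for t in a for u in b)


def van_rossum_distance(a: Sequence[float], b: Sequence[float], tau: float) -> float:
    """Closed-form van Rossum distance for a causal exponential kernel of width tau."""
    sq = 0.5 * (_kernel_sum(a, a, tau) + _kernel_sum(b, b, tau)
                - 2.0 * _kernel_sum(a, b, tau))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative values from rounding


def pattern_distance(p: PointPattern, q: PointPattern, tau: float = 0.05) -> float:
    """Stand-in exemplar metric: per-phone distances combined in quadrature.

    A phone missing from one pattern is treated as having no events.
    """
    phones = set(p) | set(q)
    return math.sqrt(sum(van_rossum_distance(p.get(ph, []), q.get(ph, []), tau) ** 2
                         for ph in phones))


def exemplar_score(candidate: PointPattern, exemplars: List[PointPattern],
                   tau: float = 0.05) -> float:
    """Confidence in (0, 1]: exp(-distance to the nearest exemplar)."""
    nearest = min(pattern_distance(candidate, ex, tau) for ex in exemplars)
    return math.exp(-nearest)


def fuse(lattice_posterior: float, exemplar_conf: float, weight: float = 0.5) -> float:
    """Linear late fusion of two scores in [0, 1]; weight is a hypothetical tunable."""
    return weight * lattice_posterior + (1.0 - weight) * exemplar_conf


if __name__ == "__main__":
    # Toy exemplars for a three-phone word; all numbers are made up for illustration.
    exemplars = [{"k": [0.10], "ae": [0.40], "t": [0.80]},
                 {"k": [0.12], "ae": [0.45], "t": [0.85]}]
    hit = {"k": [0.11], "ae": [0.42], "t": [0.82]}
    print(fuse(0.7, exemplar_score(hit, exemplars)))
```

A linear combination is only one simple way to exploit two complementary scores; its weight would normally be tuned on held-out data, and the combination method the authors used is not stated in this record.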
Pages: 2441-2444
Page count: 4