Exploiting Discriminative Point Process Models for Spoken Term Detection

Cited by: 0

Authors:

Norouzian, Atta [1]

Jansen, Aren

Rose, Richard [1]

Thomas, Samuel

Affiliations:

[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

spoken term detection; point process model; discriminative training; whole word model;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

State-of-the-art spoken term detection (STD) systems are built on top of large vocabulary speech recognition engines, which generate lattices that encode candidate occurrences of each in-vocabulary query. These lattices specify start and stop times of hypothesized term occurrences, providing a clear opportunity to return to the acoustics to incorporate novel confidence measures for verification. In this paper, we introduce a novel exemplar distance metric to the recently proposed discriminative point process modeling (DPPM) framework and use the resulting whole word models to generate STD confidence scores. In doing so, we introduce STD to a completely distinct acoustic modeling pipeline, trading Gaussian mixture models (GMM) for multi-layer perceptrons and replacing dictionary-derived hidden Markov models (HMM) with exemplar-based point process models. We find that whole word DPPM scores both perform comparably and are complementary to lattice posterior scores produced by a state-of-the-art speech recognition engine.


Pages: 2441-2444

Number of pages: 4

