EFFICIENT SUBWORD LATTICE RETRIEVAL FOR GERMAN SPOKEN TERM DETECTION

Cited by: 12
Authors
Mertens, Timo [1, 2]
Schneider, Daniel [2]
Affiliations
[1] NTNU, Dept Elect & Telecommun, Trondheim, Norway
[2] Fraunhofer IAIS, Schloss Birlinghoven, St Augustin 53754, Germany
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
spoken term detection; spoken document retrieval; speech recognition; speech search
DOI
10.1109/ICASSP.2009.4960726
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a lattice-based spoken term detection (STD) method for German broadcast news data and compare it to a previously proposed fuzzy search. Because the out-of-vocabulary (OOV) problem is pronounced in German, we evaluate suitable subword indexing units for lattice retrieval. We also investigate hybrid lattice retrieval over words and subwords, since words are a robust indexing unit. We show that efficient lattice graph and score pruning techniques increase the precision of subword retrieval by 8% absolute with only a small loss in recall. Additionally, a speed-up of up to 6 times is observed.
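The precision/recall trade-off that the abstract attributes to score pruning can be illustrated with a minimal sketch. This is not the authors' method: the `Hit` type, the threshold value, the terms, the scores, and the correctness labels are all hypothetical toy values chosen for illustration, and none of the numbers come from the paper.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One hypothesised term occurrence (hypothetical structure)."""
    term: str
    score: float   # confidence score in [0, 1]
    correct: bool  # toy label: does the hit match a reference occurrence?


def prune_by_score(hits: List[Hit], threshold: float) -> List[Hit]:
    """Keep only hits whose score reaches the threshold."""
    return [h for h in hits if h.score >= threshold]


def precision_recall(hits: List[Hit], n_reference: int) -> Tuple[float, float]:
    """Precision and recall of a hit list against n_reference true occurrences."""
    true_positives = sum(h.correct for h in hits)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / n_reference if n_reference else 0.0
    return precision, recall


# Invented toy data, not results from the paper.
hits = [
    Hit("bundestag", 0.92, True),
    Hit("bundestag", 0.15, False),
    Hit("merkel", 0.80, True),
    Hit("merkel", 0.30, True),
    Hit("merkel", 0.10, False),
]
N_REFERENCE = 4  # assumed number of true occurrences in the toy reference

before = precision_recall(hits, N_REFERENCE)
after = precision_recall(prune_by_score(hits, 0.5), N_REFERENCE)
print("before pruning:", before)
print("after pruning: ", after)
```

In this toy setting, pruning removes the low-scoring false alarms and raises precision, at the cost of one correct low-scoring hit and therefore some recall.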
Pages: 4885 / +
Number of pages: 2