A multistage algorithm for spotting new words in speech

Cited by: 21
Authors
Dharanipragada, S [1]
Roukos, S [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 08
Keywords
audio indexing; fast match; keyword spotting; multimedia browsing; new-word detection
DOI
10.1109/TSA.2002.804543
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word/phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition that uses a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us vocabulary independence and the desired accuracy and speed in wordspotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor of two improvement over the coarse match. The algorithm has wide-ranging use in distributed and pervasive speech recognition applications such as audio indexing, spoken message retrieval, and video browsing.
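The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the trigram order, the frame-based window, and the `rescore` callback (a stand-in for the detailed acoustic match) are all assumptions made for illustration, and the prefix-tree-guided phoneme recognizer is replaced by a ready-made list of timed phones.

```python
from collections import defaultdict


def phone_ngrams(phones, n=3):
    """Return all contiguous n-grams of a phone sequence as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(timed_phones, n=3):
    """Map each phone n-gram to the start frames at which it occurs.

    timed_phones is a list of (phone, start_frame) pairs, a hypothetical
    stand-in for the output of a phoneme recognizer.
    """
    index = defaultdict(list)
    phones = [p for p, _ in timed_phones]
    for i, gram in enumerate(phone_ngrams(phones, n)):
        index[gram].append(timed_phones[i][1])
    return index


def coarse_match(index, query_phones, n=3, window=100, min_hits=2):
    """Return start frames of regions holding at least min_hits
    occurrences of query n-grams within `window` frames."""
    grams = set(phone_ngrams(query_phones, n))
    frames = sorted(f for g in grams for f in index.get(g, []))
    hits, i = [], 0
    while i < len(frames):
        j = i
        while j < len(frames) and frames[j] - frames[i] <= window:
            j += 1
        if j - i >= min_hits:
            hits.append(frames[i])  # putative word hit
        i = j
    return hits


def spot(index, query_phones, rescore, threshold=0.5, **kwargs):
    """Apply the (hypothetical) detailed scorer only at coarse hits."""
    candidates = coarse_match(index, query_phones, **kwargs)
    return [f for f in candidates if rescore(f) >= threshold]
```

Because the index is keyed on phone n-grams rather than words, a query only needs a phone sequence, which is what makes this kind of search vocabulary independent; the costly rescoring step runs only on the few regions the coarse pass returns.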
Pages: 542-550
Page count: 9
Related papers
50 in total
  • [1] Spotting words in silent speech videos: a retrieval-based approach
    Abhishek Jha
    Vinay P. Namboodiri
    C. V. Jawahar
    Machine Vision and Applications, 2019, 30 : 217 - 229
  • [2] Spotting words in silent speech videos: a retrieval-based approach
    Jha, Abhishek
    Namboodiri, Vinay P.
    Jawahar, C. V.
    MACHINE VISION AND APPLICATIONS, 2019, 30 (02) : 217 - 229
  • [3] Keyword Spotting System for Tamil Isolated Words using Multidimensional MFCC and DTW Algorithm
    Senthildevi, K. A.
    Chandra, E.
    2015 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2015, : 550 - 554
  • [4] Fast Keyword Spotting in Telephone Speech
    Nouza, Jan
    Silovsky, Jan
    RADIOENGINEERING, 2009, 18 (04) : 665 - 670
  • [5] Comparison of Keyword Spotting Methods for Searching in Speech
    Smidl, Lubos
    Psutka, Josef V.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1894 - 1897
  • [6] Speech Keyword Spotting with Rule Based Segmentation
    Greibus, Mindaugas
    Telksnys, Laimutis
    INFORMATION AND SOFTWARE TECHNOLOGIES (ICIST 2013), 2013, 403 : 186 - 197
  • [7] Realizing Speech to Gesture Conversion by Keyword Spotting
    Zhao, Na
    Yang, Hongwu
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [8] Binary Speech Features for Keyword Spotting Tasks
    Riviello, Alexandre
    David, Jean-Pierre
    INTERSPEECH 2019, 2019, : 3460 - 3464
  • [9] Rapid yet accurate speech indexing using dynamic match lattice spotting
    Thambiratnam, Kishan
    Sridharan, Sridha
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01): : 346 - 357
  • [10] Speech Augmentation Based Unsupervised Learning for Keyword Spotting
    Luo, Jian
    Wang, Jianzong
    Cheng, Ning
    Tang, Haobin
    Xiao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,