A multistage algorithm for spotting new words in speech

Cited by: 21
Authors
Dharanipragada, S [1]
Roukos, S [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 08
Keywords
audio indexing; fast match; keyword spotting; multimedia browsing; new-word detection
DOI
10.1109/TSA.2002.804543
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word or phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition that uses a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us vocabulary independence together with the desired accuracy and speed in wordspotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor of two improvement over the coarse match. The algorithm has wide-ranging use in distributed and pervasive speech recognition applications such as audio indexing, spoken message retrieval, and video browsing.
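The coarse-to-detailed search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bigram order, the voting threshold, and the toy phone stream are all hypothetical, and a plain edit distance over phone labels stands in for the detailed acoustic match, which in the paper operates on the speech itself. The sketch only shows the control flow: index the phone n-grams once, use n-gram matching to propose putative hits, then run the more expensive check only at those hits.

```python
from collections import defaultdict


def build_ngram_index(phones, n=3):
    """Map each phone n-gram to the positions where it starts in the stream."""
    index = defaultdict(list)
    for i in range(len(phones) - n + 1):
        index[tuple(phones[i:i + n])].append(i)
    return index


def coarse_match(index, query, n=3, min_fraction=0.5):
    """Coarse stage: return putative start positions where enough of the
    query's n-grams line up. min_fraction is an assumed, illustrative threshold."""
    grams = [tuple(query[j:j + n]) for j in range(len(query) - n + 1)]
    votes = defaultdict(int)
    for j, gram in enumerate(grams):
        for pos in index.get(gram, ()):
            start = pos - j  # where the query would begin if this n-gram aligns
            if start >= 0:
                votes[start] += 1
    needed = max(1, int(min_fraction * len(grams)))
    return sorted(s for s, v in votes.items() if v >= needed)


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def detailed_match(phones, query, starts, max_distance=0):
    """Detailed stage (stand-in): rescore only the putative hits.
    Edit distance replaces the acoustic match purely for illustration."""
    hits = []
    for s in starts:
        d = edit_distance(phones[s:s + len(query)], query)
        if d <= max_distance:
            hits.append((s, d))
    return hits


def spot(phones, query, n=3, max_distance=0):
    """Run the coarse stage, then the detailed stage on its output."""
    index = build_ngram_index(phones, n)
    return detailed_match(phones, query, coarse_match(index, query, n), max_distance)


# Toy phone stream (hypothetical labels) and a three-phone query.
phones = "dh ax k ae t s ae t aa n dh ax m ae t".split()
query = "s ae t".split()
print(coarse_match(build_ngram_index(phones, 2), query, 2))  # putative hits
print(spot(phones, query, n=2))                              # confirmed hits
```

In this toy run the coarse stage proposes three candidate positions because the bigram `ae t` occurs three times, and the stand-in detailed stage keeps only the one where the full query matches. Because the index is built from phones rather than words, the query does not need to be in any recognizer vocabulary.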
Pages: 542-550
Page count: 9