A multistage algorithm for spotting new words in speech

Cited by: 21
Authors
Dharanipragada, S [1]
Roukos, S [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002, Vol. 10, No. 8
Keywords
audio indexing; fast match; keyword spotting; multimedia browsing; new-word detection;
DOI
10.1109/TSA.2002.804543
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word/phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition using a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us vocabulary independence and the desired accuracy and speed in wordspotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor-of-two improvement over the coarse match alone. The algorithm has wide-ranging use in distributed and pervasive speech recognition applications such as audio indexing, spoken message retrieval, and video browsing.
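The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: all function names, the n-gram order, and the thresholds are hypothetical, the input is assumed to be an already-decoded phone sequence, and the detailed acoustic match is replaced by a simple phone-level edit distance as a stand-in.

```python
from collections import defaultdict

def ngrams(phones, n):
    """Return the phone n-grams of a sequence, in order, as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(phones, n):
    """Indexing stage: map each phone n-gram to the positions where it starts."""
    index = defaultdict(list)
    for pos, gram in enumerate(ngrams(phones, n)):
        index[gram].append(pos)
    return index

def coarse_match(index, query, n, min_fraction=0.5):
    """Coarse stage: vote for start positions implied by matching n-grams."""
    grams = ngrams(query, n)
    votes = defaultdict(int)
    for offset, gram in enumerate(grams):
        for pos in index.get(gram, []):
            votes[pos - offset] += 1  # where the query would have to start
    return sorted(s for s, v in votes.items()
                  if s >= 0 and v / len(grams) >= min_fraction)

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def spot(phones, query, n=2, max_dist=1):
    """Run the detailed check only at the putative hits from the coarse stage.

    Assumption: edit distance stands in for the detailed acoustic match.
    """
    index = build_index(phones, n)
    hits = []
    for start in coarse_match(index, query, n):
        window = phones[start:start + len(query)]
        if edit_distance(window, query) <= max_dist:
            hits.append(start)
    return hits

# Hypothetical decoded stream with one misrecognized phone ("aa" for "ao").
stream = "sil w er d s p aa t ih ng sil".split()
print(spot(stream, "s p ao t ih ng".split()))
```

In this sketch the index is built once per recording and reused for every query, which is what makes the search independent of any fixed vocabulary; the more expensive check runs only on the few positions the coarse stage proposes.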
Pages: 542-550
Page count: 9