A multistage algorithm for spotting new words in speech

Times cited: 21

Authors:

Dharanipragada, S [1]

Roukos, S [1]

Affiliation:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 8

Keywords:

audio indexing; fast match; keyword spotting; multimedia browsing; new-word detection;

DOI:

10.1109/TSA.2002.804543

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word/phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition that uses a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified by the coarse match. This gives us vocabulary independence together with the desired accuracy and speed in word spotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor-of-two improvement over the coarse match alone. The algorithm has wide-ranging uses in distributed and pervasive speech recognition applications such as audio indexing, spoken message retrieval, and video browsing.
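The abstract describes a two-stage cascade: a phone-ngram index searched by a cheap coarse match, followed by a more expensive detailed match run only at the putative hits. The Python sketch below illustrates that control flow under several assumptions that are not from the paper: the speech has already been decoded into a single 1-best phone string (the paper instead guides phoneme recognition with a vocabulary prefix tree), the index uses phone trigrams, and the detailed stage is replaced by a phone-level edit distance because no acoustic model is available here. All function names, thresholds, and example phone strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def build_index(phones: Sequence[str], n: int = 3) -> Dict[NGram, List[int]]:
    """Indexing stage: map every phone n-gram to the positions where it starts."""
    index: Dict[NGram, List[int]] = defaultdict(list)
    for i in range(len(phones) - n + 1):
        index[tuple(phones[i:i + n])].append(i)
    return dict(index)


def coarse_match(index: Dict[NGram, List[int]], query: Sequence[str],
                 n: int = 3, min_fraction: float = 0.5) -> List[int]:
    """Coarse stage: each query n-gram found in the index votes for an aligned start."""
    grams = [tuple(query[j:j + n]) for j in range(len(query) - n + 1)]
    votes: Dict[int, int] = defaultdict(int)
    for offset, gram in enumerate(grams):
        for pos in index.get(gram, []):
            votes[pos - offset] += 1
    # Keep non-negative starts supported by at least min_fraction of the query n-grams.
    return sorted(s for s, v in votes.items()
                  if s >= 0 and v >= min_fraction * len(grams))


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def detailed_match(phones: Sequence[str], query: Sequence[str],
                   starts: List[int], max_errors: int = 1) -> List[Tuple[int, int]]:
    """Detailed stage, run only at putative hits.

    Stand-in assumption: phone edit distance, NOT the paper's acoustic match.
    """
    hits = []
    for s in starts:
        cost = edit_distance(phones[s:s + len(query)], query)
        if cost <= max_errors:
            hits.append((s, cost))
    return hits


# Hypothetical decoded phone string and query pronunciation ("catalog").
audio = "DH AH K AE T AH L AO G IH Z N UW K AE D AH L AO G AH N D K AE T AH M IY Z".split()
query = "K AE T AH L AO G".split()

putative = coarse_match(build_index(audio), query, min_fraction=0.4)
print(putative)                                # [2, 13, 23]
print(detailed_match(audio, query, putative))  # [(2, 0), (13, 1)]
```

Only the three putative starts returned by the coarse stage are rescored. The region at position 23 shares two trigrams with the query and so survives the coarse match, but it differs in three phones and is rejected by the stricter check, which is the kind of pruning a coarse-to-detailed cascade relies on.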

Citation

Pages: 542-550

Number of pages: 9

50 entries in total

[31] Robust Dual-Modal Speech Keyword Spotting for XR Headsets [J].

Cai, Zhuojiang; Ma, Yuhan; Lu, Feng.

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30(05): 2507-2516

[32] Speech Keyword Spotting Method Based on Swin-Transformer Model [J].

Sun, Chengli; Chen, Bikang; Chen, Feilong; Leng, Yan; Guo, Qiaosheng.

International Journal of Computational Intelligence Systems, 17

[33] ADAPTATION OF RNN TRANSDUCER WITH TEXT-TO-SPEECH TECHNOLOGY FOR KEYWORD SPOTTING [J].

Sharma, Eva; Ye, Guoli; Wei, Wenning; Zhao, Rui; Tian, Yao; Wu, Jian; He, Lei; Lin, Ed; Gong, Yifan.

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7484-7488

[34] Keyword Spotting in Continuous Speech Using Spectral and Prosodic Information Fusion [J].

Pandey, Laxmi; Hegde, Rajesh M.

CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38(06): 2767-2791

[35] KEYWORD-SPECIFIC NORMALIZATION BASED KEYWORD SPOTTING FOR SPONTANEOUS SPEECH [J].

Li, Weifeng; Liao, Qingmin.

2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012: 233-237

[36] ADAPTIVE BOOSTED NON-UNIFORM MCE FOR KEYWORD SPOTTING ON SPONTANEOUS SPEECH [J].

Weng, Chao; Juang, Biing-Hwang.

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 6960-6964

[37] Keyword spotting for dialectal speech and Introduction of wav2vec2.0 [J].

Ariga, Tomohiro; Minakawa, Reo; Kojima, Kazunori; Lee, Shi-wook; Itoh, Yoshiaki.

2024 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2024

[38] Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis [J].

Kesavaraj, V; Vuppala, Anil.

2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024

[39] Speech densely connected convolutional networks for small-footprint keyword spotting [J].

Tsai, Tsung-Han; Lin, Xin-Hui.

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82(25): 39119-39137

