A Voice Trigger System using Keyword and Speaker Recognition for Mobile Devices

Times cited: 12

Authors:

Lee, Hyeopwoo [1]

Chang, Sukmoon [2]

Yook, Dongsuk [1]

Kim, Yongserk [3]

Affiliations:

[1] Korea Univ, Dept Comp & Commun Engn, Speech Informat Proc Lab, Seoul 136701, South Korea

[2] Penn State Univ, Middletown, PA 17057 USA

[3] Samsung Elect Co Ltd, Acoust Technol Ctr, Suwon 443742, South Korea

Source:

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS | 2009 / Vol. 55 / No. 4

Keywords:

Voice trigger; keyword recognition; speaker recognition; dynamic time warping; vector quantization; Gaussian mixture model; hidden Markov model; HIDDEN MARKOV-MODELS; SPEECH RECOGNITION; VERIFICATION; IDENTIFICATION; ALGORITHM;

DOI:

10.1109/TCE.2009.5373813

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline codes:

0808; 0809

Abstract:

Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word spoken by a legitimate user, it can be used as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device properly while keeping computational power consumption low, the voice trigger must perform both keyword recognition and speaker recognition without relying on computationally demanding speech recognizers. To solve this problem, we propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger. Experiments using a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method. However, the HMM-based method reduced the recognition error by 27.8% relative to the template-based method. The proposed methods are complementary and can be used selectively depending on the target device.
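As a rough illustration of the template-based idea, the sketch below compares an incoming utterance against enrollment templates of the user's magic word using dynamic time warping (DTW), one of the techniques listed in the keywords, and fires the trigger only if the closest template falls under a distance threshold. In principle, a small distance to a keyword-dependent template suggests both the right word and a similar voice, so one match can stand in for both checks. This is a minimal sketch under stated assumptions, not the authors' implementation: their template method may also use vector quantization and a different distance or normalization, feature frames (for example MFCC vectors) are assumed to be extracted elsewhere, and the names `dtw_distance`, `should_trigger`, and the `threshold` value are hypothetical.

```python
import math
from typing import Sequence

# One feature vector per frame (e.g. MFCCs); feature extraction is assumed
# to happen elsewhere and is not shown here.
Frame = Sequence[float]
Utterance = Sequence[Frame]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: Utterance, y: Utterance) -> float:
    """Classic DTW with match/insert/delete steps, normalized by len(x) + len(y)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # y frame stays, x advances
                                 cost[i][j - 1],      # x frame stays, y advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)


def should_trigger(test: Utterance, templates: Sequence[Utterance],
                   threshold: float) -> bool:
    """Hypothetical trigger rule: fire if the nearest enrolled template is close enough."""
    if not templates:
        raise ValueError("at least one enrollment template is required")
    best = min(dtw_distance(test, t) for t in templates)
    return best <= threshold


if __name__ == "__main__":
    # Toy 1-D "features"; the threshold 0.5 is an arbitrary illustrative value.
    enrolled = [[[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]]  # slow rendition
    same_word = [[0.0], [1.0], [2.0]]                        # fast rendition
    other = [[5.0], [5.0], [5.0]]
    print(should_trigger(same_word, enrolled, threshold=0.5))  # True
    print(should_trigger(other, enrolled, threshold=0.5))      # False
```

The fast rendition matches the slow template with zero cost because DTW lets each test frame align with several template frames. Dividing by `n + m` keeps scores comparable across utterances of different lengths; a real threshold would have to be tuned on enrollment data to balance false accepts against false rejects.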

References:

Pages: 2377-2384

Number of pages: 8

Total: 18 references (first 10 listed)

[1] Bhattacharyya, A.K.

Bull. Calcutta Math. Soc., 1943, 35: 99. DOI: 10.1038/157869B0

[2] Support vector machines for speaker and language recognition [J].

Campbell, WM; Campbell, JP; Reynolds, DA; Singer, E; Torres-Carrasquillo, PA.

COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3): 210-229

[3] Chung, H.

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2006, 52: 792

[4] Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold [J].

Davis, A; Nordholm, S; Togneri, R.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (02): 412-424

[5] The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective [J].

Doddington, GR; Przybocki, MA; Martin, AF; Reynolds, DA.

SPEECH COMMUNICATION, 2000, 31 (2-3): 225-254

[6] Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J].

Gauvain, Jean-Luc; Lee, Chin-Hui.

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02): 291-298

[7] Maximum a posteriori adaptation of the centroid model for speaker verification [J].

Hautamaki, Ville; Kinnunen, Tomi; Karkkainen, Ismo; Saastamoinen, Juhani; Tuononen, Marko; Franti, Pasi.

IEEE SIGNAL PROCESSING LETTERS, 2008, 15: 162-165

[8] A Smart Universal Remote Control based on Audio-Visual Device Virtualization [J].

Huang, Hsien-Chao; Lin, Ting-Ching; Huang, Yueh-Min.

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (01): 172-178

[9] Text-independent speaker identification using soft channel selection in home robot environments [J].

Ji, Mikyong; Kim, Sungtak; Kim, Hoirin; Yoon, Ho-Sub.

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (01): 140-144

[10] Real-time speaker identification and verification [J].

Kinnunen, T; Karpov, E; Fränti, P.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): 277-288
