Stressed Speech Analysis Using Sparse Representation Over Temporal Information Based Dictionary

Cited by: 0

Authors:

Priya, Bhanu [1]

Dandapat, S. [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India

Source:

2015 ANNUAL IEEE INDIA CONFERENCE (INDICON) | 2015

Keywords:

Stressed speech; speech recognition; K-SVD algorithm; exemplar dictionary; HMM mean vector; sparse representation; CLASSIFICATION; RECOGNITION; COMPENSATION; FEATURES; NOISE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

In this paper, a novel sparse representation over learned and exemplar dictionaries is explored to estimate the speech information in stressed speech. Stressed speech carries both speech and stress information. The stress information induces acoustic variability, which degrades the performance of a speech recognition system. In this work, the acoustic variability is reduced by representing both neutral and stressed speech in the sparse domain with respect to dictionaries that contain speech information. The K-SVD algorithm is used to learn the redundant dictionary from neutral speech. The exemplar dictionaries consist of the mean vectors of a GMM and the mean vectors of the Gaussian mixture densities in each state of an HMM, both of which are used to model neutral speech. All experiments in this work parameterize neutral and stressed speech as nonlinear (TEO-CB-Auto-Env) features. Experimental results indicate that speech information under stress conditions can be estimated efficiently when neutral and stressed speech are sparsely represented over the exemplar dictionary built from the mean vectors of the Gaussian mixture densities in each HMM state, i.e., the time-dependent features of neutral speech. A relative improvement in word accuracy of 8.51% (from 62.14% to 67.43%) is achieved for speech under the angry condition.
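As a rough illustration of what "sparse representation over a dictionary" means here, the sketch below codes one feature vector over a fixed dictionary and reconstructs it from a few atoms. It is not the authors' implementation: the abstract does not name the sparse solver, so orthogonal matching pursuit (OMP) is assumed, and the random dictionary `D`, the vector `y`, the dimensions, and the sparsity level `k` are all hypothetical stand-ins for an exemplar dictionary of HMM mean vectors and a TEO-CB-Auto-Env feature frame.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit (assumed solver, not from the paper).

    Approximates y as a combination of at most k columns (atoms) of D.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Least-squares refit of y on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Hypothetical data: 50 unit-norm atoms of dimension 20 stand in for the
# mean vectors that would make up an exemplar dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

# A synthetic "feature vector" that mixes three atoms.
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.8]
y = D @ x_true

x_hat = omp(D, y, k=3)      # sparse code with at most 3 nonzeros
y_hat = D @ x_hat           # estimate rebuilt from the dictionary atoms
print(np.count_nonzero(x_hat), np.linalg.norm(y - y_hat))
```

In the setting the abstract describes, the dictionary would be built from neutral speech only, so the reconstruction `D @ x_hat` of a stressed-speech frame keeps mainly what the neutral-speech atoms can express.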


Pages: 6

