Improving Robustness to Compressed Speech in Speaker Recognition

Cited by: 0

Authors:

McLaren, Mitchell [1]

Abrash, Victor [1]

Graciarena, Martin [1]

Lei, Yun [1]

Pesan, Jan [2]

Affiliations:

[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA

[2] Brno Univ Technol, Speech FIT Grp, CS-61090 Brno, Czech Republic

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speaker recognition; speech coding; codec degradation; speaker verification

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system and to propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determine whether robustness to noise generalizes to audio compression. Using a speaker recognition system based on i-vectors and probabilistic linear discriminant analysis (PLDA), we compared four PLDA training scenarios. The first trains PLDA on clean data, the second adds noisy and reverberant speech, the third introduces transcoded data matched to the evaluation conditions, and the fourth uses codec-degraded speech mismatched to the evaluation conditions. We found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement from transcoded speech in PLDA based on codecs mismatched to the evaluation conditions. Noise-robust features offered a degree of robustness to compressed speech, while more significant improvements occurred when PLDA had observed the codec matching the evaluation conditions. Finally, we tested i-vector fusion from the different features, which increased overall system performance but did not improve robustness to codec-degraded speech.
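The abstract does not say how the i-vectors from the different features were fused. As a minimal sketch only, assuming fusion means length-normalizing each feature's i-vector and concatenating them before PLDA (an assumption, not a detail confirmed by this record), the idea can be illustrated as follows. The function names and the toy dimensionality are hypothetical.

```python
import math
import random


def length_normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def fuse_ivectors(per_feature_ivectors):
    """Concatenate length-normalized i-vectors, one per acoustic feature.

    Assumption: fusion by concatenation. The record does not confirm
    that this is the scheme used in the paper.
    """
    fused = []
    for vec in per_feature_ivectors:
        fused.extend(length_normalize(vec))
    return fused


# Toy stand-ins for the i-vectors of one utterance under each feature.
random.seed(0)
dim = 8  # hypothetical; real i-vectors are typically a few hundred dimensions
ivectors = {
    name: [random.gauss(0.0, 1.0) for _ in range(dim)]
    for name in ("MFCC", "MDMC", "PNCC")
}

fused = fuse_ivectors(ivectors.values())
print(len(fused))  # 24: three 8-dimensional vectors concatenated
```

Under this assumption, the fused vector would feed a single PLDA model, so any of the four PLDA training scenarios could be applied to it unchanged.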

Citation:

Pages: 3665-3669

Page count: 5

