A Two Stage Mask Estimation Approach to Robust Speaker Verification

Times cited: 0

Authors:

Zhao, Yali [1]

Xie, Lei [1]

Fu, Zhonghua [1]

Affiliation:

[1] Northwestern Polytech Univ, Sch Comp Sci, Shaanxi Prov Key Lab Speech & Image Informat Proc, Xian 710072, Peoples R China

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

speaker verification; missing feature theory; dual-microphone; binary mask estimation; SPEECH RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a two-stage mask estimation approach to robust speaker verification (SV) in noisy environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed, while the locations of all interferers are unknown. In the first stage, we use a dual-microphone setup and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask using the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components of the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach outperforms both a baseline MFCC system and a recent local-SNR-based mask estimation approach.
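The two stages in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every name and parameter below is hypothetical. It assumes the STFTs of both microphones are already computed (lists of frames of complex bins), that the target's relative attenuation and delay (`target_alpha`, `target_delta`) are known because its location is fixed, and that stage 1 keeps a bin when its DUET-style attenuation/delay estimate falls within hand-picked tolerances of the target values. The abstract does not state the exact histogram-based refinement rule, so `refine_mask` assumes one plausible rule: drop whole frames and frequency channels whose count of reliable bins (the time and frequency histograms) falls below a fraction threshold.

```python
import cmath


def duet_stage1_mask(X1, X2, omegas, target_alpha, target_delta,
                     alpha_tol=0.3, delta_tol=1e-4, eps=1e-12):
    """Stage 1 sketch (assumed rule): a T-F bin is marked 1 (target-dominated)
    when its relative attenuation and delay between the two microphones lie
    near the known target values, in the spirit of semi-blind DUET.

    X1, X2: per-microphone STFTs, lists of frames of complex bins.
    omegas: angular frequency (rad/s) of each bin.
    """
    mask = []
    for frame1, frame2 in zip(X1, X2):
        row = []
        for k, (a, b) in enumerate(zip(frame1, frame2)):
            w = omegas[k]
            if abs(a) < eps or w <= 0:
                row.append(0)  # skip near-silent bins and the DC bin
                continue
            ratio = b / a
            alpha = abs(ratio)                # relative attenuation
            # Relative delay in seconds; only unambiguous while |w * delay| < pi
            # (i.e. no spatial aliasing for the microphone spacing).
            delta = -cmath.phase(ratio) / w
            near = (abs(alpha - target_alpha) <= alpha_tol
                    and abs(delta - target_delta) <= delta_tol)
            row.append(1 if near else 0)
        mask.append(row)
    return mask


def refine_mask(mask, min_frame_frac=0.2, min_chan_frac=0.2):
    """Stage 2 sketch (hypothetical rule): build the time histogram (reliable
    bins per frame) and frequency histogram (reliable bins per channel), then
    zero out frames and channels that are too sparsely reliable."""
    n_frames, n_bins = len(mask), len(mask[0])
    time_hist = [sum(row) for row in mask]
    freq_hist = [sum(mask[t][k] for t in range(n_frames)) for k in range(n_bins)]
    keep_frame = [c >= min_frame_frac * n_bins for c in time_hist]
    keep_chan = [c >= min_chan_frac * n_frames for c in freq_hist]
    return [[mask[t][k] if keep_frame[t] and keep_chan[k] else 0
             for k in range(n_bins)]
            for t in range(n_frames)]


def reliable_components(features, mask):
    """Keep only mask==1 feature values; unreliable ones become None so that a
    missing-feature back end can ignore or marginalise them."""
    return [[v if m else None for v, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]
```

For example, `refine_mask([[1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 0, 0]], 0.5, 0.5)` drops the middle frame (only one reliable bin out of four) and the third channel, leaving `[[1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 0, 0]]`. Thresholding whole frames and channels is just one way to turn the histograms into a refinement; the paper may use a different criterion.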


Pages: 2653-2656

Page count: 4

References (8 in total):

[1] Cooke, M; Green, P; Josifovski, L; Vizinho, A. Robust automatic speech recognition with missing and unreliable acoustic data [J]. SPEECH COMMUNICATION, 2001, 34 (03): 267-285.

[2] Fu, Z.-H. P ICSLP TAIW, 2010.

[3] Harding, S; Barker, J; Brown, GJ. Mask estimation for missing data speech recognition based on statistics of binaural interaction [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): 58-67.

[4] Hirsch, H.-G. 6 INT C SPOKEN LANGU, 2000: 181.

[5] Lu, Xugang; Dang, Jianwu. An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification [J]. SPEECH COMMUNICATION, 2008, 50 (04): 312-322.

[6] Martin, R. Noise power spectral density estimation based on optimal smoothing and minimum statistics [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (05): 504-512.

[7] May, Tobias; van de Par, Steven; Kohlrausch, Armin. Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): 108-121.

[8] Roman, N; Wang, DL; Brown, GJ. Speech segregation based on sound localization [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2003, 114 (04): 2236-2252.
