A BAYESIAN ATTENTION NEURAL NETWORK LAYER FOR SPEAKER RECOGNITION

Cited by: 0

Authors:

Zhu, Weizhong ^{[1]}

Pelecanos, Jason ^{[1,2]}

Affiliations:

[1] IBM Res AI, Yorktown Hts, NY 10598 USA

[2] IBM Corp, Yorktown Hts, NY USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

attention modeling; Bayesian statistics; deep neural networks; speaker recognition;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Neural-network-based attention modeling has found utility in areas such as visual analysis, speech recognition and, more recently, speaker recognition. Attention represents a gating (or weighting) function on information and governs how the corresponding statistics are accumulated. In the context of speaker recognition, attention can be incorporated as a frame-weighted mean of an information stream. These weights can be made to sum to one (the standard approach) or be calculated in other ways. If the weights can be made to represent event observation probabilities, we can extend the approach to a Bayesian framework. More specifically, we combine prior information with the frame-weighted statistics to produce an adapted, or posterior, estimate of the mean. We evaluate the proposed method on NIST data.
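The contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' layer: it assumes the classical MAP mean update, in which the prior mean is treated as a number of pseudo-observations, and the names `weighted_mean`, `posterior_mean`, `prior_mean` and `prior_count` are hypothetical.

```python
from typing import List, Sequence


def weighted_mean(frames: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Standard attention pooling: weights are normalized to sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(frames[0])
    return [sum(w * f[d] for f, w in zip(frames, weights)) / total
            for d in range(dim)]


def posterior_mean(frames: Sequence[Sequence[float]],
                   probs: Sequence[float],
                   prior_mean: Sequence[float],
                   prior_count: float) -> List[float]:
    """MAP-style pooling (assumed form): weights are per-frame observation
    probabilities, and the prior counts as `prior_count` pseudo-frames."""
    if prior_count <= 0:
        raise ValueError("prior_count must be positive")
    mass = sum(probs)  # soft count of observed frames
    dim = len(prior_mean)
    return [(prior_count * prior_mean[d]
             + sum(p * f[d] for f, p in zip(frames, probs)))
            / (prior_count + mass)
            for d in range(dim)]
```

Under this assumed form, when every observation probability is near zero the estimate falls back to the prior mean, whereas the normalized weighted mean would still return an average of the frames.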


Pages: 6241-6245

Page count: 5
