HiLAM-aligned kernel discriminant analysis for text-dependent speaker verification

Cited by: 4

Authors:

Laskar, Mohammad Azharuddin [1]

Laskar, Rabul Hussain [1]

Affiliations:

[1] Natl Inst Technol Silchar, Dept ECE, Silchar 788010, Assam, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2021 / Vol. 182

Keywords:

Text-dependent speaker verification; Kernel Discriminant Analysis; HiLAM; Online i-vector/PLDA; X-vector; RECOGNITION;

DOI:

10.1016/j.eswa.2021.115281

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Probabilistic Linear Discriminant Analysis (PLDA) has been a commonly used backend classifier for many text-dependent speaker verification (TDSV) systems. Lately, PLDA projections have been integrated with the traditional Dynamic Time Warping (DTW) template matching framework, resulting in the DTW Online i-vector/PLDA system. The system is shown to achieve state-of-the-art performance for the TDSV task. The PLDA model serves to train a subspace that compensates for channel and session variabilities. It assumes linear separability between speaker-phrase information and other components. However, this relationship is known to be non-linear. The non-linearity is more prominent in the case of short speech extracts, as in the case of online i-vectors. This results in a loss of vital speaker-phrase information during PLDA modeling. To this end, this work explores Kernel Discriminant Analysis (KDA) for the TDSV task. It further proposes to use the Hierarchical Multi-Layer Acoustic Model (HiLAM) to complement KDA with a more effective speaker-text class definition. The proposed system is hypothesized to benefit on three counts: the non-linear modeling ability of KDA, the speaker idiosyncrasy information associated with HiLAM-defined speaker-text units, and the modeling of the exact context of the pass-phrase, as offered by HiLAM. It shows a relative Equal Error Rate (EER) reduction of up to 50.63% on Part 1 of the RSR2015 database when compared to the baseline DTW Online i-vector/PLDA system.

Pages: 12
