HiLAM-aligned kernel discriminant analysis for text-dependent speaker verification

Cited by: 4
Authors
Laskar, Mohammad Azharuddin [1]
Laskar, Rabul Hussain [1]
Affiliations
[1] Natl Inst Technol Silchar, Dept ECE, Silchar 788010, Assam, India
Keywords
Text-dependent speaker verification; Kernel Discriminant Analysis; HiLAM; Online i-vector/PLDA; X-vector; RECOGNITION;
DOI
10.1016/j.eswa.2021.115281
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Probabilistic Linear Discriminant Analysis (PLDA) is a commonly used backend classifier in text-dependent speaker verification (TDSV) systems. Recently, PLDA projections have been integrated with the traditional Dynamic Time Warping (DTW) template-matching framework, resulting in the DTW Online i-vector/PLDA system, which has been shown to achieve state-of-the-art performance on the TDSV task. The PLDA model trains a subspace that compensates for channel and session variability. It assumes that speaker-phrase information is linearly separable from the other components. However, this relationship is known to be non-linear, and the non-linearity is more prominent for short speech segments such as those underlying online i-vectors. As a result, vital speaker-phrase information is lost during PLDA modeling. To address this, this work explores Kernel Discriminant Analysis (KDA) for the TDSV task. It further proposes using the Hierarchical Multi-Layer Acoustic Model (HiLAM) to complement KDA with a more effective speaker-text class definition. The proposed system is hypothesized to benefit on three counts: the non-linear modeling ability of KDA, the speaker idiosyncrasy information associated with HiLAM-defined speaker-text units, and the modeling of the exact context of the pass-phrase offered by HiLAM. It achieves a relative Equal Error Rate (EER) reduction of up to 50.63% on Part 1 of the RSR2015 database compared with the baseline DTW Online i-vector/PLDA system.
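The DTW template-matching step mentioned in the abstract can be illustrated with a minimal sketch. It aligns two sequences of per-frame vectors and returns a length-normalised alignment cost. In the system described above these vectors would be projected online i-vectors; here they are toy 2-D vectors. The function names, the cosine distance, the symmetric step pattern, and the normalisation by sequence lengths are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(template, test, dist=cosine_distance):
    """Length-normalised DTW alignment cost between two vector sequences.

    Hypothetical helper: the step pattern (insertion, deletion, match)
    and the (n + m) normalisation are illustrative choices.
    """
    n, m = len(template), len(test)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i template frames
    # with the first j test frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], test[j - 1])
            acc[i][j] = d + min(acc[i - 1][j],      # template frame repeated
                                acc[i][j - 1],      # test frame repeated
                                acc[i - 1][j - 1])  # one-to-one match
    return acc[n][m] / (n + m)


# Toy sequences (assumed data): the second is a time-stretched copy of the first.
enrol = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

same_phrase_cost = dtw_cost(enrol, stretched)
wrong_order_cost = dtw_cost(enrol, swapped)
```

A lower cost indicates a closer match: the time-stretched copy aligns at zero cost, while the sequence with its frames in the wrong order does not. A verification decision would compare such a cost against a threshold.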
Pages: 12