Maximum Gaussianality training for deep speaker vector normalization

Cited by: 2
Authors
Cai, Yunqi [1 ,2 ,3 ]
Li, Lantian [4 ]
Abel, Andrew [3 ,5 ]
Zhu, Xiaoyan [3 ]
Wang, Dong [2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] BNRist Tsinghua Univ, Ctr Speech & Language Technol CSLT, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Beijing Univ Posts & Telecom, Sch Artificial Intelligence, Beijing, Peoples R China
[5] Univ Strathclyde, Dept Comp & Informat Sci, Glasgow, Scotland
Keywords
Speaker embedding; Normalization flow; Gaussianality training; Recognition
DOI
10.1016/j.patcog.2023.109977
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Automatic Speaker Verification (ASV) is a critical task in pattern recognition and has been applied to various security-sensitive scenarios. The current state-of-the-art technique for ASV is based on deep embedding. However, a significant challenge with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. To address this issue, this paper proposes a novel training method called Maximum Gaussianality (MG), which regulates the distribution of the speaker vectors. Compared to the conventional normalization approach based on maximum likelihood (ML), the new approach directly maximizes the Gaussianality of the latent codes, and therefore can both normalize the between-class and within-class distributions in a controlled and reliable way and eliminate the unbound likelihood problem associated with the conventional ML approach. Our experiments on several datasets demonstrate that our MG-based normalization can deliver much better performance than the baseline systems without normalization and outperform discriminative normalization flow (DNF), an ML-based normalization method, particularly when the training data is limited. In theory, the MG criterion can be applied to any task in any research domain where Gaussian distributions are needed, making the MG training a versatile tool.
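The abstract contrasts maximizing Gaussianality directly with maximizing likelihood. The paper's actual MG criterion is defined over the latent codes of a normalization flow and is not reproduced here; as an illustrative sketch only, one simple way to score "Gaussianality" is to penalize deviations of per-dimension skewness and excess kurtosis from their Gaussian values (both 0). The function name `gaussianality_loss` below is hypothetical and may differ from the paper's formulation.

```python
import numpy as np

def gaussianality_loss(z):
    """Illustrative Gaussianality penalty (NOT the paper's exact MG criterion).

    For latent codes z of shape (num_samples, dim), standardize each
    dimension, then penalize squared skewness and squared excess kurtosis,
    both of which are 0 for a true Gaussian. Lower is "more Gaussian".
    """
    z = np.asarray(z, dtype=float)
    mu = z.mean(axis=0)
    sigma = z.std(axis=0) + 1e-8          # avoid division by zero
    s = (z - mu) / sigma                  # standardized codes
    skew = (s ** 3).mean(axis=0)          # 0 for a Gaussian
    ex_kurt = (s ** 4).mean(axis=0) - 3.0 # 0 for a Gaussian
    return float((skew ** 2 + ex_kurt ** 2).mean())
```

On samples drawn from a standard normal this loss is near zero, while a skewed distribution such as the exponential scores far higher, so minimizing it pushes the codes toward a Gaussian shape. Unlike an ML objective, this moment-based score is bounded below, which mirrors the abstract's point about avoiding the unbounded-likelihood problem.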
Pages: 12
Related Papers
50 records in total
  • [21] Is normalization indispensable for training deep neural networks?
    Shao, Jie
    Hu, Kai
    Wang, Changhu
    Xue, Xiangyang
    Raj, Bhiksha
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [22] Training of support vector machine with the use of multivariate normalization
    Martinez Lopez, F. J.
    Martinez Puertas, S.
    Torres Arriaza, J. A.
    APPLIED SOFT COMPUTING, 2014, 24 : 1105 - 1111
  • [23] Ensemble Speaker Modeling using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation
    Li, Sheng
    Lu, Xugang
    Akita, Yuya
    Kawahara, Tatsuya
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2892 - 2896
  • [24] Deep Speaker Embedding with Frame-Constrained Training Strategy for Speaker Verification
    Gu, Bin
    INTERSPEECH 2022, 2022, : 1451 - 1455
  • [25] FMLLR Speaker Normalization With i-Vector: In Pseudo-FMLLR and Distillation Framework
    Joy, Neethu Mariam
    Kothinti, Sandeep Reddy
    Umesh, Srinivasan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (04) : 797 - 805
  • [26] I-VECTOR KULLBACK-LEIBLER DIVISIVE NORMALIZATION FOR PLDA SPEAKER VERIFICATION
    Pan, Yilin
    Zheng, Tieran
    Chen, Chen
    2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 56 - 60
  • [27] SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Lu, Xugang
    Hori, Chiori
    Katagiri, Shigeru
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [28] IMPROVEMENTS TO SPEAKER ADAPTIVE TRAINING OF DEEP NEURAL NETWORKS
    Miao, Yajie
    Jiang, Lu
    Zhang, Hao
    Metze, Florian
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 165 - 170
  • [29] Structure injected weight normalization for training deep networks
    Xu Yuan
    Xiangjun Shen
    Sumet Mehta
    Teng Li
    Shiming Ge
    Zhengjun Zha
    Multimedia Systems, 2022, 28 : 433 - 444