PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Citations: 0
Authors
Bai, Zhongxin [1 ,2 ]
Zhang, Xiao-Lei [1 ,2 ]
Chen, Jingdong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
US National Science Foundation; Israel Science Foundation
Keywords
speaker verification; pAUC optimization; speaker centers; verification loss; recognition
DOI
10.1109/icassp40776.2020.9053674
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions generally fall into two classes: verification and identification. Verification loss functions match the pipeline of speaker verification, but they are difficult to implement. Thus, most state-of-the-art deep embedding methods use identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function that maximizes the partial area under the receiver operating characteristic (ROC) curve (pAUC) for deep embedding based text-independent speaker verification. We also propose a class-center based method for constructing training trials, which improves training efficiency and is critical for the proposed loss function to match the identification loss in performance. Experiments on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with state-of-the-art identification loss functions.
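To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that speaker centers are simple per-speaker means of the embeddings in a batch, that trials are scored by cosine similarity, and that pAUC is estimated empirically over a false-positive-rate range [alpha, beta]. The function names `class_center_trials` and `partial_auc` and the parameters `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def class_center_trials(embeddings, labels):
    """Score every embedding against every speaker center (assumed: batch means).

    Returns cosine scores of target trials (embedding vs. its own speaker's
    center) and non-target trials (embedding vs. any other center).
    """
    speakers = np.unique(labels)
    centers = np.stack([embeddings[labels == s].mean(axis=0) for s in speakers])
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cen = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    scores = emb @ cen.T                                # (n_utterances, n_speakers)
    is_target = labels[:, None] == speakers[None, :]
    return scores[is_target], scores[~is_target]


def partial_auc(pos, neg, alpha=0.0, beta=0.1):
    """Empirical normalized pAUC over the FPR range [alpha, beta].

    Only the non-target scores whose rank (highest first) falls inside the
    range are compared against the target scores; ties count as one half.
    """
    neg_sorted = np.sort(neg)[::-1]                     # hardest non-targets first
    lo = int(np.floor(alpha * len(neg_sorted)))
    hi = max(int(np.ceil(beta * len(neg_sorted))), lo + 1)
    hard_neg = neg_sorted[lo:hi]
    diff = pos[:, None] - hard_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Toy usage: two speakers, two utterances each, 2-dimensional embeddings.
toy_emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
toy_lab = np.array([0, 0, 1, 1])
toy_pos, toy_neg = class_center_trials(toy_emb, toy_lab)
toy_pauc = partial_auc(toy_pos, toy_neg, alpha=0.0, beta=0.5)
```

The indicator-based estimate above is not differentiable, so using it as a training loss would require replacing the comparison with a differentiable surrogate, such as a hinge-type penalty on the score difference. Scoring against centers rather than against all utterance pairs keeps the number of trials linear in the batch size, which is one plausible reading of the efficiency gain the abstract mentions.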
Pages: 6819-6823
Number of pages: 5