Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning

Cited by: 12
Authors
Bai, Zhongxin [1 ,2 ]
Zhang, Xiao-Lei [1 ,2 ]
Chen, Jingdong [3 ]
Affiliations
[1] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun CIAIC, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[3] Northwestern Polytech Univ, CIAIC, Xian 710072, Peoples R China
Funding
Israel Science Foundation;
Keywords
Measurement; Feature extraction; Acoustics; Detection algorithms; Training; Speech processing; Optimization; Metric learning; pAUC; speaker verification; squared Mahalanobis distance; NONLINEAR TRANSFORMATIONS; PLDA; VECTORS;
DOI
10.1109/TASLP.2020.2990275
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Receiver operating characteristic (ROC) and detection error tradeoff (DET) curves are two widely used evaluation metrics for speaker verification. They are equivalent since the latter can be obtained by transforming the former's true positive y-axis to false negative y-axis and then re-scaling both axes by a probit operator. Real-world speaker verification systems, however, usually work on part of the ROC curve instead of the entire ROC curve given an application. Therefore, we propose in this article to use the area under part of the ROC curve (pAUC) as a more efficient evaluation metric for speaker verification. A Mahalanobis distance metric learning based back-end is applied to optimize pAUC, where the Mahalanobis distance metric learning guarantees that the optimization objective of the back-end is a convex one so that the global optimum solution is achievable. To improve the performance of the state-of-the-art speaker verification systems by the proposed back-end, we further propose two feature preprocessing techniques based on length-normalization and probabilistic linear discriminant analysis respectively. We evaluate the proposed systems on the major languages of NIST SRE16 and the core tasks of SITW. Experimental results show that the proposed back-end outperforms the state-of-the-art speaker verification back-ends in terms of seven evaluation metrics.
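The two quantities named in the abstract, the squared Mahalanobis distance and the pAUC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the FPR range parameters `alpha` and `beta`, and the rank-based empirical estimator (comparing every target trial against only those non-target trials whose rank falls inside the chosen false positive range) are assumptions made for illustration.

```python
import numpy as np


def squared_mahalanobis(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    M is assumed to be a symmetric positive semidefinite matrix.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)


def partial_auc(target_scores, nontarget_scores, alpha=0.0, beta=0.1):
    """Empirical normalized pAUC over the false positive rate range [alpha, beta].

    Scores are similarities (higher means "same speaker"), e.g. negated
    distances. Illustrative estimator, assumed here: keep only the
    non-target trials ranked between alpha*N and beta*N by descending
    score, then count the fraction of correctly ordered
    (target, non-target) pairs, with ties counted as one half.
    """
    pos = np.asarray(target_scores, dtype=float)
    neg = np.sort(np.asarray(nontarget_scores, dtype=float))[::-1]
    lo = int(np.floor(alpha * len(neg)))
    hi = int(np.ceil(beta * len(neg)))
    hard_neg = neg[lo:hi]
    if len(pos) == 0 or len(hard_neg) == 0:
        raise ValueError("no trials fall inside the requested range")
    diff = pos[:, None] - hard_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Toy usage with made-up scores (not data from the paper).
targets = [0.9, 0.8, 0.3]
nontargets = [0.7, 0.5, 0.2, 0.1]
low_fpr_pauc = partial_auc(targets, nontargets, alpha=0.0, beta=0.5)
full_auc = partial_auc(targets, nontargets, alpha=0.0, beta=1.0)
```

With `alpha=0` and `beta=1` the estimator reduces to the ordinary AUC, so restricting the range shows how a system can look better on the whole curve than on the low false positive region an application actually uses.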
Pages: 1533-1548
Page count: 16
Related papers
50 in total
  • [1] PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Bai, Zhongxin
    Zhang, Xiao-Lei
    Chen, Jingdong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6819 - 6823
  • [2] aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems
    Mingote, Victoria
    Miguel, Antonio
    Ribas, Dayana
    Ortega, Alfonso
    Lleida, Eduardo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 772 - 784
  • [3] Cosine metric learning based speaker verification
    Bai, Zhongxin
    Zhang, Xiao-Lei
    Chen, Jingdong
    SPEECH COMMUNICATION, 2020, 118 : 10 - 20
  • [4] SAR Image Change Detection via Spatial Metric Learning With an Improved Mahalanobis Distance
    Wang, Rongfang
    Chen, Jia-Wei
    Wang, Yule
    Jiao, Licheng
    Wang, Mi
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (01) : 77 - 81
  • [5] LEARNING A MAHALANOBIS DISTANCE METRIC VIA REGULARIZED LDA FOR SCENE RECOGNITION
    Wu, Meng
    Zhou, Jun
    Sun, Jun
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 3125 - 3128
  • [6] CDMA: CROSS-DOMAIN DISTANCE METRIC ADAPTATION FOR SPEAKER VERIFICATION
    Li, Jianchen
    Han, Jiqing
    Song, Hongwei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7197 - 7201
  • [7] Semi-supervised distributed clustering with Mahalanobis distance metric learning
    Yuecheng Y.
    Jiandong W.
    Guansheng Z.
    Bin G.
    International Journal of Digital Content Technology and its Applications, 2010, 4 (09) : 132 - 140
  • [8] Weakly Supervised AUC Optimization: A Unified Partial AUC Approach
    Xie, Zheng
    Liu, Yu
    He, Hao-Yuan
    Li, Ming
    Zhou, Zhi-Hua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (07) : 4780 - 4795
  • [9] Vietnamese Speaker Verification With Mel-Scale Filter Bank Energies and Deep Learning
    Nguyen, Thi-Thanh-Mai
    Nguyen, Duc-Dung
    Luong, Chi-Mai
    IEEE ACCESS, 2024, 12 : 150114 - 150122
  • [10] Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification
    Lu, Xugang
    Shen, Peng
    Tsao, Yu
    Kawai, Hisashi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3631 - 3641