On the Issue of Calibration in DNN-based Speaker Recognition Systems

Times cited: 5
Authors
McLaren, Mitchell [1]
Castan, Diego [1]
Ferrer, Luciana [2, 3]
Lawson, Aaron [1]
Affiliations
[1] SRI International, Speech Technology and Research Laboratory, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
[2] University of Buenos Aires, FCEN, Department of Computer Science, Buenos Aires, Argentina
[3] Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
Keywords
speaker recognition; mismatch; calibration; deep neural network; bottleneck features
DOI
10.21437/Interspeech.2016-1134
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration when used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power.
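The hybrid alignment idea, using bottleneck features only to align frames while accumulating first-order Baum-Welch statistics over different features, can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM trained on bottleneck features supplies the frame alignments and that conventional acoustic features (for example MFCCs) are the features whose statistics are accumulated; the function names `gmm_posteriors` and `hybrid_stats` are hypothetical, and this is not the authors' implementation. For brevity, the first-order statistics are left uncentered, whereas full i-vector extraction usually centers them on per-component means.

```python
import numpy as np

def gmm_posteriors(feats, weights, means, variances):
    """Frame-level component posteriors of a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D).
    """
    d = feats.shape[1]
    diff = feats[:, None, :] - means[None, :, :]                 # (T, C, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, C)
    log_det = np.sum(np.log(variances), axis=1)                  # (C,)
    log_lik = -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + mahal)
    log_post = np.log(weights)[None, :] + log_lik
    # Subtract the per-frame maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def hybrid_stats(align_feats, stat_feats, weights, means, variances):
    """Baum-Welch statistics with alignment and accumulation on separate features.

    align_feats: (T, D_align), e.g. bottleneck features (assumption), used only
                 to compute the frame-to-component posteriors.
    stat_feats:  (T, D_stat), e.g. MFCCs (assumption), accumulated into the
                 first-order statistics.
    """
    gamma = gmm_posteriors(align_feats, weights, means, variances)  # (T, C)
    n = gamma.sum(axis=0)            # zeroth-order statistics, shape (C,)
    f = gamma.T @ stat_feats         # first-order statistics, shape (C, D_stat)
    return n, f

# Usage with random stand-in data (not real features).
rng = np.random.default_rng(0)
T, C = 50, 4
bn = rng.normal(size=(T, 8))      # stand-in bottleneck features
mfcc = rng.normal(size=(T, 20))   # stand-in acoustic features
w = np.full(C, 1.0 / C)
mu = rng.normal(size=(C, 8))
var = np.ones((C, 8))
n, f = hybrid_stats(bn, mfcc, w, mu, var)
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames. The dimensionality of the first-order statistics follows the accumulated features rather than the alignment features, so the bottleneck features shape only which component each frame is credited to.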
Pages: 1825-1829
Number of pages: 5