Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification

被引:0
|
作者
Shum, Stephen [1 ]
Dehak, Najim [1 ]
Dehak, Reda [2 ]
Glass, James R. [1 ]
机构
[1] MIT, Comp Sci & Artificial Intelligence Lab, 32 Vassar St, Cambridge, MA 02139 USA
[2] LRDE, Paris, France
来源
ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP | 2010年
关键词
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper proposes a new approach to unsupervised speaker adaptation inspired by the recent success of the factor analysisbased Total Variability Approach to text-independent speaker verification [1]. This approach effectively represents speaker variability in terms of low-dimensional total factor vectors and, when paired alongside the simplicity of cosine similarity scoring, allows for easy manipulation and efficient computation [2]. The development of our adaptation algorithm is motivated by the desire to have a robust method of setting an adaptation threshold, to minimize the amount of required computation for each adaptation update, and to simplify the associated score normalization procedures where possible. To address the final issue, we propose the Symmetric Normalization (S-norm) method, which takes advantage of the symmetry in cosine similarity scoring and achieves competitive performance to that of the ZT-norm while requiring fewer parameter calculations. In subsequent experiments, we also assess an attempt to replace the use of score normalization procedures altogether with a Normalized Cosine Similarity scoring function [3]. We evaluated the performance of our unsupervised speaker adaptation algorithm under various score normalization procedures on the 10sec-10sec and core conditions of the 2008 NIST SRE dataset. Using results without adaptation as our baseline, it was found that the proposed methods are consistent in successfully improving speaker verification performance to achieve state-of-the-art results.
引用
收藏
页码:76 / 82
页数:7
相关论文
共 50 条
  • [1] CHANNEL ADAPTATION OF PLDA FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Chen, Liping
    Lee, Kong Aik
    Ma, Bin
    Guo, Wu
    Li, Haizhou
    Dai, Li Rong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5251 - 5255
  • [2] Group-based speaker embeddings for text-independent speaker verification
    Jung, Youngmoon
    Eom, Youngsik
    Lee, Yeonghyeon
    Kim, Hoirin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 496 - 502
  • [4] A tutorial on text-independent speaker verification
    Bimbot, F. (bimbot@irisa.fr), 1600, Hindawi Publishing Corporation (2004):
  • [5] A tutorial on text-independent speaker verification
    Bimbot, F
    Bonastre, JF
    Fredouille, C
    Gravier, G
    Magrin-Chagnolleau, I
    Meignier, S
    Merlin, T
    Ortega-García, J
    Petrovska-Delacrétaz, D
    Reynolds, DA
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) : 430 - 451
  • [6] Discriminative transformation for sufficient adaptation in text-independent speaker verification
    Yang, Hao
    Dong, Yuan
    Zhao, Xianyu
    Zha, Jian
    Wang, Haila
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 558 - +
  • [7] A Tutorial on Text-Independent Speaker Verification
    Frédéric Bimbot
    Jean-François Bonastre
    Corinne Fredouille
    Guillaume Gravier
    Ivan Magrin-Chagnolleau
    Sylvain Meignier
    Teva Merlin
    Javier Ortega-García
    Dijana Petrovska-Delacrétaz
    Douglas A. Reynolds
    EURASIP Journal on Advances in Signal Processing, 2004
  • [8] Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition
    Novoselov, Sergey
    Shchemelinin, Vadim
    Shulipa, Andrey
    Kozlov, Alexandr
    Kremnev, Ivan
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2242 - 2246
  • [9] Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification
    Ren, Zongze
    Chen, Zhiyong
    Xu, Shugong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 558 - 562
  • [10] Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3588 - 3592