Deep Speaker Embeddings for Short-Duration Speaker Verification

Cited by: 105
Authors
Bhattacharya, Gautam [1, 2]
Alam, Jahangir [2]
Kenny, Patrick [2]
Affiliations
[1] McGill Univ, Montreal, PQ, Canada
[2] Comp Res Inst Montreal, Montreal, PQ, Canada
Keywords
speaker recognition; convolutional neural networks; deep learning; i-vectors
DOI
10.21437/Interspeech.2017-1575
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The performance of a state-of-the-art speaker verification system is severely degraded when it is presented with trial recordings of short duration. In this work we propose to use deep neural networks to learn short-duration speaker embeddings. We focus on the 5s-5s condition, wherein both sides of a verification trial are 5 seconds long. In our previous work we established that learning a non-linear mapping from i-vectors to speaker labels is beneficial for speaker verification [1]. In this work we take the idea of learning a speaker classifier one step further: we apply deep neural networks directly to time-frequency speech representations. We propose two feedforward network architectures for this task. Our best model is based on a deep convolutional architecture wherein recordings are treated as images. From our experimental findings we advocate treating utterances as images or 'speaker snapshots', much like in face recognition. Our convolutional speaker embeddings perform significantly better than i-vectors when scoring is done using cosine distance, where the relative improvement is 23.5%. The proposed deep embeddings combined with cosine distance also outperform a state-of-the-art i-vector verification system by 1%, providing further empirical evidence in favor of our learned speaker features.
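The abstract reports results for trials scored with cosine distance. As a minimal illustrative sketch (not the authors' implementation), the following Python shows how a verification trial could be scored from two fixed-length embeddings. The function names `cosine_score` and `verify` and the threshold value are hypothetical; the code computes cosine similarity, of which cosine distance is simply one minus that value.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enroll, test, threshold=0.5):
    """Accept the trial when the score reaches the threshold.

    The threshold is a hypothetical placeholder; in practice it would be
    tuned on held-out trials.
    """
    return cosine_score(enroll, test) >= threshold
```

In this sketch, `enroll` and `test` stand for the embeddings extracted from the two sides of a trial, such as the two 5-second recordings in the 5s-5s condition.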
Pages: 1517-1521
Page count: 5
Related papers
50 in total
  • [1] On Deep Speaker Embeddings for Speaker Verification
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Chmulik, Michal
    2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, : 162 - 166
  • [2] Deep Discriminative Embeddings for Duration Robust Speaker Verification
    Li, Na
    Tuo, Deyi
    Su, Dan
    Li, Zhifeng
    Yu, Dong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2262 - 2266
  • [3] Deep Speaker Embeddings for Speaker Verification of Children
    Abed, Mohammed Hamzah
    Sztaho, David
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 58 - 69
  • [4] Transfer Learning for Speaker Verification with Short-Duration Audio
    Fathima, Noor
    Simha, J. B.
    Abhi, Shinu
    SMART TRENDS IN COMPUTING AND COMMUNICATIONS, VOL 5, SMARTCOM 2024, 2024, 949 : 195 - 205
  • [5] PHONE ADAPTIVE TRAINING FOR SHORT-DURATION SPEAKER VERIFICATION
    Soldi, Giovanni
    Bozonnet, Simon
    Beaugeant, Christophe
    Evans, Nicholas
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2107 - 2111
  • [6] A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments
    Jung, Youngmoon
    Choi, Yeunju
    Lim, Hyungjun
    Kim, Hoirin
    IEEE ACCESS, 2020, 8 : 175448 - 175466
  • [7] Consideration of Varying Training Lengths for Short-Duration Speaker Verification
    Ko, WooSeok
    Um, Seyun
    Piao, Zhenyu
    Kang, Hong-goo
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 139 - 144
  • [8] The Sogou System for Short-duration Speaker Verification Challenge 2021
    Yan, Jie
    Yao, Shengyu
    Pan, Yiqian
    Chen, Wei
    INTERSPEECH 2021, 2021, : 2327 - 2331
  • [9] The SJTU System for Short-duration Speaker Verification Challenge 2021
    Han, Bing
    Chen, Zhengyang
    Zhou, Zhikai
    Qian, Yanmin
    INTERSPEECH 2021, 2021, : 2332 - 2336
  • [10] The TalTech Systems for the Short-duration Speaker Verification Challenge 2020
    Alumae, Tanel
    Valk, Jorgen
    INTERSPEECH 2020, 2020, : 746 - 750