DEEP SPEAKER REPRESENTATION USING ORTHOGONAL DECOMPOSITION AND RECOMBINATION FOR SPEAKER VERIFICATION

Cited by: 0
Authors
Kim, Insoo [1]
Kim, Kyuhong [1]
Kim, Jiwhan [1]
Choi, Changkyu [1]
Affiliations
[1] Samsung Advanced Institute of Technology, Suwon, South Korea
Keywords
speaker verification; speaker embedding; orthogonal vector pooling; deep learning; CNN
DOI
10.1109/icassp.2019.8683332
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech signals contain intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary and constitute unspecified factors of variation, which increase the variability of speaker representations. In this paper, we assume that unspecified factors of variation exist in speaker representations, and we attempt to minimize that variability. The key idea is that a primal speaker representation can be decomposed into orthogonal vectors, which are then recombined using deep neural networks (DNN) to reduce speaker representation variability, yielding a performance improvement for speaker verification (SV). The experimental results show that our proposed approach produces a relative equal error rate (EER) reduction of 47.1% compared with using the same convolutional neural network (CNN) architecture on the VoxCeleb dataset. Furthermore, our proposed method provides significant improvement for short utterances.
Pages: 6126-6130
Number of pages: 5
Related papers
50 in total
  • [1] On Deep Speaker Embeddings for Speaker Verification
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Chmulik, Michal
    2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, : 162 - 166
  • [2] Deep Speaker Embeddings for Speaker Verification of Children
    Abed, Mohammed Hamzah
    Sztaho, David
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 58 - 69
  • [3] SPEAKER VERIFICATION USING SPARSE REPRESENTATION CLASSIFICATION
    Kua, Jia Min Karen
    Ambikairajah, Eliathamby
    Epps, Julien
    Togneri, Roberto
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4548 - 4551
  • [4] Speaker verification using mixture decomposition discrimination
    Sukkar, RA
    Gandhi, MB
    Setlur, AR
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (03): : 292 - 299
  • [5] Speaker Verification with Deep Features
    Liu, Yuan
    Fu, Tianfan
    Fan, Yuchen
    Qian, Yanmin
    Yu, Kai
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 747 - 753
  • [6] Deep Speaker Embeddings for Short-Duration Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Kenny, Patrick
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1517 - 1521
  • [7] Deep speaker embeddings for Speaker Verification: Review and experimental comparison
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Kasak, Peter
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [8] MODELLING SPEAKER AND CHANNEL VARIABILITY USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION
    Bhattacharya, Gautam
    Alam, Jahangir
    Kenny, Patrick
    Gupta, Vishwa
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 192 - 198
  • [9] PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
    Zheng, Siqi
    Suo, Hongbin
    Chen, Qian
    INTERSPEECH 2022, 2022, : 1431 - 1435
  • [10] DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION
    Tang, Yun
    Mohan, Aanchan
    Rose, Richard C.
    Ma, Chengyuan
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,