Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space

Cited by: 1
Authors
Chowdhury, Labib [1]
Kamal, Mustafa [1]
Hasan, Najia [1]
Mohammed, Nabeel [1]
Affiliation
[1] North South Univ, Dept Elect & Comp Engn, Dhaka, Bangladesh
Source
PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE OF THE BIOMETRICS SPECIAL INTEREST GROUP (BIOSIG 2021) | 2021, Vol. 315
Keywords
Biometric Authentication; Speaker Recognition; Angular Margin Loss; Curriculum Learning
DOI
10.1109/BIOSIG52210.2021.9548296
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning models have become an increasingly preferred option for biometric recognition systems, such as speaker recognition. SincNet, a deep neural network architecture, gained popularity in speaker recognition tasks due to its parameterized sinc functions, which allow it to work directly on the speech signal. The original SincNet architecture uses the softmax loss, which may not be the most suitable choice for recognition-based tasks. Such loss functions neither impose inter-class margins nor differentiate between easy and hard training samples. Curriculum learning approaches, particularly those leveraging angular margin-based losses, have proven very successful in other biometric applications such as face recognition. The advantage of such curriculum learning-based techniques is that they impose inter-class margins while also taking easy and hard samples into account. In this paper, we propose Curricular SincNet (CL-SincNet), an improved SincNet model in which a curricular loss function is used to train the SincNet architecture. The proposed model is evaluated on multiple datasets using intra-dataset and inter-dataset evaluation protocols. In both settings, the model performs competitively with previously published work. In the case of inter-dataset testing, it achieves the best overall results, with a 4% reduction in error rate compared to SincNet and other published work.
Pages: 4
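The abstract does not state the loss formula, so the following is only a minimal sketch of the kind of curricular angular margin loss it refers to, assuming the CurricularFace-style formulation used in face recognition. The function names (`curricular_logits`, `update_t`, `cross_entropy`) and the default `margin`, `scale`, and `momentum` values are hypothetical and are not taken from the paper. The sketch also omits the edge-case handling needed when the angle plus the margin exceeds pi.

```python
import numpy as np

def curricular_logits(embeddings, weights, labels, t, margin=0.5, scale=30.0):
    """Sketch of CurricularFace-style logits (assumed formulation, not the paper's code)."""
    # L2-normalise embeddings (rows) and class weights (columns) so dot products are cosines.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(e @ w, -1.0, 1.0)                 # shape: (batch, classes)
    rows = np.arange(len(labels))
    cos_y = cos[rows, labels]                       # cosine to the true class
    cos_y_m = np.cos(np.arccos(cos_y) + margin)     # true-class cosine with angular margin
    # A negative class is "hard" when its cosine exceeds the margin-penalised target cosine.
    hard = cos > cos_y_m[:, None]
    # Hard negatives are re-weighted by (t + cos); easy negatives are left unchanged.
    out = np.where(hard, cos * (t + cos), cos)
    out[rows, labels] = cos_y_m
    return scale * out, cos_y

def update_t(t, cos_y, momentum=0.01):
    """Moving average of the mean target cosine; t grows as training progresses."""
    return momentum * float(np.mean(cos_y)) + (1.0 - momentum) * t

def cross_entropy(logits, labels):
    """Numerically stable mean softmax cross-entropy."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy example: one 2-D embedding, three classes (class 0 is the true class).
emb = np.array([[1.0, 0.0]])
W = np.array([[np.cos(0.3), np.cos(0.5), 0.0],
              [np.sin(0.3), np.sin(0.5), 1.0]])
y = np.array([0])
logits_early, cos_y = curricular_logits(emb, W, y, t=0.0)
logits_late, _ = curricular_logits(emb, W, y, t=1.0)
loss_early = cross_entropy(logits_early, y)
loss_late = cross_entropy(logits_late, y)
```

In this toy case class 1 is a hard negative and class 2 an easy one. With `t` near zero (early training) the hard negative is down-weighted, and as `t` grows it is emphasized, which is the easy-to-hard curriculum the abstract alludes to.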