A framework for speaker retrieval and identification through unsupervised learning

Cited by: 4
Authors
Campos, Victor de Abreu [1]
Guimaraes Pedronette, Daniel Carlos [1]
Affiliations
[1] State Univ Sao Paulo, UNESP, Dept Stat Appl Math & Comp, Rio Claro, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Speaker recognition; Speaker retrieval; Unsupervised learning; Vector quantization; Gaussian mixture model; i-vector; IMAGE RE-RANKING; RECOGNITION; SIMILARITY; MACHINES;
DOI
10.1016/j.csl.2019.04.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition is a task of remarkable relevance, with applications in diversified domains. Recently, mainly due to the facilities in audio-visual content acquisition, the capacity of analyzing growing datasets independent of labeled data has become a crucial advantage. This paper presents a speaker recognition approach based on recent unsupervised learning methods, which do not require any labeled data or user intervention. The approach is organized in terms of a framework which exploits a rank-based formulation. The similarity information defined by speaker modeling techniques is encoded in ranked lists, which are used as input by the unsupervised learning algorithms. Vector quantization, Gaussian mixture models and i-vectors are employed as modeling techniques, while the algorithms RL-Sim and ReckNN are used for unsupervised learning tasks. The framework was experimentally evaluated on query-by-example speaker retrieval and speaker identification tasks, both on clean and noisy speech recordings. An experimental evaluation was conducted on three public datasets, different languages, and recordings conditions. Effectiveness gains up to +56% on retrieval measures were obtained through the use of unsupervised learning algorithms over traditional speaker recognition techniques. (C) 2019 Elsevier Ltd. All rights reserved.
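As a rough illustration of the rank-based formulation the abstract describes, the sketch below encodes pairwise similarity as ranked lists and then re-ranks each list using the overlap between top-k neighborhoods. It is not the RL-Sim or ReckNN algorithm evaluated in the paper. The function names, the toy 2-D vectors standing in for speaker models, the Euclidean distance, and the neighborhood size `k` are all assumptions made for illustration.

```python
import math


def ranked_lists(features):
    """For each item, return all item indices sorted by increasing distance."""
    n = len(features)
    lists = []
    for i in range(n):
        # Pair each distance with its index so ties are broken by index.
        dists = sorted((math.dist(features[i], features[j]), j) for j in range(n))
        lists.append([j for _, j in dists])
    return lists


def rerank_by_overlap(lists, k):
    """Re-rank each list by the size of the overlap between top-k neighbor sets.

    Simplified stand-in for a rank-based unsupervised step: items whose
    top-k neighborhood shares more members with the query's move up.
    """
    top = [set(r[:k]) for r in lists]
    reranked = []
    for q, r in enumerate(lists):
        # sorted() is stable, so equal overlaps keep their original order.
        reranked.append(sorted(r, key=lambda j: -len(top[q] & top[j])))
    return reranked


# Hypothetical toy data: two tight groups plus one point between them.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.6, 0.6)]
lists = ranked_lists(features)
reranked = rerank_by_overlap(lists, k=3)
```

Only the ranked lists are passed to the second step, so no labels and no raw feature values are needed once the initial ranking exists.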
Pages: 153-174
Number of pages: 22