A framework for speaker retrieval and identification through unsupervised learning

Cited by: 4
Authors
Campos, Victor de Abreu [1]
Guimaraes Pedronette, Daniel Carlos [1]
Affiliations
[1] State Univ Sao Paulo, UNESP, Dept Stat Appl Math & Comp, Rio Claro, Brazil
Source
COMPUTER SPEECH AND LANGUAGE | 2019 / Vol. 58 / pp. 153-174
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Speaker recognition; Speaker retrieval; Unsupervised learning; Vector quantization; Gaussian mixture model; i-vector; IMAGE RE-RANKING; RECOGNITION; SIMILARITY; MACHINES;
DOI
10.1016/j.csl.2019.04.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition is a highly relevant task with applications in diverse domains. Recently, mainly because audio-visual content has become easier to acquire, the ability to analyze growing datasets without relying on labeled data has become a crucial advantage. This paper presents a speaker recognition approach based on recent unsupervised learning methods, which require no labeled data or user intervention. The approach is organized as a framework that exploits a rank-based formulation. The similarity information defined by speaker modeling techniques is encoded in ranked lists, which serve as input to the unsupervised learning algorithms. Vector quantization, Gaussian mixture models, and i-vectors are employed as modeling techniques, while the RL-Sim and ReckNN algorithms are used for the unsupervised learning tasks. The framework was experimentally evaluated on query-by-example speaker retrieval and speaker identification tasks, on both clean and noisy speech recordings. The evaluation was conducted on three public datasets covering different languages and recording conditions. Effectiveness gains of up to 56% on retrieval measures were obtained by using the unsupervised learning algorithms over traditional speaker recognition techniques. (C) 2019 Elsevier Ltd. All rights reserved.
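The rank-based formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a pairwise distance matrix between recordings is already available (for example, from comparing speaker models), turns each row into a ranked list, and then re-ranks by how much two recordings' top-k neighbourhoods overlap. The function names, the toy distances, and the overlap rule are all hypothetical choices made for illustration; this is not the RL-Sim or ReckNN algorithm evaluated in the paper.

```python
def ranked_lists(dist):
    """For each item i, return all item indices sorted by increasing distance to i."""
    n = len(dist)
    # Ties in distance are broken by index so the result is deterministic.
    return [sorted(range(n), key=lambda j: (dist[i][j], j)) for i in range(n)]


def rerank_by_overlap(ranks, k):
    """Re-rank each list by top-k neighbourhood overlap (illustrative rule only)."""
    n = len(ranks)
    topk = [set(r[:k]) for r in ranks]
    new_ranks = []
    for i in range(n):
        # Position of every item in the original ranked list of i, used as tie-breaker.
        pos = {j: p for p, j in enumerate(ranks[i])}
        # A larger overlap between top-k sets is treated as higher similarity.
        new_ranks.append(
            sorted(range(n), key=lambda j: (-len(topk[i] & topk[j]), pos[j]))
        )
    return new_ranks


# Toy symmetric distance matrix for five recordings (made-up values):
# items 0-2 are close to each other, items 3-4 are close to each other.
dist = [
    [0.00, 0.20, 0.30, 0.90, 0.80],
    [0.20, 0.00, 0.25, 0.85, 0.90],
    [0.30, 0.25, 0.00, 0.70, 0.75],
    [0.90, 0.85, 0.70, 0.00, 0.10],
    [0.80, 0.90, 0.75, 0.10, 0.00],
]

ranks = ranked_lists(dist)
reranked = rerank_by_overlap(ranks, k=3)
```

No labels are used at any point: the only input is the similarity structure already encoded in the ranked lists, which is the property the abstract emphasizes.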
Pages: 153-174
Page count: 22
Related papers
50 records in total
  • [1] An Unsupervised Distance Learning Framework for Multimedia Retrieval
    Valem, Lucas Pascotti
    Guimaraes Pedronette, Daniel Carlos
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 107 - 111
  • [2] An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification
    Liu, Wenbo
    Yu, Zhiding
    Li, Ming
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 78 - +
  • [3] Opinion retrieval through unsupervised topological learning
    Rogovschi, Nicoleta
    Grozavu, Nistor
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 3130 - 3134
  • [4] Unsupervised Feature Learning for Writer Identification and Writer Retrieval
    Christlein, Vincent
    Gropp, Martin
    Fiel, Stefan
    Maier, Andreas
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 991 - 997
  • [5] Unsupervised Speaker Identification for TV News
    Woo, Daniel N.
    Aygun, Ramazan S.
    IEEE MULTIMEDIA, 2016, 23 (04) : 50 - 58
  • [6] UNSUPERVISED CROSS-MODAL RETRIEVAL THROUGH ADVERSARIAL LEARNING
    He, Li
    Xu, Xing
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1153 - 1158
  • [7] An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings using Recurrent Neural Networks
    Jati, Arindam
    Georgiou, Panayiotis
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1131 - 1135
  • [8] Deep Hashing for Speaker Identification and Retrieval
    Fan, Lei
    Jiang, Qing-Yuan
    Yu, Ya-Qi
    Li, Wu-Jun
    INTERSPEECH 2019, 2019, : 2908 - 2912
  • [9] A MULTITASK LEARNING FRAMEWORK FOR SPEAKER CHANGE DETECTION WITH CONTENT INFORMATION FROM UNSUPERVISED SPEECH DECOMPOSITION
    Su, Hang
    Zhao, Danyang
    Dang, Long
    Li, Minglei
    Wu, Xixin
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8087 - 8091
  • [10] Unsupervised Arabic Speech Embedding Model for Speaker Identification
    Al Roken, Noora
    Hussain, Abir
    Shahin, Ismail
    Turky, Ayad
    Khan, Bilal
    Khan, Wasiq
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,