A framework for speaker retrieval and identification through unsupervised learning

Cited by: 4
Authors
Campos, Victor de Abreu [1]
Guimaraes Pedronette, Daniel Carlos [1]
Affiliations
[1] State Univ Sao Paulo, UNESP, Dept Stat Appl Math & Comp, Rio Claro, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Speaker recognition; Speaker retrieval; Unsupervised learning; Vector quantization; Gaussian mixture model; i-vector; IMAGE RE-RANKING; RECOGNITION; SIMILARITY; MACHINES;
DOI
10.1016/j.csl.2019.04.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition is a highly relevant task with applications in diverse domains. Recently, mainly because audio-visual content has become easy to acquire, the capacity to analyze growing datasets without labeled data has become a crucial advantage. This paper presents a speaker recognition approach based on recent unsupervised learning methods, which require neither labeled data nor user intervention. The approach is organized as a framework that exploits a rank-based formulation: the similarity information defined by speaker modeling techniques is encoded in ranked lists, which the unsupervised learning algorithms take as input. Vector quantization, Gaussian mixture models and i-vectors are employed as modeling techniques, while the RL-Sim and ReckNN algorithms are used for the unsupervised learning tasks. The framework was evaluated experimentally on query-by-example speaker retrieval and speaker identification tasks, on both clean and noisy speech recordings, using three public datasets that cover different languages and recording conditions. Effectiveness gains of up to +56% on retrieval measures were obtained by using unsupervised learning algorithms over traditional speaker recognition techniques. (C) 2019 Elsevier Ltd. All rights reserved.
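The rank-based formulation described in the abstract can be illustrated with a small sketch. This is not the authors' RL-Sim or ReckNN implementation; it is a simplified reciprocal-neighbour heuristic, and the function names (`ranked_lists`, `reciprocal_knn_rerank`) and the toy distance matrix are assumptions made purely for illustration. It shows how pairwise distances between speaker models might be encoded as ranked lists and then re-ordered without any labels.

```python
from typing import List, Sequence


def ranked_lists(dist: Sequence[Sequence[float]]) -> List[List[int]]:
    """For each recording i, list all recordings by increasing distance to i.

    Ties are broken by index so the result is deterministic.
    """
    n = len(dist)
    return [sorted(range(n), key=lambda j: (dist[i][j], j)) for i in range(n)]


def reciprocal_knn_rerank(dist: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Toy unsupervised re-ranking (hypothetical, not RL-Sim or ReckNN).

    Recordings that are mutual k-nearest neighbours of the query are moved
    ahead of the rest; the original order is kept inside each group because
    Python's sort is stable.
    """
    ranks = ranked_lists(dist)
    top_k = [set(r[:k]) for r in ranks]
    reranked = []
    for i, r in enumerate(ranks):
        reranked.append(
            sorted(r, key=lambda j: not (j in top_k[i] and i in top_k[j]))
        )
    return reranked


# Toy symmetric distance matrix between four recordings (made-up values).
DIST = [
    [0.0, 0.4, 0.3, 0.9],
    [0.4, 0.0, 0.2, 0.8],
    [0.3, 0.2, 0.0, 0.1],
    [0.9, 0.8, 0.1, 0.0],
]

before = ranked_lists(DIST)[0]             # [0, 2, 1, 3]
after = reciprocal_knn_rerank(DIST, 3)[0]  # [0, 1, 2, 3]
```

In this toy case recording 2 ranks second for query 0, but query 0 is not among the three nearest neighbours of recording 2, so the mutual neighbour 1 is promoted ahead of it. The published algorithms use richer rank information than this single reciprocity test.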
Pages: 153 - 174
Page count: 22
Related papers
50 results in total
  • [41] Feature selection for unsupervised learning through local learning
    Yao, Jin
    Mao, Qi
    Goodison, Steve
    Mai, Volker
    Sun, Yijun
    PATTERN RECOGNITION LETTERS, 2015, 53 : 100 - 107
  • [42] UNSUPERVISED REPRESENTATION LEARNING OF SPEECH FOR DIALECT IDENTIFICATION
    Shon, Suwon
    Hsu, Wei-Ning
    Glass, James
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 105 - 111
  • [43] Unsupervised learning method for events identification in φ-OTDR
    Jie Zhang
    Xiaoting Zhao
    Yiming Zhao
    Xiang Zhong
    Yidan Wang
    Fanchao Meng
    Jinmin Ding
    Yingli Niu
    Xinghua Zhang
    Liang Dong
    Sheng Liang
    Optical and Quantum Electronics, 2022, 54
  • [44] Acoustic animal identification using unsupervised learning
    Guerrero, Maria J.
    Bedoya, Carol L.
    Lopez, Jose D.
    Daza, Juan M.
    Isaza, Claudia
    METHODS IN ECOLOGY AND EVOLUTION, 2023, 14 (06): 1500 - 1514
  • [45] A graph-based ranked-list model for unsupervised distance learning on shape retrieval
    Guimaraes Pedronette, Daniel Carlos
    Almeida, Jurandy
    Torres, Ricardo da S.
    PATTERN RECOGNITION LETTERS, 2016, 83 : 357 - 367
  • [46] A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments
    Jung, Youngmoon
    Choi, Yeunju
    Lim, Hyungjun
    Kim, Hoirin
    IEEE ACCESS, 2020, 8 : 175448 - 175466
  • [47] Visual Speech Detection using an Unsupervised Learning Framework
    Ahmad, Rameez
    Raza, Syed Paymaan
    Malik, Hafiz
    2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 2, 2013: 525 - 528
  • [48] Voice Activity Detection Based on an Unsupervised Learning Framework
    Ying, Dongwen
    Yan, Yonghong
    Dang, Jianwu
    Soong, Frank K.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (08): 2624 - 2632
  • [49] An Adaptive Unsupervised Learning Framework for Monocular Depth Estimation
    Yang, Delong
    Zhong, Xunyu
    Lin, Lixiong
    Peng, Xiafu
    IEEE ACCESS, 2019, 7 : 148142 - 148151
  • [50] Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos
    Spampinato, C.
    Palazzo, S.
    D'Oro, P.
    Giordano, D.
    Shah, M.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (05) : 1378 - 1397