MIXTURE FACTORIZED AUTO-ENCODER FOR UNSUPERVISED HIERARCHICAL DEEP FACTORIZATION OF SPEECH SIGNAL

Times cited: 0
Authors
Peng, Zhiyuan [1 ]
Feng, Siyuan [1 ]
Lee, Tan [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
unsupervised deep factorization; mixture factorized auto-encoder; speaker verification; unsupervised subword modeling;
DOI
10.1109/icassp40776.2020.9054595
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A speech signal is shaped by several informative factors, such as linguistic content and speaker characteristics. Notable recent studies have attempted to factorize the speech signal into these individual factors without requiring any annotation. These studies typically assume a continuous representation for linguistic content, which is not in accordance with general linguistic knowledge and may make the extraction of speaker information less successful. This paper proposes the mixture factorized auto-encoder (mFAE) for unsupervised deep factorization. The encoder part of mFAE comprises a frame tokenizer and an utterance embedder. The frame tokenizer models the linguistic content of input speech with a discrete categorical distribution. It performs frame clustering by assigning each frame a soft mixture label. The utterance embedder generates an utterance-level vector representation. A frame decoder serves to reconstruct speech features from the encoders' outputs. The mFAE is evaluated on a speaker verification (SV) task and an unsupervised subword modeling (USM) task. The SV experiments on VoxCeleb 1 show that the utterance embedder is capable of extracting speaker-discriminative embeddings with performance comparable to an x-vector baseline. The USM experiments on the ZeroSpeech 2017 dataset verify that the frame tokenizer is able to capture linguistic content and the utterance embedder can acquire speaker-related information.
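The data flow described in the abstract (frame tokenizer, utterance embedder, frame decoder) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: single affine layers, mean pooling for the utterance embedder, concatenation of the two encoder outputs as the decoder input, a mean-squared reconstruction loss, and all names (`init_params`, `mfae_forward`, `reconstruction_loss`) are hypothetical.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over one list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(vec, weights, bias):
    # weights holds one row per output dimension.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_params(feat_dim, num_tokens, emb_dim, seed=0):
    # Hypothetical single-layer parameterisation (an assumption, not the paper's).
    rng = random.Random(seed)
    return {
        "tokenizer": init_layer(num_tokens, feat_dim, rng),
        "embedder": init_layer(emb_dim, feat_dim, rng),
        "decoder": init_layer(feat_dim, num_tokens + emb_dim, rng),
    }


def mfae_forward(frames, params):
    # Frame tokenizer: a categorical distribution (soft mixture label) per frame.
    posteriors = [softmax(affine(f, *params["tokenizer"])) for f in frames]
    # Utterance embedder: mean-pool the frames, then project to one vector
    # for the whole utterance (the pooling choice is an assumption).
    n = len(frames)
    mean_frame = [sum(col) / n for col in zip(*frames)]
    embedding = [math.tanh(v) for v in affine(mean_frame, *params["embedder"])]
    # Frame decoder: reconstruct each frame from its soft label concatenated
    # with the shared utterance embedding (the combination is an assumption).
    recon = [affine(p + embedding, *params["decoder"]) for p in posteriors]
    return posteriors, embedding, recon


def reconstruction_loss(frames, recon):
    # Mean squared error over all frames and feature dimensions.
    sq = [(a - b) ** 2 for f, r in zip(frames, recon) for a, b in zip(f, r)]
    return sum(sq) / len(sq)


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    params = init_params(feat_dim=4, num_tokens=3, emb_dim=2)
    posteriors, embedding, recon = mfae_forward(frames, params)
    print("soft label of first frame:", posteriors[0])
    print("reconstruction loss:", reconstruction_loss(frames, recon))
```

In this sketch the per-frame soft labels play the role of the linguistic-content representation and the single pooled vector plays the role of the speaker-related representation; training (not shown) would minimise the reconstruction loss with respect to all three parameter sets.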
Pages: 6774-6778
Page count: 5