Vector-Quantized Autoregressive Predictive Coding

Cited by: 47
Authors
Chung, Yu-An [1]
Tang, Hao [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
INTERSPEECH 2020 | 2020
Keywords
self-supervised learning; unsupervised learning; representation learning; vector quantization; transfer learning
DOI
10.21437/Interspeech.2020-1228
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks. However, the connection between low self-supervised loss and strong performance in downstream tasks remains unclear. In this work, we propose Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations, allowing us to explicitly control the amount of information encoded in the representations. By studying a sequence of increasingly limited models, we reveal the constituents of the learned representations. In particular, we confirm the presence of information with probing tasks, while showing the absence of information with mutual information, uncovering the model's preference in preserving speech information as its capacity becomes constrained. We find that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective. As a byproduct, the learned codes for a particular model capacity correspond well to English phones.
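The abstract's claim that quantized representations allow explicit control over the amount of encoded information can be illustrated with a small sketch. It assumes a generic nearest-neighbour codebook lookup, which is not necessarily the quantization mechanism used in the paper; the names `quantize`, `max_bits_per_frame`, `codebook`, and `frames` are hypothetical and chosen only for illustration. The point shown is that a codebook of size K can carry at most log2(K) bits per frame, so shrinking K limits what a representation can encode.

```python
import math


def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook vector.

    Assumption: plain squared-Euclidean nearest-neighbour lookup, used here
    only as a generic stand-in for a vector-quantization layer.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


def max_bits_per_frame(codebook_size):
    """Upper bound on the information one discrete code can carry."""
    return math.log2(codebook_size)


# Toy 2-D codebook with four entries and three toy "frames" (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

codes = quantize(frames, codebook)
print(codes)                              # indices of the nearest codebook entries
print(max_bits_per_frame(len(codebook)))  # capacity bound in bits per frame
```

Halving the codebook size removes one bit from this bound, which is one way to read the abstract's "sequence of increasingly limited models".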
Pages: 3760-3764
Page count: 5