Vector-Quantized Autoregressive Predictive Coding

Cited by: 47

Authors:

Chung, Yu-An [1]

Tang, Hao [1]

Glass, James [1]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

INTERSPEECH 2020 | 2020

Keywords:

self-supervised learning; unsupervised learning; representation learning; vector quantization; transfer learning

DOI:

10.21437/Interspeech.2020-1228

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks. However, the connection between low self-supervised loss and strong performance in downstream tasks remains unclear. In this work, we propose Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations, allowing us to explicitly control the amount of information encoded in the representations. By studying a sequence of increasingly limited models, we reveal the constituents of the learned representations. In particular, we confirm the presence of information with probing tasks, while showing the absence of information with mutual information, uncovering the model's preference in preserving speech information as its capacity becomes constrained. We find that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective. As a byproduct, the learned codes for a particular model capacity correspond well to English phones.
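The idea that quantized representations let one control how much information is encoded can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how quantization is performed, so the nearest-neighbour codebook lookup, the function names `quantize` and `max_bits_per_frame`, and the toy codebook are all assumptions made for illustration. The point it shows is that a codebook with K entries can carry at most log2(K) bits per frame, so shrinking the codebook limits the representation's capacity.

```python
import math

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector.

    Nearest-neighbour lookup with squared Euclidean distance is an
    illustrative assumption, not the method described in the paper.
    """
    codes = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code))
                 for code in codebook]
        codes.append(dists.index(min(dists)))
    return codes

def max_bits_per_frame(codebook_size):
    """Upper bound on information per frame for a single codebook."""
    return math.log2(codebook_size)

# Hypothetical toy data: four 2-D code vectors and three 2-D frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

codes = quantize(frames, codebook)
bits = max_bits_per_frame(len(codebook))
print(codes, bits)
```

Halving the codebook to two entries would reduce the bound to one bit per frame, which is the sense in which a sequence of increasingly limited models can be built.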


Pages: 3760-3764

Page count: 5
