SIMILARITY ANALYSIS OF SELF-SUPERVISED SPEECH REPRESENTATIONS

Cited by: 18
Authors
Chung, Yu-An [1 ]
Belinkov, Yonatan [2 ]
Glass, James [1 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Technion Henry & Marilyn Taub Fac Comp Sci, IL-3200003 Haifa, Israel
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
Israel Science Foundation
Keywords
Self-supervised learning; speech representation learning; unsupervised pre-training; comparative analysis;
DOI
10.1109/ICASSP39728.2021.9414321
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Self-supervised speech representation learning has recently become a thriving research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, little research has focused on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the same input, our study also finds that the training objective has a greater impact on representation similarity than architectural choices such as building blocks (RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also suggest that there exists a strong correlation between pre-training loss and downstream performance for some self-supervised algorithms.
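The abstract mentions quantifying similarity between learned representations using existing similarity measures. As an illustrative sketch (not taken from the paper), one widely used such measure is linear centered kernel alignment (CKA), which compares two representation matrices of possibly different dimensionality and is invariant to orthogonal transforms and isotropic scaling:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation
    matrices of shape (n_samples, dim); dims of X and Y may differ.
    Returns a value in [0, 1], with 1 meaning identical up to
    orthogonal transform and isotropic scaling."""
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64))
# An orthogonal rotation of X should leave linear CKA unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
print(linear_cka(X, X @ Q))  # close to 1.0
```

In a study like this one, `X` and `Y` would hold frame-level representations extracted from two pre-trained models on the same input utterances; this snippet only shows the measure itself.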
Pages: 3040-3044
Page count: 5