Refining Self-Supervised Learnt Speech Representation using Brain Activations

Cited by: 0
Authors
Li, Hengyu [1 ]
Mei, Kangdi [1 ]
Liu, Zhaoci [1 ]
Ai, Yang [1 ]
Chen, Liping [1 ]
Zhang, Jie [1 ]
Ling, Zhenhua [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
Source
INTERSPEECH 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Pre-trained speech model; wav2vec2.0; brain activation; SUPERB;
DOI
10.21437/Interspeech.2024-604
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Prior work has shown that speech representations extracted by self-supervised pre-trained models exhibit similarities with human brain activations during speech perception, and that fine-tuning these models on downstream tasks further increases the similarity. However, it remains unclear whether this similarity can be exploited to optimize the pre-trained speech models themselves. In this work, we therefore propose to use brain activations recorded by fMRI to refine the widely used wav2vec2.0 model by aligning its representations with human neural responses. Experimental results on SUPERB show that this refinement benefits several downstream tasks, e.g., speaker verification, automatic speech recognition, and intent classification. The proposed method can thus be regarded as a new alternative for improving self-supervised speech models.
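The abstract does not spell out how representation-brain similarity is measured. In the neural encoding literature this is commonly done with a ridge regression that maps model features to fMRI voxel responses, scored by voxel-wise Pearson correlation; a higher mean correlation indicates closer alignment. The sketch below illustrates that standard recipe on synthetic data; all shapes, names, and the choice of ridge regression are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression mapping features X (n, d) to voxel responses Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def encoding_score(X, Y, B):
    """Mean Pearson correlation between predicted (X @ B) and measured voxel responses."""
    pred = X @ B
    pc = pred - pred.mean(axis=0)
    yc = Y - Y.mean(axis=0)
    r = (pc * yc).sum(axis=0) / (
        np.linalg.norm(pc, axis=0) * np.linalg.norm(yc, axis=0) + 1e-8
    )
    return float(r.mean())

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))                 # synthetic stand-in for pooled speech features
W = rng.standard_normal((16, 32))                  # hidden linear relation
Y = X @ W + 0.1 * rng.standard_normal((200, 32))   # synthetic fMRI voxel responses

B = ridge_fit(X, Y, lam=1.0)
score = encoding_score(X, Y, B)
print(f"mean voxel-wise correlation: {score:.3f}")
```

In a refinement setting along the lines the abstract describes, a term like `1 - encoding_score(...)` (made differentiable, e.g. in PyTorch) could serve as an auxiliary alignment loss alongside the usual self-supervised objective; the paper's actual loss formulation may differ.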
Pages: 1480-1484
Page count: 5