On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation

Cited by: 0
Authors
Yang, Gene-Ping [1]
Gu, Yue [2]
Tang, Qingming [2]
Du, Dongsu [2]
Liu, Yuzong [3]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[2] Amazon, Alexa Perceptual Technol, Seattle, WA USA
[3] Zoom Video Commun Inc, San Jose, CA USA
Source
INTERSPEECH 2023 | 2023
Keywords
self-supervised learning; knowledge distillation; dual-view cross-correlation; keyword spotting; on-device;
DOI
10.21437/Interspeech.2023-2362
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Code
070206; 082403;
Abstract
Large self-supervised models are effective feature extractors, but their application is challenging under on-device budget constraints and biased dataset collection, especially in keyword spotting. To address this, we proposed a knowledge distillation-based self-supervised speech representation learning (S3RL) architecture for on-device keyword spotting. Our approach used a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, light-weight model using dual-view cross-correlation distillation and the teacher's codebook as learning objectives. We evaluated our model's performance on an Alexa keyword spotting detection task using a 16.6k-hour in-house dataset. Our technique showed exceptional performance in normal and noisy conditions, demonstrating the efficacy of knowledge distillation methods in constructing self-supervised models for keyword spotting tasks while working within on-device resource constraints.
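The abstract names dual-view cross-correlation distillation as one of the teacher-student learning objectives. Below is a minimal, hypothetical PyTorch sketch of what a Barlow Twins-style cross-correlation distillation loss between student and teacher embeddings could look like; the function name, the assumption of matching embedding dimensions, and the off-diagonal weight lam are illustrative choices, not details taken from the paper.

    # Hypothetical sketch of a cross-correlation distillation loss between
    # student and teacher embeddings (Barlow Twins-style objective).
    # Shapes, names, and the weight `lam` are illustrative assumptions.
    import torch

    def cross_correlation_distillation_loss(student_emb, teacher_emb, lam=0.005):
        """student_emb, teacher_emb: (batch, dim) embeddings of the same utterances."""
        # Standardize each embedding dimension across the batch.
        s = (student_emb - student_emb.mean(0)) / (student_emb.std(0) + 1e-6)
        t = (teacher_emb - teacher_emb.mean(0)) / (teacher_emb.std(0) + 1e-6)

        n, _ = s.shape
        # Cross-correlation matrix between student and teacher dimensions.
        c = (s.T @ t) / n                                   # (dim, dim)

        on_diag = (torch.diagonal(c) - 1).pow(2).sum()      # align matched dimensions
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate the rest
        return on_diag + lam * off_diag

    # Example usage with random embeddings:
    # loss = cross_correlation_distillation_loss(torch.randn(32, 256), torch.randn(32, 256))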
Pages: 1623-1627
Number of pages: 5