Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation

Cited by: 5

Authors:

Lin, Weiwei [1]

Mak, Man-Wai [1]

Affiliation:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Deep speaker embedding; text-dependent speaker verification; meta-learning; model adaptation; MAML;

DOI:

10.1109/TASLP.2023.3275029

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

By constraining the lexical content of input speech, text-dependent speaker verification (TD-SV) offers more reliable performance than text-independent speaker verification (TI-SV) when dealing with short utterances. Because speech with constrained lexical content is harder to collect, TD models are often fine-tuned from a TI model using a small target-phrase dataset. However, the target-phrase dataset is sometimes too small for fine-tuning, which is the main obstacle to deploying TD-SV. One solution is to fine-tune the model on medium-sized multi-phrase TD data and then deploy it on the target phrase. Although this strategy helps in some cases, performance remains sub-optimal because the model is not optimized for the target phrase. Inspired by recent progress in meta-learning, we propose a three-stage pipeline for adapting a TI model to a TD model for the target phrase. First, a TI model is trained on a large amount of speech data. Then, we use a multi-phrase TD dataset to tune the TI model via model-agnostic meta-learning (MAML). Finally, we perform fast adaptation using a small target-phrase dataset. Results show that the three-stage pipeline consistently outperforms both multi-phrase and target-phrase fine-tuning.
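The meta-learning and fast-adaptation stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear regression model in place of a speaker embedding network, treats each randomly drawn regression task as a stand-in for one phrase, and skips the TI pre-training stage by starting from a zero initialization. All names (`sample_task`, `maml_train`, `adapt`, `CENTER`) and hyperparameters are hypothetical. Because the toy loss is quadratic, the second-order MAML gradient for one inner step can be written in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 3
CENTER = np.array([2.0, -1.0, 0.5])  # hypothetical mean of the per-task weights

def sample_task(n_support=5, n_query=20):
    """Draw one toy task (stand-in for a phrase): a linear map near CENTER."""
    w_task = CENTER + 0.3 * rng.standard_normal(DIM)
    Xs = rng.standard_normal((n_support, DIM))
    Xq = rng.standard_normal((n_query, DIM))
    return (Xs, Xs @ w_task), (Xq, Xq @ w_task)

def loss(w, X, y):
    """Half mean squared error of the linear model X @ w."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    """Gradient of loss() with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def adapt(w, X, y, alpha, steps=1):
    """Fast adaptation: a few gradient steps on a small support set."""
    for _ in range(steps):
        w = w - alpha * grad(w, X, y)
    return w

def maml_train(w, task_sampler, alpha=0.1, beta=0.05, iters=500, batch=4):
    """Meta-train w so that one inner gradient step adapts well to a new task."""
    dim = len(w)
    for _ in range(iters):
        meta_grad = np.zeros(dim)
        for _ in range(batch):
            (Xs, ys), (Xq, yq) = task_sampler()
            w_fast = adapt(w, Xs, ys, alpha)   # inner step on support data
            hess = Xs.T @ Xs / len(ys)         # Hessian of the support loss
            # Chain rule through the inner step: d(w_fast)/d(w) = I - alpha * H
            meta_grad += (np.eye(dim) - alpha * hess) @ grad(w_fast, Xq, yq)
        w = w - beta * meta_grad / batch       # outer (meta) update
    return w

def mean_adapted_loss(w, tasks, alpha=0.1):
    """Average query loss after one adaptation step from initialization w."""
    return np.mean([loss(adapt(w, Xs, ys, alpha), Xq, yq)
                    for (Xs, ys), (Xq, yq) in tasks])

w_meta = maml_train(np.zeros(DIM), sample_task)
test_tasks = [sample_task() for _ in range(50)]  # unseen "target" tasks
loss_meta = mean_adapted_loss(w_meta, test_tasks)
loss_zero = mean_adapted_loss(np.zeros(DIM), test_tasks)
print(f"adapted loss, meta-trained init: {loss_meta:.3f}")
print(f"adapted loss, zero init:         {loss_zero:.3f}")
```

In this toy setting, the meta-trained initialization should reach a lower query loss after one adaptation step than the zero initialization, which mirrors the motivation for inserting a meta-learning stage before adapting on a very small target set. A real system would differentiate through a neural network with an automatic differentiation library rather than use the closed-form Hessian.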

Citation

Pages: 1866-1876

Number of pages: 11
