Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting

Cited by: 0

Authors:

Ma, Fei [1]

Wang, Chengliang [1]

Li, Xusheng [1]

Zeng, Zhuo [1]

Affiliations:

[1] Chongqing Univ, Shazheng St 174, Chongqing 400044, Peoples R China

Source:

SPEECH COMMUNICATION | 2024 / Vol. 156

Keywords:

Cross-domain keyword spotting; Weighted maximum mean discrepancy; Active selection; Negative transfer; SPEECH; ALIGNMENT; SYSTEM;

DOI:

10.1016/j.specom.2023.103019

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In small-footprint end-to-end keyword spotting, it is often expensive and time-consuming to acquire sufficient labels in various speech scenarios. To overcome this problem, transfer learning leverages the rich knowledge of an auxiliary domain to annotate the unlabeled target data. However, most existing transfer learning methods learn a domain-invariant feature representation while ignoring the negative transfer problem. In this paper, we propose a new and general cross-domain keyword spotting framework called selective transfer subspace learning (STSL), which avoids negative transfer and dramatically improves the accuracy of cross-domain keyword spotting by actively selecting appropriate source samples. Specifically, STSL first aligns the geometrical relationship and the weighted distribution discrepancy to learn a domain-invariant projection subspace. Then, it actively selects source samples that are similar to the target domain for transfer learning, so as to avoid negative transfer. Finally, we formulate a minimization problem that alternately optimizes the projection subspace and the source active selection, giving an effective optimization. Experimental results on 10 groups of cross-domain keyword spotting tasks show that STSL outperforms several state-of-the-art transfer learning methods as well as methods that use no transfer learning.
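The idea of weighting source samples before measuring the distribution discrepancy can be illustrated with a small sketch. This is not the authors' STSL formulation, which the abstract does not spell out. It assumes a linear kernel, so the weighted maximum mean discrepancy reduces to the squared distance between a weighted source mean and the target mean, and it replaces the paper's active selection with a simple nearest-to-target-mean rule. The function names `weighted_mean`, `weighted_mmd_sq`, and `select_source` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def weighted_mean(samples: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Return the weighted mean of the samples, one value per feature dimension."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(samples[0])
    return [
        sum(w * x[d] for x, w in zip(samples, weights)) / total
        for d in range(dim)
    ]


def weighted_mmd_sq(source: Sequence[Vector], weights: Sequence[float],
                    target: Sequence[Vector]) -> float:
    """Squared weighted MMD under a linear kernel (an assumption of this sketch):
    the squared distance between the weighted source mean and the target mean."""
    mu_s = weighted_mean(source, weights)
    mu_t = weighted_mean(target, [1.0] * len(target))
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def select_source(source: Sequence[Vector], target: Sequence[Vector],
                  keep: int) -> List[float]:
    """Illustrative selection rule (not the paper's): give weight 1 to the `keep`
    source samples closest to the target mean and weight 0 to the rest."""
    mu_t = weighted_mean(target, [1.0] * len(target))
    dists = [sum((a - b) ** 2 for a, b in zip(x, mu_t)) for x in source]
    order = sorted(range(len(source)), key=lambda i: dists[i])
    chosen = set(order[:keep])
    return [1.0 if i in chosen else 0.0 for i in range(len(source))]
```

With a source set that contains one outlier far from the target data, dropping that outlier through `select_source` lowers the discrepancy returned by `weighted_mmd_sq` compared with uniform weights, which is the intuition behind selecting source samples to limit negative transfer.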

Pages: 11
