MULTI-TASK LEARNING WITH CROSS ATTENTION FOR KEYWORD SPOTTING

Cited by: 3
Authors
Higuchi, Takuya [1]
Gupta, Anmol [2]
Dhir, Chandra [1]
Affiliations
[1] Apple, Cupertino, CA USA
[2] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
keyword spotting; Transformer; multi-task learning
DOI
10.1109/ASRU51503.2021.9687967
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, the output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with the simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach achieves a 12% relative reduction in the false reject ratios compared to the conventional multi-task learning with split branches and a bi-directional long short-term memory decoder.
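The cross attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, mean pooling over the query outputs, and a sigmoid output layer, none of which the abstract specifies. All names (`cross_attention`, `keyword_confidence`, `w_q`, `w_k`, `w_v`, `w_out`) and the random values standing in for trained parameters and encoder outputs are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def cross_attention(queries, encoder_outputs, w_q, w_k, w_v):
    """Single-head scaled dot-product cross attention (assumed form).

    queries:         trainable query sequence, shape (num_queries, dim)
    encoder_outputs: phonetic encoder outputs, shape (num_frames, dim)
    Returns one context vector and one attention row per query.
    """
    q = [matvec(w_q, x) for x in queries]
    k = [matvec(w_k, h) for h in encoder_outputs]
    v = [matvec(w_v, h) for h in encoder_outputs]
    scale = math.sqrt(len(q[0]))
    contexts, weights = [], []
    for qi in q:
        # Each query scores every encoder frame, then averages the values.
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        attn = softmax(scores)
        ctx = [sum(a * vj[d] for a, vj in zip(attn, v))
               for d in range(len(v[0]))]
        contexts.append(ctx)
        weights.append(attn)
    return contexts, weights


def keyword_confidence(contexts, w_out, b_out):
    """Mean-pool the per-query summaries and map them to a score in (0, 1).

    The pooling and sigmoid output are assumptions made for illustration.
    """
    dim = len(contexts[0])
    pooled = [sum(c[d] for c in contexts) / len(contexts) for d in range(dim)]
    logit = sum(w * p for w, p in zip(w_out, pooled)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


def random_matrix(rng, rows, cols):
    """Random stand-in for trained weights or encoder activations."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM, NUM_FRAMES, NUM_QUERIES = 4, 6, 2

encoder_outputs = random_matrix(rng, NUM_FRAMES, DIM)  # stand-in encoder output
queries = random_matrix(rng, NUM_QUERIES, DIM)         # stand-in trainable queries
w_q, w_k, w_v = (random_matrix(rng, DIM, DIM) for _ in range(3))
w_out = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]

contexts, weights = cross_attention(queries, encoder_outputs, w_q, w_k, w_v)
score = keyword_confidence(contexts, w_out, 0.0)
print(f"KWS confidence: {score:.3f}")
```

In this sketch the query sequence has a fixed length, so the decoder produces a fixed-size summary regardless of how many frames the encoder emits. In a multi-task setup of the kind the abstract describes, the phoneme branch would read the encoder outputs directly while a decoder like this one feeds the keyword score.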
Pages: 571-578
Number of pages: 8