MULTI-TASK LEARNING WITH CROSS ATTENTION FOR KEYWORD SPOTTING

Cited by: 3
Authors
Higuchi, Takuya [1]
Gupta, Anmol [2]
Dhir, Chandra [1]
Affiliations
[1] Apple, Cupertino, CA USA
[2] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
keyword spotting; Transformer; multi-task learning;
DOI
10.1109/ASRU51503.2021.9687967
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Keyword spotting (KWS) is an important technique for speech applications that enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, the output of an acoustic model is split into two branches for the two tasks: one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with a simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach achieves a 12% relative reduction in the false reject ratio compared to conventional multi-task learning with split branches and a bi-directional long short-term memory decoder.
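The decoder architecture outlined in the abstract can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical rendering of the idea, not the authors' implementation: all module names, dimensions, the mean pooling, and hyperparameters are assumptions. A trainable query sequence performs cross attention over the phonetic encoder outputs, and the attended summary is mapped to a keyword confidence score.

    # Illustrative sketch only; not the authors' released code.
    import torch
    import torch.nn as nn

    class CrossAttentionKWSDecoder(nn.Module):
        """Trainable queries attend over phonetic encoder outputs to
        produce a keyword confidence score (hypothetical sketch)."""

        def __init__(self, d_model: int = 256, n_queries: int = 4, n_heads: int = 4):
            super().__init__()
            # Trainable query sequence shared across utterances
            self.queries = nn.Parameter(torch.randn(n_queries, d_model))
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.classifier = nn.Linear(d_model, 1)  # keyword confidence logit

        def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
            # encoder_out: (batch, time, d_model) phonetic encoder outputs
            batch = encoder_out.size(0)
            q = self.queries.unsqueeze(0).expand(batch, -1, -1)
            # Cross attention between the trainable queries and encoder outputs
            summary, _ = self.cross_attn(q, encoder_out, encoder_out)
            # Pool the attended queries and predict a single confidence score
            return self.classifier(summary.mean(dim=1)).squeeze(-1)

    # Example: confidence logits for two utterances of 100 frames each
    decoder = CrossAttentionKWSDecoder()
    scores = decoder(torch.randn(2, 100, 256))  # shape: (2,)

In the multi-task setup described above, a decoder like this would be trained on the KWS data, while a separate phoneme-transcription branch attached to the same shared encoder is trained on the ASR data.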
Pages: 571-578
Number of pages: 8
Related Papers
50 records in total
  • [31] Attention-based Multi-task Learning for Sensor Analytics
    Chen, Yujing
    Rangwala, Huzefa
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2187 - 2196
  • [32] Dispatched attention with multi-task learning for nested mention recognition
    Fei, Hao
    Ren, Yafeng
    Ji, Donghong
    INFORMATION SCIENCES, 2020, 513 : 241 - 251
  • [33] CommuSpotter: Scene Text Spotting with Multi-Task Communication
    Zhao, Liang
    Wilsbacher, Greg
    Wang, Song
    APPLIED SCIENCES-BASEL, 2023, 13 (23)
  • [34] A novel multi-task learning technique for offline handwritten short answer spotting and recognition
    Das, Abhijit
    Suwanwiwat, Hemmaphan
    Pal, Umapada
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (18) : 53441 - 53465
  • [35] A novel multi-task learning technique for offline handwritten short answer spotting and recognition
    Abhijit Das
    Hemmaphan Suwanwiwat
    Umapada Pal
    Multimedia Tools and Applications, 2024, 83 : 53441 - 53465
  • [36] GRAPH ATTENTION AND INTERACTION NETWORK WITH MULTI-TASK LEARNING FOR FACT VERIFICATION
    Yang, Rui
    Wang, Runze
    Ling, Zhen-Hua
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7838 - 7842
  • [37] AMP: Multi-Task Transfer Learning via Leveraging Attention Mechanism on Task Embeddings
    Yu, Yangyang
    Wang, Keru
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2025, 39 (02)
  • [38] Multiple object tracking based on multi-task learning with strip attention
    Song, Yaoye
    Zhang, Peng
    Huang, Wei
    Zha, Yufei
    You, Tao
    Zhang, Yanning
    IET IMAGE PROCESSING, 2021, 15 (14) : 3661 - 3673
  • [39] Federated Multi-task Learning with Hierarchical Attention for Sensor Data Analytics
    Chen, Yujing
    Ning, Yue
    Chai, Zheng
    Rangwala, Huzefa
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [40] Multi-task Learning with Attention for End-to-end Autonomous Driving
    Ishihara, Keishi
    Kanervisto, Anssi
    Miura, Jun
    Hautamaki, Ville
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2896 - 2905