Combined Keyword Spotting and Localization Network Based on Multi-Task Learning

Cited by: 0
Authors
Ko, Jungbeom [1]
Kim, Hyunchul [2]
Kim, Jungsuk [3]
Affiliations
[1] Gachon Univ, Gachon Adv Inst Hlth Sci & Technol GAIHST, Dept Hlth Sci & Technol, Incheon 21936, South Korea
[2] Univ Calif Berkeley, Sch Informat, 102 South Hall 4600, Berkeley, CA 94720 USA
[3] Gachon Univ, Coll IT Convergence, Dept Biomed Engn, Seongnam Si 13120, South Korea
Funding
National Research Foundation of Singapore;
Keywords
deep neural network; keyword spotting; sound source localization; multi-task learning;
DOI
10.3390/math12213309
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
The advent of voice assistance technology and its integration into smart devices has enabled many useful services, such as texting and application execution. However, most assistive technologies cannot act as a human listener does, localizing the speaker and selectively spotting meaningful keywords. Because keyword spotting (KWS) and sound source localization (SSL) are both essential and must operate in real time, the memory and computational efficiency of the neural network model is crucial. In this paper, a single neural network model for KWS and SSL is proposed to overcome the limitations of running KWS and SSL sequentially, which requires more memory and inference time. The proposed model uses multi-task learning to use the limited resources of the device efficiently. A shared encoder serves as the initial layer and extracts common features from the multichannel audio data. Task-specific parallel layers then use these features for KWS and SSL. The proposed model was evaluated on a synthetic dataset with multiple speakers, and a 7-module shared encoder structure was identified as optimal in terms of KWS accuracy, direction of arrival (DOA) accuracy, DOA error, and latency. It achieved a KWS accuracy of 94.51%, a DOA error of 12.397 degrees, and a DOA accuracy of 89.86%. Consequently, the proposed model requires significantly less memory owing to the shared network architecture, which also reduces inference time without compromising KWS accuracy, DOA error, or DOA accuracy.
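The shared-encoder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the single dense layer standing in for the 7-module encoder, the layer sizes, the treatment of DOA as classification over angular bins, the equal loss weighting, and every name below (`SharedEncoderModel`, `joint_loss`, `angular_error_deg`) are assumptions made for illustration. The sketch only shows that the encoder runs once per input while two parallel heads reuse its features, and how a DOA error in degrees might be computed with wraparound.

```python
import math
import random


def affine(x, weights, bias):
    """Return weights @ x + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Random weights and zero bias (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedEncoderModel:
    """Toy multi-task model: one shared encoder, two task-specific heads."""

    def __init__(self, n_in, n_hidden, n_keywords, n_doa_bins, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_hidden, n_in, rng)        # shared by both tasks
        self.kws_head = init_layer(n_keywords, n_hidden, rng)  # keyword spotting
        self.ssl_head = init_layer(n_doa_bins, n_hidden, rng)  # DOA bins (assumed)
        self.encoder_calls = 0

    def forward(self, features):
        # The shared features are computed once and reused by both heads,
        # which is where the memory and latency savings come from.
        self.encoder_calls += 1
        shared = relu(affine(features, *self.encoder))
        kws_probs = softmax(affine(shared, *self.kws_head))
        doa_probs = softmax(affine(shared, *self.ssl_head))
        return kws_probs, doa_probs


def joint_loss(kws_probs, doa_probs, kws_label, doa_label, kws_weight=0.5):
    """Weighted sum of the two cross-entropy terms (the weighting is an assumption)."""
    kws_ce = -math.log(kws_probs[kws_label])
    doa_ce = -math.log(doa_probs[doa_label])
    return kws_weight * kws_ce + (1.0 - kws_weight) * doa_ce


def angular_error_deg(predicted, actual):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((predicted - actual + 180.0) % 360.0 - 180.0)


if __name__ == "__main__":
    model = SharedEncoderModel(n_in=8, n_hidden=6, n_keywords=3, n_doa_bins=4)
    kws, doa = model.forward([0.1] * 8)
    print("KWS probabilities:", kws)
    print("DOA probabilities:", doa)
    print("joint loss:", joint_loss(kws, doa, kws_label=0, doa_label=1))
```

A sequential pipeline would run a separate feature extractor for each task; here `encoder_calls` increases by one per input regardless of how many heads consume the shared features.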
Pages: 14
Related papers
50 in total
  • [21] Multi-task Feature Learning Based Anomaly Detection of Network Dataflow
    Ren Hui-feng
    Yan Feng
    Dong Qing-chao
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 4144 - 4147
  • [22] Nuclear mass based on the multi-task learning neural network method
    Ming, Xing-Chen
    Zhang, Hong-Fei
    Xu, Rui-Rui
    Sun, Xiao-Dong
    Tian, Yuan
    Ge, Zhi-Gang
    NUCLEAR SCIENCE AND TECHNIQUES, 2022, 33 (05)
  • [23] Nuclear mass based on the multi-task learning neural network method
    Xing-Chen Ming
    Hong-Fei Zhang
    Rui-Rui Xu
    Xiao-Dong Sun
    Yuan Tian
    Zhi-Gang Ge
    Nuclear Science and Techniques, 2022, 33
  • [24] Multi-task gradient descent for multi-task learning
    Lu Bai
    Yew-Soon Ong
    Tiantian He
    Abhishek Gupta
    Memetic Computing, 2020, 12 : 355 - 369
  • [25] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 (04) : 355 - 369
  • [26] Contrastive Learning based Multi-task Network for Image Manipulation Detection
    Yin, Qilin
    Wang, Jinwei
    Lu, Wei
    Luo, Xiangyang
    SIGNAL PROCESSING, 2022, 201
  • [27] Nuclear mass based on the multi-task learning neural network method
    Xing-Chen Ming
    Hong-Fei Zhang
    Rui-Rui Xu
    Xiao-Dong Sun
    Yuan Tian
    Zhi-Gang Ge
    Nuclear Science and Techniques, 2022, 33 (04) : 95 - 102
  • [28] Image Inpainting Detection Based on Multi-task Deep Learning Network
    Wang, Xinyi
    Niu, Shaozhang
    Wang, He
    IETE TECHNICAL REVIEW, 2021, 38 (01) : 149 - 157
  • [29] Wi-Fi Indoor Localization based on Multi-Task Deep Learning
    Lin, Wei-Yuan
    Huang, Ching-Chun
    Nguyen-Tran Duc
    Hung-Nguyen Manh
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [30] Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
    Xingwei Liang
    Zehua Zhang
    Ruifeng Xu
    EURASIP Journal on Audio, Speech, and Music Processing, 2023