Combined Keyword Spotting and Localization Network Based on Multi-Task Learning

Cited by: 0

Authors:

Ko, Jungbeom [1]

Kim, Hyunchul [2]

Kim, Jungsuk [3]

Affiliations:

[1] Gachon Univ, Gachon Adv Inst Hlth Sci & Technol GAIHST, Dept Hlth Sci & Technol, Incheon 21936, South Korea

[2] Univ Calif Berkeley, Sch Informat, 102 South Hall 4600, Berkeley, CA 94720 USA

[3] Gachon Univ, Coll IT Convergence, Dept Biomed Engn, Seongnam Si 13120, South Korea

Source:

MATHEMATICS | 2024 / Vol. 12 / Issue 21

Funding:

National Research Foundation of Singapore;

Keywords:

deep neural network; keyword spotting; sound source localization; multi-task learning;

DOI:

10.3390/math12213309

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

The advent of voice assistance technology and its integration into smart devices have facilitated many useful services, such as texting and application execution. However, most assistive technologies cannot, as a human listener can, localize the speaker and selectively spot meaningful keywords. Because keyword spotting (KWS) and sound source localization (SSL) are essential and must operate in real time, the neural network model must be efficient in both memory and computation. In this paper, a single neural network model for KWS and SSL is proposed to overcome the limitations of running KWS and SSL sequentially, which requires more memory and inference time. The proposed model uses multi-task learning to use the limited resources of the device efficiently. A shared encoder serves as the initial layer and extracts common features from the multichannel audio data. Task-specific parallel layers then use these features for KWS and SSL. The proposed model was evaluated on a synthetic dataset with multiple speakers, and a 7-module shared encoder structure was identified as optimal in terms of KWS accuracy, direction of arrival (DOA) accuracy, DOA error, and latency. It achieved a KWS accuracy of 94.51%, a DOA error of 12.397 degrees, and a DOA accuracy of 89.86%. Consequently, the proposed model requires significantly less memory owing to the shared network architecture, which also improves inference time without compromising KWS accuracy, DOA error, or DOA accuracy.
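
The shared-encoder layout described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dense-plus-ReLU modules, the layer sizes, the microphone count, the number of keyword classes, the 10-degree DOA bins, and the names `forward` and `doa_error_deg` are all assumptions made for the example. Only the 7-module encoder depth and the split into two parallel task heads come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
N_CHANNELS = 4      # microphones in the array (assumed)
N_FEATURES = 40     # features per channel (assumed)
HIDDEN = 64         # encoder width (assumed)
N_MODULES = 7       # shared-encoder depth reported as optimal in the abstract
N_KEYWORDS = 12     # KWS classes (assumed)
N_DOA_BINS = 36     # DOA classes of 10 degrees each (assumed)


def dense(n_in, n_out):
    """Return randomly initialised (weights, bias) for one dense layer."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Shared encoder: dense + ReLU modules stand in for the paper's modules (assumed).
encoder = [dense(N_CHANNELS * N_FEATURES, HIDDEN)] + [
    dense(HIDDEN, HIDDEN) for _ in range(N_MODULES - 1)
]
kws_head = dense(HIDDEN, N_KEYWORDS)   # task-specific layer for keyword spotting
ssl_head = dense(HIDDEN, N_DOA_BINS)   # task-specific layer for localization


def forward(x):
    """x: (batch, channels, features) -> (keyword probs, DOA-bin probs)."""
    h = x.reshape(x.shape[0], -1)
    for w, b in encoder:               # the encoder runs once for both tasks
        h = np.maximum(h @ w + b, 0.0)
    kws = softmax(h @ kws_head[0] + kws_head[1])
    doa = softmax(h @ ssl_head[0] + ssl_head[1])
    return kws, doa


def doa_error_deg(pred_deg, true_deg):
    """Absolute angular error in degrees, wrapped to the range [0, 180]."""
    diff = np.abs(np.asarray(pred_deg) - np.asarray(true_deg)) % 360.0
    return np.minimum(diff, 360.0 - diff)


batch = rng.normal(size=(2, N_CHANNELS, N_FEATURES))
kws_probs, doa_probs = forward(batch)
pred_angles = doa_probs.argmax(axis=1) * (360.0 / N_DOA_BINS)
```

Because both heads read the same encoder output, the encoder weights are stored and evaluated once per input rather than once per task, which is the memory and latency saving the abstract attributes to the shared architecture. The weights here are random and untrained, so the outputs carry no meaning beyond their shapes.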

Pages: 14
