Custom Mandarin Keyword Spotting with Extended Long Short-Term Memory

Authors
Cao, Haitao [1]
Liu, Xi [1]
Tan, Zhiguo [1]
Yang, Zhenlun [1]
Qin, Xin [2]
Affiliations
[1] School of Information Engineering, Guangzhou Panyu Polytechnic, Guangzhou 511483, China
[2] Institute of Big Data and Internet Innovation, Hunan University of Technology and Business, Changsha 410205, China
Keywords
Deep neural networks
DOI
Not available
Abstract
In real-world scenarios, Deep Neural Network (DNN)-powered Keyword Spotting (KWS) systems are typically engineered as lightweight architectures, optimizing for superior performance and low computational complexity on resource-limited devices. However, such lightweight designs often generalize poorly, particularly when keywords must be customized. This paper presents a two-stage method to customize a Mandarin KWS system rapidly. First, we propose an embedding model to learn the embedding representations of general Mandarin keywords. Subsequently, we facilitate keyword customization with the generalization capability of embedding models through few-shot transfer learning. To improve performance further, in the embedding model, we introduce two scale blocks to fuse acoustic features and employ an Enhanced Extended Long Short-Term Memory (ExLSTM) as the backbone. Experimental results on both English and Mandarin keyword datasets highlight the advantages of the proposed embedding model. In addition, we conduct keyword customization on a self-recorded dataset containing 10 Mandarin keywords. The average accuracy of 97.45% with merely five target samples demonstrates the effectiveness of our method. © (2024), (International Association of Engineers). All rights reserved.
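The second stage described in the abstract, customizing keywords from a handful of samples on top of a pretrained embedding model, can be illustrated with a minimal sketch. The abstract does not say how the transfer step is implemented, so the code below substitutes a simple and common few-shot technique: nearest-prototype classification over frozen embeddings. The function names, the two-dimensional toy vectors, and the keyword labels `kw_a` and `kw_b` are hypothetical; in practice the vectors would come from the embedding model, with about five per keyword.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def build_prototypes(support):
    """Average the normalized support embeddings of each keyword.

    `support` maps a keyword label to a list of embedding vectors
    (assumed to be produced by a frozen, pretrained embedding model).
    """
    prototypes = {}
    for keyword, vectors in support.items():
        normalized = [l2_normalize(v) for v in vectors]
        dim = len(normalized[0])
        mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
        prototypes[keyword] = l2_normalize(mean)
    return prototypes


def classify(embedding, prototypes):
    """Return the keyword whose prototype has the highest cosine similarity."""
    query = l2_normalize(embedding)
    return max(
        prototypes,
        key=lambda k: sum(a * b for a, b in zip(query, prototypes[k])),
    )


# Toy support set: five hypothetical 2-D embeddings per keyword.
support = {
    "kw_a": [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.8, 0.1], [1.0, 0.3]],
    "kw_b": [[0.1, 1.0], [0.2, 0.9], [0.0, 1.1], [0.1, 0.8], [0.3, 1.0]],
}
prototypes = build_prototypes(support)
prediction_a = classify([0.8, 0.3], prototypes)
prediction_b = classify([0.2, 0.9], prototypes)
```

This variant needs no gradient updates, which keeps customization cheap on resource-limited devices. Fine-tuning a small classification head on the same five samples is another option the phrase "few-shot transfer learning" could cover.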
Pages: 1933-1942