COMPARATIVE STUDY OF TOKENIZATION ALGORITHMS FOR END-TO-END OPEN VOCABULARY KEYWORD DETECTION

Times cited: 2
Authors
Gurugubelli, Krishna [1]
Mohamed, Sahil [1]
Krishna, Rajesh K. S. [1]
Affiliation
[1] Samsung Research & Development Institute Bangalore, Bangalore, Karnataka, India
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024) | 2024
Keywords
Custom-keyword detection; Embedding; Subword; Phoneme; Tokenization
DOI
10.1109/ICASSP48485.2024.10445876
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The advent of deep learning techniques and the increasing importance of personalization in voice assistants have fueled the need for open-vocabulary keyword detection systems, in which the user can enroll a keyword using either audio or text as the modality. A text-enrollment-based custom-keyword detection system has to determine whether or not the input speech signal matches the enrolled keyword phrase. The method used to tokenize the keyword phrase can alter custom-keyword detection performance. Hence, in this study, we explore and evaluate different tokenization methods, including phoneme-level tokenization, character-level tokenization, and subword tokenization (Byte Pair Encoding and Unigram). The effectiveness of these methods is studied using an end-to-end custom-keyword detection framework on five different datasets. The findings of this study offer valuable insights into the suitability of different tokenization methods for the custom-keyword detection task. The results show that the proposed architecture with phoneme-based tokenization achieves the best detection accuracy: 99.54%, 98.12%, and 90.05% on the LibriPhrase, Qualcomm, and Google Speech Commands datasets, respectively.
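To make the tokenization granularities compared above concrete, the following Python sketch tokenizes one keyword phrase at the character, subword (BPE), and phoneme levels. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`char_tokenize`, `learn_bpe`, `bpe_tokenize`, `phoneme_tokenize`), the toy training corpus, the example phrase "Hey Galaxy", the `_` and `</w>` boundary markers, and the hand-written `TOY_LEXICON` are all hypothetical. The BPE learner follows the common "merge the most frequent adjacent pair" scheme; Unigram tokenization is not implemented here.

```python
from collections import Counter

# Illustrative sketch only: all names, the toy corpus and the toy lexicon are
# hypothetical and are not taken from the paper.


def char_tokenize(phrase: str) -> list[str]:
    """Character-level tokens; '_' marks word boundaries (assumed convention)."""
    return list(phrase.lower().replace(" ", "_"))


def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each left-to-right occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules by repeatedly merging the most frequent adjacent pair."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        vocab = Counter({_merge(s, best): f for s, f in vocab.items()})
    return merges


def bpe_tokenize(phrase: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split each word into characters, then apply the merges in learned order."""
    tokens: list[str] = []
    for word in phrase.lower().split():
        symbols = tuple(word) + ("</w>",)
        for pair in merges:
            symbols = _merge(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Hypothetical hand-written lexicon (ARPAbet, stress marks dropped); a real
# system would use a pronunciation dictionary or a G2P model instead.
TOY_LEXICON = {
    "hey": ["HH", "EY"],
    "galaxy": ["G", "AE", "L", "AH", "K", "S", "IY"],
}


def phoneme_tokenize(phrase: str) -> list[str]:
    """Phoneme-level tokens looked up word by word in the toy lexicon."""
    return [p for word in phrase.lower().split() for p in TOY_LEXICON[word]]


CORPUS = ["hey"] * 3 + ["galaxy"] * 2 + ["gala"]  # toy BPE training text
MERGES = learn_bpe(CORPUS, num_merges=6)

phrase = "Hey Galaxy"
print(char_tokenize(phrase))         # ['h', 'e', 'y', '_', 'g', 'a', 'l', 'a', 'x', 'y']
print(bpe_tokenize(phrase, MERGES))  # ['hey</w>', 'gala', 'x', 'y</w>']
print(phoneme_tokenize(phrase))      # ['HH', 'EY', 'G', 'AE', 'L', 'AH', 'K', 'S', 'IY']
```

On this toy data the same phrase becomes 10 character tokens, 4 BPE tokens, or 9 phonemes. Unigram tokenization (as in SentencePiece) works in the opposite direction: it starts from a large candidate vocabulary and prunes it using a unigram language model, so its segmentations can differ from those of BPE.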
Pages: 12431-12435
Number of pages: 5