COMPARATIVE STUDY OF TOKENIZATION ALGORITHMS FOR END-TO-END OPEN VOCABULARY KEYWORD DETECTION

Cited by: 2

Authors:

Gurugubelli, Krishna [1]

Mohamed, Sahil [1]

Krishna, Rajesh K. S. [1]

Affiliations:

[1] Samsung Res & Dev Inst Bangalore, Bangalore, Karnataka, India

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024) | 2024

Keywords:

Custom-keyword detection; Embedding; Subword; Phoneme; Tokenization

DOI:

10.1109/ICASSP48485.2024.10445876

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The advent of deep learning techniques and the increasing importance of personalization in voice assistants have fueled the need for open-vocabulary keyword detection systems, in which the user can enroll a keyword using either audio or text. A text-enrollment-based custom-keyword detection system has to decide whether or not the input speech signal matches the enrolled keyword phrase. The method used to tokenize the keyword phrase can affect custom-keyword detection performance. Hence, in this study, we explore and evaluate different tokenization methods: phoneme-level tokenization, character-level tokenization, and subword tokenization (Byte Pair Encoding and Unigram). The efficiency of these methods is studied using an end-to-end custom-keyword detection framework on five different datasets. The findings reveal valuable insights into the suitability of different tokenization methods for the custom-keyword detection task. The results show that the proposed architecture with phoneme-based tokenization achieves the best detection accuracy: 99.54%, 98.12%, and 90.05% on the LibriPhrase, Qualcomm, and Google Speech Commands datasets, respectively.
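To make the tokenization families named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts character-level tokens, phoneme tokens looked up in a hand-written toy lexicon, and subword tokens from a tiny Byte Pair Encoding learner. All function names, the `TOY_LEXICON` table, and the training words are hypothetical and chosen only for illustration; Unigram tokenization is omitted because it needs a probabilistic model that does not fit in a short sketch.

```python
from collections import Counter

# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols, no stress marks).
# A real system would use a full lexicon or a grapheme-to-phoneme model.
TOY_LEXICON = {"hello": ["HH", "AH", "L", "OW"]}


def char_tokens(phrase):
    """Character-level tokenization: one token per character, space shown as '_'."""
    return ["_" if c == " " else c for c in phrase.lower()]


def phoneme_tokens(phrase):
    """Phoneme-level tokenization via lexicon lookup (raises KeyError if a word is missing)."""
    tokens = []
    for word in phrase.lower().split():
        tokens.extend(TOY_LEXICON[word])
    return tokens


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe_merges(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent symbol pair."""
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Ties are broken by comparing the pairs themselves, to keep the result deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(s, best) for s in corpus]
    return merges


def bpe_tokens(word, merges):
    """Subword tokenization: apply the learned merges in the order they were learned."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    merges = learn_bpe_merges(["hello", "help", "held"], num_merges=2)
    print(char_tokens("hello"))
    print(phoneme_tokens("hello"))
    print(bpe_tokens("helmet", merges))
```

The three outputs differ in sequence length and vocabulary for the same keyword, which is the kind of difference the study compares; how each token sequence is embedded and matched against audio is specific to the paper and is not shown here.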


Pages: 12431-12435

Number of pages: 5
