A High-Performance Neural Network SoC for End-to-End Speaker Verification

Cited by: 0

Authors:

Tsai, Tsung-Han [1]

Chiang, Meng-Jui [1]

Affiliation:

[1] Natl Cent Univ, Dept Elect Engn, Taoyuan 32001, Taiwan

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Speaker verification (SV); speaker identification; x-vector; RISC-V; system-on-chip (SoC); GMM

DOI:

10.1109/ACCESS.2024.3491780

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

The use of neural networks to recognize a speaker's identity from speech has become popular in recent years. Among these methods, the x-vector extractor, which is based on time-delay neural networks (TDNNs), is more robust to noise and generally achieves higher accuracy than earlier approaches such as the Gaussian mixture model (GMM) and the support vector machine (SVM). This paper presents a system-on-chip (SoC) composed of a RISC-V CPU and a neural network accelerator module for x-vector-based speaker verification (SV). To meet real-time latency requirements and enable deployment on edge devices, this work processes the x-vector extractor in three steps: size reduction, pruning, and compression. We focus on optimizing the data flow to exploit sparsity. Compared with the conventional compressed sparse row (CSR) format for sparse matrix compression, the proposed binary pointer compressed sparse row (BPCSR) method significantly reduces latency and avoids load imbalance among the processing elements (PEs). We further design a neural network accelerator module that stores the compressed parameters and computes the x-vector extractor, while the RISC-V CPU handles the remaining computation, such as feature extraction and the classifier. The system was tested on the VoxCeleb dataset, containing 1,251 test speakers, and achieved over 95% accuracy. Finally, the chip was synthesized in TSMC 90 nm technology; it occupies 15.5 mm² and consumes 97.88 mW for real-time identification.
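The abstract compares BPCSR against conventional CSR but does not describe BPCSR's encoding, so the sketch below shows only the conventional baseline: pruning followed by CSR encoding and a sparse matrix-vector product. The record does not state the pruning criterion, so magnitude pruning with a fixed threshold is assumed. The helper `row_work` is hypothetical. It counts the non-zeros in each row, and it assumes each row is dispatched to its own PE. Under that mapping, unequal counts show the kind of load imbalance the abstract attributes to CSR. All function names, the threshold, and the toy weights are illustrative assumptions, not the authors' design.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, threshold: float) -> Matrix:
    """Zero every weight whose magnitude is below `threshold` (assumed pruning rule)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense matrix in conventional compressed sparse row (CSR) form.

    values  -- the non-zero entries, stored row by row
    col_idx -- the column index of each entry in `values`
    row_ptr -- entries row_ptr[r] .. row_ptr[r + 1] - 1 of `values` belong to row r
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for c, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(c)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = W x, touching only the stored non-zeros of W."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def row_work(row_ptr: List[int]) -> List[int]:
    """Multiply-accumulates per row, i.e. per PE under a hypothetical one-row-per-PE mapping."""
    return [row_ptr[r + 1] - row_ptr[r] for r in range(len(row_ptr) - 1)]


if __name__ == "__main__":
    # Toy 3x4 layer weights; dyadic values keep the arithmetic exact.
    W = [[1.0, 0.0625, 0.0, -0.5],
         [0.03125, -0.0625, 0.0, 0.0625],
         [0.5, -0.75, 2.0, 0.25]]
    values, col_idx, row_ptr = to_csr(magnitude_prune(W, threshold=0.1))
    print(values)    # [1.0, -0.5, 0.5, -0.75, 2.0, 0.25]
    print(col_idx)   # [0, 3, 0, 1, 2, 3]
    print(row_ptr)   # [0, 2, 2, 6]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0]))  # [-1.0, 0.0, 6.0]
    print(row_work(row_ptr))  # [2, 0, 4]
```

In this toy case, pruning empties row 1 while row 2 keeps all four weights. A one-row-per-PE schedule would leave one PE idle while another performs four multiply-accumulates, and every PE must wait for the slowest one.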


Pages: 165482-165496

Page count: 15
