A High-Performance Neural Network SoC for End-to-End Speaker Verification

Cited by: 0
Authors
Tsai, Tsung-Han [1]
Chiang, Meng-Jui [1]
Affiliations
[1] Natl Cent Univ, Dept Elect Engn, Taoyuan 32001, Taiwan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Speaker verification (SV); speaker identification; x-vector; RISC-V; system-on-chip (SoC); GMM
DOI
10.1109/ACCESS.2024.3491780
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Neural networks have become a popular means of recognizing a speaker's identity from speech in the last few years. Among these methods, the x-vector extractor, which is based on time-delay neural networks (TDNN), handles noise better and generally achieves higher accuracy than earlier methods such as the Gaussian mixture model (GMM) and the support vector machine (SVM). This paper presents a system-on-chip (SoC) composed of a RISC-V CPU and a neural network accelerator module for x-vector-based speaker verification (SV). To meet real-time latency requirements and make the system implementable on edge devices, this work processes the x-vector model in three steps: size reduction, pruning, and compression. The data flow is optimized to exploit sparsity. In place of the conventional sparse matrix compression method, compressed sparse row (CSR), we propose the binary pointer compressed sparse row (BPCSR) method, which significantly improves latency and avoids the load-balancing issue in each processing element (PE). We further design a neural network accelerator module that stores the compressed parameters and computes the x-vector extractor, while the RISC-V CPU handles the remaining computation, such as feature extraction and the classifier. The system was tested on the VoxCeleb dataset, containing 1251 test speakers, and achieved over 95% accuracy. Finally, we synthesized the chip in TSMC 90 nm technology; it occupies 15.5 mm² and consumes 97.88 mW for real-time identification.
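For readers unfamiliar with the CSR baseline named in the abstract, the following is a minimal sketch of standard compressed sparse row encoding and a sparse matrix-vector product. It is background only: the abstract does not describe the BPCSR layout, so none of it is shown here, and the function names (`to_csr`, `csr_matvec`) and the toy weight matrix are illustrative assumptions, not taken from the paper.

```python
def to_csr(dense):
    """Encode a dense row-major matrix (list of lists) in standard CSR form.

    Returns (values, col_idx, row_ptr): the nonzero values, their column
    indices, and the offset in `values` at which each row starts.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row = start of the next
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, touching only the stored nonzeros of W."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Toy pruned weight matrix (hypothetical values, for illustration only).
weights = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
values, col_idx, row_ptr = to_csr(weights)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3, 4])
# Nonzeros per row; uneven counts mean uneven work if rows map to PEs.
nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(len(weights))]
```

In this toy case the rows hold 1, 2, and 0 nonzeros respectively, so a design that assigns one row per processing element would leave some elements idle while others work. This is one common form of the load-balancing problem with row-wise sparse formats.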
Pages: 165482-165496
Page count: 15
Related papers
50 in total
  • [1] End-To-End Phonetic Neural Network Approach for Speaker Verification
    Demirbag, Sedat
    Erden, Mustafa
    Arslan, Levent
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [2] DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION
    Snyder, David
    Ghahremani, Pegah
    Povey, Daniel
    Garcia-Romero, Daniel
    Carmiel, Yishay
    Khudanpur, Sanjeev
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 165 - 170
  • [3] Neural PLDA Modeling for End-to-End Speaker Verification
    Ramoji, Shreyas
    Krishnan, Prashant
    Ganapathy, Sriram
    INTERSPEECH 2020, 2020, : 4333 - 4337
  • [4] High-Performance End-to-End Integrity Verification on Big Data Transfer
    Jung, Eun-Sung
    Liu, Si
    Kettimuthu, Rajkumar
    Chung, Sungwook
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (08) : 1478 - 1488
  • [5] GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION
    Wan, Li
    Wang, Quan
    Papir, Alan
    Moreno, Ignacio Lopez
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4879 - 4883
  • [6] Effective Phase Encoding for End-to-end Speaker Verification
    Peng, Junyi
    Qu, Xiaoyang
    Gu, Rongzhi
    Wang, Jianzong
    Xiao, Jing
    Burget, Lukas
    Cernocky, Jan "Honza"
    INTERSPEECH 2021, 2021, : 2366 - 2370
  • [7] Generalized End-to-End Loss for Forensic Speaker Verification
    Wang, Huapeng
    He, Fangzhou
    Wu, Lianquan
    Journal of Systems Science and Information, 2023, 11 (02) : 264 - 276
  • [8] Contrastive Learning for improving End-to-end Speaker Verification
    Tang, Yanxi
    Wang, Jianzong
    Qu, Xiaoyang
    Xiao, Jing
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [9] Angular Softmax Loss for End-to-end Speaker Verification
    Li, Yutian
    Gao, Feng
    Ou, Zhijian
    Sun, Jiasong
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 190 - 194
  • [10] Robust End-to-End Speaker Verification Using EEG
    Han, Yan
    Krishna, Gautam
    Tran, Co
    Carnahan, Mason
    Tewfik, Ahmed H.
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1170 - 1174