MULTI-STREAM CONVOLUTIONAL NEURAL NETWORK WITH FREQUENCY SELECTION FOR ROBUST SPEAKER VERIFICATION

Cited by: 0

Authors:

Yao, Wei [1]

Chen, Shen [2]

Cui, Jiamin [1]

Lou, Yaolin [1]

Affiliations:

[1] Zhejiang Univ Water Resources & Elect Power, Coll Elect Engn, Key Lab Technol Rural Water Management Zhejiang Pr, Hangzhou, Peoples R China

[2] Wanbang Digital Energy Co Ltd China, Hangzhou, Peoples R China

Source:

COMPUTING AND INFORMATICS | 2024 / Vol. 43 / No. 04

Keywords:

Deep learning; speaker verification; convolutional neural network; multi-stream; frequency selection

DOI:

10.31577/cai20244819

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker verification aims to verify whether an input utterance was spoken by the claimed speaker. Conventionally, such systems are deployed in a single-stream scenario, in which the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough to perform the classification task when listening to only part of the frequency range rather than all of it, a technique we call frequency selection, and we propose a novel multi-stream Convolutional Neural Network (CNN) framework that uses this technique for speaker verification. The proposed framework combines diverse temporal embeddings generated by multiple streams to make acoustic modeling more robust. To diversify the temporal embeddings, we apply feature augmentation with frequency selection: the full frequency band is manually segmented into several sub-bands, and the feature extractor of each stream selects which sub-bands to use as its target frequency domain. Unlike the conventional single-stream solution, in which each utterance is processed only once, this framework processes each utterance with multiple streams in parallel. The input utterance of each stream is pre-processed by a frequency selector restricted to a specified frequency range and post-processed by mean normalization. The normalized temporal embeddings of all streams then flow into a pooling layer that generates the fused embedding. We conduct extensive experiments on the VoxCeleb dataset, and the results show that the multi-stream CNN significantly outperforms the single-stream baseline, with a relative improvement of 20.53% in minimum Decision Cost Function (minDCF) and 15.28% in Equal Error Rate (EER).
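The data flow described in the abstract (frequency selection per stream, mean normalization, then pooling into a fused embedding) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify the CNN architecture, the number of sub-bands, the normalization axis, or the pooling type. The sketch assumes four equal-width sub-bands, replaces each stream's CNN with a fixed random linear projection (`toy_extractor`, hypothetical), normalizes by subtracting the time average, and fuses by concatenating streams and taking the standard deviation over time. All function names here are illustrative.

```python
import numpy as np


def select_subbands(features, num_subbands, keep):
    """Keep only the chosen sub-bands of a (frames, freq_bins) feature matrix."""
    bands = np.array_split(np.arange(features.shape[1]), num_subbands)
    idx = np.concatenate([bands[b] for b in keep])
    return features[:, idx]


def toy_extractor(x, out_dim, seed):
    """Hypothetical stand-in for a per-stream CNN: a fixed random projection."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((x.shape[1], out_dim))
    return x @ weights  # (frames, out_dim) temporal embeddings


def mean_normalize(emb):
    """Assumed normalization: subtract the time average of each dimension."""
    return emb - emb.mean(axis=0, keepdims=True)


def multi_stream_embedding(features, stream_bands, num_subbands=4, out_dim=8):
    """Run every stream on its own sub-bands and fuse the results."""
    streams = []
    for i, keep in enumerate(stream_bands):
        x = select_subbands(features, num_subbands, keep)
        streams.append(mean_normalize(toy_extractor(x, out_dim, seed=i)))
    # Assumed fusion: concatenate streams per frame, then pool over time
    # with the standard deviation (the time mean is zero after normalization).
    stacked = np.concatenate(streams, axis=1)  # (frames, streams * out_dim)
    return stacked.std(axis=0)


# Example: 50 frames of 40-bin features, three streams on overlapping sub-bands.
features = np.random.default_rng(0).standard_normal((50, 40))
stream_bands = [[0, 1], [1, 2], [2, 3]]
fused = multi_stream_embedding(features, stream_bands)
print(fused.shape)
```

Each stream sees a different slice of the spectrum, so the streams produce different temporal embeddings for the same utterance; the fused vector has one block of dimensions per stream.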


Pages: 819-848

Page count: 30
