A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

Cited by: 9

Authors:

Velayuthapandian, Karthikeyan [1]

Subramoniam, Suja Priyadharsini [2]

Affiliations:

[1] Mepco Schlenk Engn Coll, Dept Elect & Commun Engn, Sivakasi, Tamil Nadu, India

[2] Anna Univ Reg Campus, Dept Elect & Commun Engn, Tirunelveli, Tamil Nadu, India

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2023, Vol. 17, Issue 06

Keywords:

Speaker recognition; Deep neural network; Spectrogram; 1-D CNN; Focus module; SUPPORT VECTOR MACHINES; SPEAKER; SYSTEM;

DOI:

10.1007/s11760-023-02500-7

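Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: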

0808 ; 0809 ;

Abstract:

The process of identifying a spokesperson from a collection of subsequent time series data is referred to as speaker identification. Convolutional neural networks (CNNs) and deep neural networks are the two types of neural networks that are used in the majority of modern experimental approaches. This work presents a CNN model for speaker identification using a jump-connected one-dimensional convolutional neural network (1-D CNN) with a focus module (FM). The 1-D convolutional layer integrated with FM is employed in the presented model for speaker characteristic extraction and lessens heterogeneity in the temporal and spatial domains, allowing for quicker layer processing. Furthermore, the layered CNN hopping interconnection is employed to overcome the connectivity glitches, and a solution based on softmax loss and smooth L1-norm combined regulation is presented to increase efficiency. The recommended network model was evaluated using the ELSDSR, TIMIT, NIST, 16,000 PCM, and experimental audio datasets. According to experimental data, the equal error rate (EER) of end-to-end CNN for voiceprint identification is 9.02% higher than baseline approaches. In experiments, our proposed speaker recognition (SR) model, which we refer to as the deep FM-1D CNN, had a high recognition accuracy of 99.21%. Moreover, the observations demonstrate that the proposed network model is more robust than other models.
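The abstract reports results as an equal error rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. As a minimal sketch of how that metric is commonly computed from verification scores (not the authors' evaluation code; the function name `equal_error_rate` and the score lists are hypothetical, and a higher score is assumed to mean "more likely the same speaker"):

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer, best_thr = np.inf, 1.0, thresholds[0]
    for thr in thresholds:
        far = np.mean(impostor >= thr)  # impostor trials wrongly accepted
        frr = np.mean(genuine < thr)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, eer, best_thr = gap, (far + frr) / 2.0, thr
    return float(eer), float(best_thr)
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.6, 0.3, 0.2, 0.1]`, the two error rates meet at a threshold of 0.6, where one of four impostors is accepted and one of four genuine trials is rejected, giving an EER of 0.25. A lower EER indicates a better verification system.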

Pages: 2817-2825

Page count: 9
