A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

Cited by: 9
Authors
Velayuthapandian, Karthikeyan [1]
Subramoniam, Suja Priyadharsini [2]
Affiliations
[1] Mepco Schlenk Engn Coll, Dept Elect & Commun Engn, Sivakasi, Tamil Nadu, India
[2] Anna Univ Reg Campus, Dept Elect & Commun Engn, Tirunelveli, Tamil Nadu, India
Keywords
Speaker recognition; Deep neural network; Spectrogram; 1-D CNN; Focus module; SUPPORT VECTOR MACHINES; SPEAKER; SYSTEM
DOI
10.1007/s11760-023-02500-7
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
Speaker identification is the process of identifying a speaker from a collection of sequential time-series data. Most modern experimental approaches use convolutional neural networks (CNNs) or other deep neural networks. This work presents a CNN model for speaker identification using a jump-connected one-dimensional convolutional neural network (1-D CNN) with a focus module (FM). The model uses the 1-D convolutional layer integrated with the FM to extract speaker characteristics and to lessen heterogeneity in the temporal and spatial domains, allowing for quicker layer processing. Furthermore, layered CNN hopping interconnections are employed to overcome connectivity glitches, and a solution based on combined softmax loss and smooth L1-norm regulation is presented to increase efficiency. The proposed network model was evaluated using the ELSDSR, TIMIT, NIST, 16,000 PCM, and experimental audio datasets. According to experimental data, the equal error rate (EER) of end-to-end CNN for voiceprint identification is 9.02% higher than baseline approaches. In experiments, the proposed speaker recognition (SR) model, referred to as the deep FM-1D CNN, achieved a recognition accuracy of 99.21%. Moreover, the observations indicate that the proposed network model is more robust than other models.
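The abstract reports results in terms of the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be approximated from verification scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the "accept when score >= threshold" convention, and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a trial is accepted when its score is >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only (not data from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With these illustrative scores, FAR and FRR meet at a threshold of 0.6, where one of four impostor trials is accepted and one of four genuine trials is rejected. With only a handful of trials the two rates may never coincide exactly, which is why the sketch reports their midpoint at the closest threshold.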
Pages: 2817-2825
Page count: 9