Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 4
Authors
Chen, Young-Long [1]
Wang, Neng-Chung [2]
Ciou, Jing-Fong [1]
Lin, Rui-Qi [1]
Affiliations
[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan
[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12
Keywords
speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION;
DOI
10.3390/app13127008
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), uses MFCC as the input features of the LSTM model and is trained with triplet loss and cluster training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), improves speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), uses an autoencoder to extract additional AE features, which are concatenated with the MFCC and fed into the BLSTM model. The results show that the BLSTM model outperforms the LSTM model and that adding AE features achieves the best learning effect. Moreover, the proposed methods have faster computation times than the reference GMM-HMM model. Therefore, using a pre-trained autoencoder to encode speakers and obtain AE features can significantly enhance the learning performance of speaker recognition while also offering faster computation than traditional methods.
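The two mechanisms the abstract names, concatenating AE features with MFCC and training with triplet loss, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the squared Euclidean distance, the margin of 0.2, and the feature dimensions are all hypothetical, and the abstract does not state which distance or margin the authors use.

```python
from typing import List, Sequence

Vector = Sequence[float]


def concat_features(mfcc_frame: Vector, ae_features: Vector) -> List[float]:
    """Join one frame of MFCC values with AE features into a single input vector.

    Hypothetical helper: the abstract only says the two are concatenated.
    """
    return list(mfcc_frame) + list(ae_features)


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge-style triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed value, not the paper's).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


if __name__ == "__main__":
    # 13 MFCC values plus 8 AE features give a 21-dimensional frame (sizes assumed).
    frame = concat_features([0.0] * 13, [1.0] * 8)
    print(len(frame))

    # Same-speaker embedding close to the anchor, other speaker far away.
    print(triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0]))
```

In this formulation, embeddings of the same speaker are pulled toward the anchor and embeddings of other speakers are pushed away until the margin is satisfied, which is the general idea behind triplet-loss training of speaker embeddings.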
Pages: 19