Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 4
Authors
Chen, Young-Long [1]
Wang, Neng-Chung [2]
Ciou, Jing-Fong [1]
Lin, Rui-Qi [1]
Affiliations
[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan
[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / No. 12
Keywords
speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION
DOI
10.3390/app13127008
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), uses MFCCs as input features for the LSTM model and trains it with triplet loss and cluster training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), improves speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), uses an autoencoder to extract additional AE features, which are concatenated with the MFCCs and fed into the BLSTM model. The results show that the BLSTM model outperforms the LSTM model and that adding AE features achieves the best learning effect. Moreover, the proposed methods compute faster than the reference GMM-HMM model. Using pre-trained autoencoders to encode speakers and obtain AE features can therefore significantly enhance the learning performance of speaker recognition.
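The two ideas the abstract combines, concatenating per-frame MFCC and AE features and training with a triplet loss, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption for illustration only: the abstract does not give the distance measure, the margin, or the feature dimensions, so the Euclidean distance, the `margin=1.0` default, and the function names `concat_features` and `triplet_loss` are hypothetical and are not taken from the paper.

```python
import math


def concat_features(mfcc_frames, ae_frames):
    """Join MFCC and AE feature vectors frame by frame.

    Both inputs are lists of per-frame feature lists; the result has one
    vector per frame with the AE features appended to the MFCCs.
    """
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("MFCC and AE sequences must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on speaker embeddings.

    Assumed form: max(0, d(anchor, positive) - d(anchor, negative) + margin).
    The loss is zero once the different-speaker embedding is at least
    `margin` farther from the anchor than the same-speaker embedding.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, chosen only to make the arithmetic easy.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # well separated
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # violates the margin
    print(concat_features([[1.0, 2.0]], [[0.5]]))
```

In a full system the embeddings would come from the LSTM or BLSTM output rather than being written by hand; the sketch only shows how the loss rewards same-speaker embeddings for lying closer to the anchor than different-speaker ones.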
Pages: 19