Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 4
Authors
Chen, Young-Long [1]
Wang, Neng-Chung [2]
Ciou, Jing-Fong [1]
Lin, Rui-Qi [1]
Affiliations
[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan
[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12
Keywords
speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION;
DOI
10.3390/app13127008
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the BLSTM model outperformed the LSTM model, and that adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times than the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition.
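The two ideas the abstract names for the third method, concatenating AE features with MFCC features and training with a triplet loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_features` and `triplet_loss`, the Euclidean distance, the per-frame concatenation, and the margin value of 0.2 are all assumptions made for illustration, and the toy vectors stand in for real MFCC, AE, and embedding outputs.

```python
import math


def concat_features(mfcc_frames, ae_frames):
    """Join MFCC and autoencoder (AE) feature vectors frame by frame.

    Assumption: both streams are aligned and have the same number of frames.
    """
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("feature streams must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on speaker embeddings.

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    The margin value and the distance metric are assumptions.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy data: two frames, 2 MFCC values and 1 AE value per frame (hypothetical sizes).
mfcc = [[1.0, 2.0], [3.0, 4.0]]
ae = [[0.5], [0.7]]
combined = concat_features(mfcc, ae)  # each frame now holds 3 values

# Toy embeddings: the first triplet is already well separated, the second is not.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In a full system the concatenated frames would be the input sequence to the BLSTM, and the loss would be computed on the embeddings the network produces rather than on hand-written vectors.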
Pages: 19
Related papers
50 items in total
  • [21] Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization
    Hossan, M.
    Gregory, Mark
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2013, 16 (01) : 103 - 113
  • [22] Wind Turbine Gearbox Early Fault Detection Using Mel-Frequency Cepstral Coefficients of Vibration Data
    Velandia-Cardenas, Cristian
    Vidal, Yolanda
    Pozo, Francesc
    STRUCTURAL CONTROL & HEALTH MONITORING, 2024, 2024
  • [23] ACOUSTIC PORNOGRAPHY RECOGNITION USING FUSED PITCH AND MEL-FREQUENCY CEPSTRUM COEFFICIENTS
    Banaeeyan, Rasoul
    Karim, Hezerul Abdul
    Lye, Haris
    Fauzi, Mohamad Faizal Ahmad
    Mansor, Sarina
    See, John
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2019, 10 (07) : 1335 - 1343
  • [24] Time Series-based Spoof Speech Detection Using Long Short-term Memory and Bidirectional Long Short-term Memory
    Mirza, Arsalan R.
    Al-Talabani, Abdulbasit K.
    ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 2024, 12 (02) : 119 - 129
  • [25] Bidirectional Long Short-Term Memory Network for Vehicle Behavior Recognition
    Zhu, Jiasong
    Sun, Ke
    Jia, Sen
    Lin, Weidong
    Hou, Xianxu
    Liu, Bozhi
    Qiu, Guoping
    REMOTE SENSING, 2018, 10 (06)
  • [26] Development of a diagnostic algorithm for abnormal situations using long short-term memory and variational autoencoder
    Kim, Hyojin
    Arigi, Awwal Mohammed
    Kim, Jonghyun
    ANNALS OF NUCLEAR ENERGY, 2021, 153
  • [27] Combined Long Short-Term Memory based Network employing wavelet coefficients for MI-EEG recognition
    Li, Mingai
    Zhang, Meng
    Luo, Xinyong
    Yang, Jinfu
    2016 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, 2016, : 1971 - 1976
  • [28] Vector quantization in text dependent automatic speaker recognition using Mel-Frequency Cepstrum Coefficient
    Kabir, Ahsanul
    Ahsan, Sheikh Mohammad Masudul
    PROCEEDINGS OF THE WSEAS INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING: SELECTED TOPICS ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING, 2007, : 352 - 355
  • [29] Kannada Named Entity Recognition and Classification using Bidirectional Long Short-Term Memory Networks
    Sathyanarayanan, Dinesh
    Ashok, Ashwin
    Mishra, Debanik
    Chimalamarri, Santwana
    Sitaram, Dinkar
    2018 3RD INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT - 2018), 2018, : 65 - 71
  • [30] Centralized tracking and bidirectional long short-term memory for abnormal behaviour recognition
    Andersson, Maria
    COUNTERTERRORISM, CRIME FIGHTING, FORENSICS, AND SURVEILLANCE TECHNOLOGIES VI, 2022, 12275