Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 4
Authors
Chen, Young-Long [1]
Wang, Neng-Chung [2]
Ciou, Jing-Fong [1]
Lin, Rui-Qi [1]
Affiliations
[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan
[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12
Keywords
speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION
DOI
10.3390/app13127008
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), uses MFCCs as input features for the LSTM model and trains it with triplet loss and cluster training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), improves speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), uses an autoencoder to extract additional AE features, which are concatenated with the MFCCs and fed into the BLSTM model. The results show that the BLSTM model outperforms the LSTM model and that adding AE features achieves the best learning effect. The proposed methods also compute faster than the reference GMM-HMM model. Using pre-trained autoencoders for speaker encoding to obtain AE features can therefore significantly enhance the learning performance of speaker recognition.
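The two ingredients the abstract names, concatenating AE features with MFCCs and training with a triplet loss, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, and the margin of 0.2 are assumptions chosen for illustration, and the abstract does not state which distance or margin the paper uses.

```python
import math


def concat_features(mfcc_frames, ae_frames):
    """Join per-frame MFCC and AE feature vectors (hypothetical helper).

    Both inputs are lists of per-frame feature lists with the same
    number of frames; each output frame is the MFCC vector followed
    by the AE vector.
    """
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("MFCC and AE features must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss on speaker embeddings.

    The loss is zero once the anchor is closer to the positive
    (same speaker) than to the negative (different speaker) by at
    least `margin`. The margin value is an assumed placeholder.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Toy values only; real MFCC/AE vectors would come from audio.
    frames = concat_features([[1.0, 2.0]], [[3.0]])
    print(frames)
    print(triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))
```

In a full system the embeddings passed to `triplet_loss` would be the outputs of the (B)LSTM for three utterances, two from the same speaker and one from a different speaker.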
Pages: 19