Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 4

Authors:

Chen, Young-Long [1]

Wang, Neng-Chung [2]

Ciou, Jing-Fong [1]

Lin, Rui-Qi [1]

Affiliations:

[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan

[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan

Source:

APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 12

Keywords:

speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION

DOI:

10.3390/app13127008

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. Additionally, it also offers faster computation time compared to traditional methods.
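Two ideas from the abstract can be illustrated with a minimal sketch: joining per-frame MFCC and AE feature vectors before they reach the BLSTM, and scoring an (anchor, positive, negative) triple with a triplet loss. The abstract does not give the distance metric, the margin, or any implementation detail, so the Euclidean distance, the margin of 0.2, and the function names below are assumptions made for illustration only.

```python
import math


def concat_features(mfcc_frames, ae_frames):
    """Join MFCC and AE feature vectors frame by frame.

    Both arguments are lists of per-frame feature lists. The frame counts
    must match; the per-frame dimensions may differ.
    """
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("MFCC and AE sequences must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


def euclidean(u, v):
    """Euclidean distance between two equal-length embeddings (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (the margin value here is an assumption).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In this sketch the anchor and positive would be embeddings of utterances from the same speaker and the negative an embedding from a different speaker; the embeddings themselves would come from the trained LSTM or BLSTM model, which is not reproduced here.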

Pages: 19

50 items in total

[21] Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization
Hossan, M.
Gregory, Mark
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2013, 16 (01) : 103 - 113
[22] Wind Turbine Gearbox Early Fault Detection Using Mel-Frequency Cepstral Coefficients of Vibration Data
Velandia-Cardenas, Cristian
Vidal, Yolanda
Pozo, Francesc
STRUCTURAL CONTROL & HEALTH MONITORING, 2024, 2024
[23] ACOUSTIC PORNOGRAPHY RECOGNITION USING FUSED PITCH AND MEL-FREQUENCY CEPSTRUM COEFFICIENTS
Banaeeyan, Rasoul
Karim, Hezerul Abdul
Lye, Haris
Fauzi, Mohamad Faizal Ahmad
Mansor, Sarina
See, John
INTERNATIONAL JOURNAL OF TECHNOLOGY, 2019, 10 (07) : 1335 - 1343
[24] Time Series-based Spoof Speech Detection Using Long Short-term Memory and Bidirectional Long Short-term Memory
Mirza, Arsalan R.
Al-Talabani, Abdulbasit K.
ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 2024, 12 (02) : 119 - 129
[25] Bidirectional Long Short-Term Memory Network for Vehicle Behavior Recognition
Zhu, Jiasong
Sun, Ke
Jia, Sen
Lin, Weidong
Hou, Xianxu
Liu, Bozhi
Qiu, Guoping
REMOTE SENSING, 2018, 10 (06)
[26] Development of a diagnostic algorithm for abnormal situations using long short-term memory and variational autoencoder
Kim, Hyojin
Arigi, Awwal Mohammed
Kim, Jonghyun
ANNALS OF NUCLEAR ENERGY, 2021, 153
[27] Combined Long Short-Term Memory based Network employing wavelet coefficients for MI-EEG recognition
Li, Mingai
Zhang, Meng
Luo, Xinyong
Yang, Jinfu
2016 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, 2016, : 1971 - 1976
[28] Vector quantization in text dependent automatic speaker recognition using Mel-Frequency Cepstrum Coefficient
Kabir, Ahsanul
Ahsan, Sheikh Mohammad Masudul
PROCEEDINGS OF THE WSEAS INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING: SELECTED TOPICS ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING, 2007, : 352 - 355
[29] Kannada Named Entity Recognition and Classification using Bidirectional Long Short-Term Memory Networks
Sathyanarayanan, Dinesh
Ashok, Ashwin
Mishra, Debanik
Chimalamarri, Santwana
Sitaram, Dinkar
2018 3RD INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT - 2018), 2018, : 65 - 71
[30] Centralized tracking and bidirectional long short-term memory for abnormal behaviour recognition
Andersson, Maria
COUNTERTERRORISM, CRIME FIGHTING, FORENSICS, AND SURVEILLANCE TECHNOLOGIES VI, 2022, 12275
