Speaker identification and localization using shuffled MFCC features and deep learning

Cited by: 5

Authors:

Barhoush M. [1]

Hallawa A. [2]

Schmeink A. [1]

Affiliations:

[1] INDA, RWTH Aachen, Aachen

[2] Artificial Intelligence for Critical Care Lab, University Hospital Aachen, Aachen

Source:

International Journal of Speech Technology | 2023 / Vol. 26 / Issue 1

Keywords:

Data augmentation; Deep neural network; Mel frequency cepstral coefficients; Speaker identification; Speaker localization

DOI:

10.1007/s10772-023-10023-2

Abstract:

The use of machine learning in automatic speaker identification and localization systems has recently seen significant advances. However, this progress comes at the cost of more complex models, heavier computation, more microphone arrays, and more training data. In this work, we therefore propose a new end-to-end identification and localization model based on a simple fully connected deep neural network (FC-DNN) and just two input microphones. By exploiting a new data augmentation approach, this model can jointly or separately localize and identify an active speaker with high accuracy in single- and multi-speaker scenarios. To this end, we propose a novel Mel Frequency Cepstral Coefficients (MFCC) based feature called Shuffled MFCC (SHMFCC) and its variant, Difference Shuffled MFCC (DSHMFCC). To test our approach, we analyzed the performance of the proposed identification and localization model on the new features under different noise and reverberation conditions for single- and multi-speaker scenarios. The results show that our approach achieves high accuracy in these scenarios, outperforms the baseline and conventional methods, and remains robust even with small training sets. © 2023, The Author(s).

Pages: 185-196

Page count: 11

