Speaker identification and localization using shuffled MFCC features and deep learning

Cited by: 5
Authors
Barhoush M. [1]
Hallawa A. [2]
Schmeink A. [1]
Affiliations
[1] INDA, RWTH Aachen, Aachen
[2] Artificial Intelligence for Critical Care Lab, University Hospital Aachen, Aachen
Keywords
Data augmentation; Deep neural network; Mel frequency cepstral coefficients; Speaker identification; Speaker localization
DOI
10.1007/s10772-023-10023-2
Abstract
The use of machine learning in automatic speaker identification and localization systems has recently seen significant advances. However, this progress comes at the cost of complex models, heavy computation, more microphone arrays, and more training data. In this work, we therefore propose a new end-to-end identification and localization model based on a simple fully connected deep neural network (FC-DNN) and only two input microphones. By exploiting a new data augmentation approach, the model can localize and identify an active speaker, jointly or separately, with high accuracy in single- and multi-speaker scenarios. To this end, we propose a novel feature based on Mel Frequency Cepstral Coefficients (MFCC), called Shuffled MFCC (SHMFCC), and its variant, Difference Shuffled MFCC (DSHMFCC). To test our approach, we analyzed the performance of the proposed identification and localization model with the new features under different noise and reverberation conditions in single- and multi-speaker scenarios. The results show that our approach achieves high accuracy in these scenarios, outperforms the baseline and conventional methods, and remains robust even with a small training set. © 2023, The Author(s).
Pages: 185-196
Page count: 11
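
The abstract names the SHMFCC and DSHMFCC features but does not say how they are constructed. The sketch below is only one plausible reading, not the authors' method: it assumes that "shuffled" means randomly permuting the time order of MFCC frames (applied identically to both microphone channels), and that the "difference" variant subtracts the two channels frame by frame. The function names `shuffle_pair` and `frame_difference` and the toy values are hypothetical.

```python
import random


def shuffle_pair(mic1, mic2, rng):
    """Apply one random frame permutation to both channels.

    mic1, mic2: lists of MFCC frames (each frame is a list of coefficients),
    assumed to be time-aligned and of equal length.
    ASSUMPTION: frame-order shuffling stands in for SHMFCC here.
    """
    if len(mic1) != len(mic2):
        raise ValueError("channels must have the same number of frames")
    order = list(range(len(mic1)))
    rng.shuffle(order)  # in-place random permutation of frame indices
    return [mic1[i] for i in order], [mic2[i] for i in order]


def frame_difference(mic1, mic2):
    """Per-frame, per-coefficient difference between the two channels.

    ASSUMPTION: this stands in for the DSHMFCC variant.
    """
    return [[a - b for a, b in zip(f1, f2)] for f1, f2 in zip(mic1, mic2)]


# Toy MFCC matrices: 3 frames x 2 coefficients per microphone.
mic1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mic2 = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]

rng = random.Random(0)  # fixed seed so the augmentation is repeatable
shuf1, shuf2 = shuffle_pair(mic1, mic2, rng)
diff = frame_difference(shuf1, shuf2)
```

Calling `shuffle_pair` repeatedly with the same generator yields differently ordered copies of the same utterance, which is how shuffling could serve as data augmentation under these assumptions.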