Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals

Cited by: 116
Authors
Chowdhury, Anurag [1]
Ross, Arun [1]
Affiliations
[1] Michigan State Univ, Dept Comp Sci Engn, E Lansing, MI 48823 USA
Keywords
Speaker recognition; Speech recognition; Noise measurement; Mel frequency cepstral coefficient; Speech processing; Feature extraction; Production; degraded audio; deep learning; MFCC; LPC; 1-D CNN; feature-level fusion; NOISE; IDENTIFICATION; SPEECH; MACHINES;
DOI
10.1109/TIFS.2019.2941773
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Speaker recognition algorithms are negatively impacted by the quality of the input speech signal. In this work, we approach the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production. A carefully crafted 1D Triplet Convolutional Neural Network (1D-Triplet-CNN) is used to combine these two features in a novel manner, thereby enhancing the performance of speaker recognition in challenging scenarios. Extensive evaluation on multiple datasets, different types of audio degradations, multi-lingual speech, varying lengths of audio samples, etc., conveys the efficacy of the proposed approach over existing speaker recognition methods, including those based on iVector and xVector.
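The two ideas named in the abstract, feature-level fusion and triplet-based training, can be illustrated with a minimal sketch. It is not the authors' implementation: the two-channel stacking, the cosine distance, the margin value of 0.4, and the function names `fuse_features`, `cosine_distance`, and `triplet_loss` are all assumptions made for illustration, and the embeddings here are plain vectors rather than the output of a 1D CNN.

```python
import math

def fuse_features(mfcc, lpc):
    """Stack two per-frame feature matrices as two channels (assumed fusion scheme).

    mfcc, lpc: lists of frames, each frame a list of coefficients.
    Both matrices must have the same shape.
    """
    if len(mfcc) != len(lpc) or any(len(a) != len(b) for a, b in zip(mfcc, lpc)):
        raise ValueError("MFCC and LPC matrices must have the same shape")
    return [mfcc, lpc]  # shape: (2 channels, frames, coefficients)

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-style triplet loss; the margin of 0.4 is an illustrative assumption.

    The loss is zero once the anchor is closer to the same-speaker sample
    (positive) than to the different-speaker sample (negative) by the margin.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a full system the anchor, positive, and negative vectors would be embeddings produced by the network from fused feature patches; here they can be any equal-length, non-zero vectors.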
Pages: 1616-1629
Number of pages: 14