Wavelet Scattering Transform and CNN for Closed Set Speaker Identification

Cited by: 15

Authors:

Ghezaiel, Wajdi [1]

Brun, Luc [2]

Lezoray, Olivier [2]

Affiliations:

[1] Normandie Univ, ENSICAEN, UNICAEN, CNRS, NormaSTIC, F-14000 Caen, France

[2] Normandie Univ, UNICAEN, ENSICAEN, CNRS, Greyc, UMR 6072, F-14000 Caen, France

Source:

2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP) | 2020

Keywords:

Speaker identification; short utterances; wavelet scattering transform; convolutional neural network; hybrid network; VERIFICATION;

DOI:

10.1109/mmsp48831.2020.9287061

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

In real-world applications, the performance of speaker identification systems degrades as both the amount and the quality of the available speech decrease. To address this, we propose a speaker identification system that identifies a person from short utterances with few training examples: only a very small amount of data, consisting of a sentence of 2-4 seconds, is used. To achieve this, we propose a novel raw-waveform end-to-end convolutional neural network (CNN) for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN and learn the remaining layers in a supervised manner. The experiments show that our hybrid architecture, combining the wavelet scattering transform and a CNN, performs efficient feature extraction for speaker identification, even with a small number of short-duration training samples.
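The idea of a fixed, non-learned wavelet front end can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gabor (Morlet-like) filter shape, the dyadic spacing of the center frequencies, and the parameter values are all assumptions chosen for illustration. It computes only time-averaged first-order scattering coefficients (band-pass filtering, complex modulus, averaging); the higher-order coefficients and the trainable CNN layers that the abstract mentions are omitted.

```python
import cmath
import math


def morlet_filter(center_freq, sigma, length):
    """Complex Gabor (Morlet-like) band-pass filter with an odd number of taps.

    Hypothetical helper: a Gaussian envelope modulated by a complex
    exponential, normalized so the gain at the center frequency is about 1.
    """
    half = length // 2
    taps = []
    for n in range(-half, half + 1):
        envelope = math.exp(-(n * n) / (2.0 * sigma * sigma))
        taps.append(envelope * cmath.exp(1j * center_freq * n))
    norm = sum(abs(t) for t in taps)
    return [t / norm for t in taps]


def modulus_of_convolution(signal, taps):
    """Return |signal * taps| with zero padding (same length as the input)."""
    half = len(taps) // 2
    out = []
    for t in range(len(signal)):
        acc = 0j
        for k, tap in enumerate(taps):
            idx = t - (k - half)  # tap k corresponds to lag n = k - half
            if 0 <= idx < len(signal):
                acc += signal[idx] * tap
        out.append(abs(acc))
    return out


def first_order_scattering(signal, num_filters=4, top_freq=math.pi / 2):
    """Time-averaged first-order coefficients, one per dyadic filter.

    The filters are fixed (nothing here is learned), which is the role
    the scattering layers play before any trainable layers.
    """
    coeffs = []
    for j in range(num_filters):
        center = top_freq / (2 ** j)   # assumed dyadic center frequencies
        sigma = 2.0 * (2 ** j)         # wider envelope at lower frequency
        length = int(8 * sigma) + 1    # odd length, about +/- 4 sigma
        taps = morlet_filter(center, sigma, length)
        envelope = modulus_of_convolution(signal, taps)
        coeffs.append(sum(envelope) / len(envelope))
    return coeffs


# Usage: a pure tone at pi/4 rad/sample, the center of the second filter.
tone = [math.cos(math.pi / 4 * n) for n in range(256)]
print(first_order_scattering(tone))
```

For the tone above, the coefficient of the second filter (index 1) should dominate, since that filter is centered on the tone's frequency. In a hybrid network of the kind the abstract describes, a feature vector like this would feed the learned layers in place of learned first-layer convolutions.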


Pages: 6
