DNN-based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification

Cited by: 26
Authors
Oo, Zeyan [1]
Kawakami, Yuta [1]
Wang, Longbiao [1]
Nakagawa, Seiichi [2]
Xiao, Xiong [3]
Iwahashi, Masahiro [1]
Affiliations
[1] Nagaoka Univ Technol, Nagaoka, Niigata, Japan
[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
[3] Nanyang Technol Univ, Singapore, Singapore
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speaker identification; feature enhancement; deep neural network; phase information; COMBINING MFCC; RECOGNITION;
DOI
10.21437/Interspeech.2016-717
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The phase information of speech signals is attracting increasing attention. Many studies indicate that combining systems based on amplitude and phase features is effective for improving speaker recognition performance in noisy environments. Separately, speech enhancement is commonly applied to reduce the influence of noise. However, this approach enhances only the amplitude spectrum, and therefore the noisy phase spectrum is used to reconstruct the estimated signal. In recent years, DNN-based feature enhancement has been studied intensively for robust speech processing, and this approach is expected to be effective for phase-based features as well. In this paper, we propose feature-space enhancement of amplitude and phase features using a deep neural network (DNN) for speaker identification. We used mel-frequency cepstral coefficients as the amplitude feature and modified group delay cepstral coefficients as the phase feature. Simultaneous enhancement of the amplitude- and phase-based features was effective, achieving about 24% relative error reduction compared with individual feature enhancement.
Pages: 2204-2208
Number of pages: 5