Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features

Cited by: 39
Authors
Jiao, Yishan [1]
Tu, Ming [1]
Berisha, Visar [1,2]
Liss, Julie [1]
Affiliations
[1] Arizona State Univ, Dept Speech & Hearing Sci, Tempe, AZ 85287 USA
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85287 USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
accent identification; deep neural networks; prosody; articulation
DOI
10.21437/Interspeech.2016-1148
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic identification of foreign accents is valuable for many speech systems, such as speech recognition, speaker identification, and voice conversion. The INTERSPEECH 2016 Native Language Sub-Challenge is to identify the native languages of non-native English speakers from eleven countries. Since differences in accent are due to both prosodic and articulation characteristics, a combination of long-term and short-term training is proposed in this paper. Each speech sample is split into multiple speech segments of equal length. For each segment, deep neural networks (DNNs) are trained on long-term statistical features, while recurrent neural networks (RNNs) are trained on short-term acoustic features. The result for each speech sample is calculated by linearly fusing the results from the two sets of networks over all segments. The performance of the proposed system greatly surpasses that of the provided baseline system. Moreover, fusing the results with those of the baseline system improves performance further.
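The segment-and-fuse step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fusion weight `alpha`, the choice to drop trailing frames that do not fill a segment, and the choice to average the fused scores across segments are all assumptions made here for illustration. The per-segment class probabilities are assumed to come from already trained DNN and RNN models.

```python
from typing import List, Sequence


def split_into_segments(frames: Sequence, segment_len: int) -> List[Sequence]:
    """Cut a frame sequence into consecutive segments of equal length.

    Assumption: trailing frames that do not fill a whole segment are dropped.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_segments = len(frames) // segment_len
    return [frames[i * segment_len:(i + 1) * segment_len] for i in range(n_segments)]


def fuse_sample(dnn_probs: List[List[float]],
                rnn_probs: List[List[float]],
                alpha: float = 0.5) -> List[float]:
    """Linearly fuse per-segment class probabilities from two models.

    dnn_probs[s][k] and rnn_probs[s][k] are the (hypothetical) scores for
    class k on segment s. Each segment is fused as
    alpha * dnn + (1 - alpha) * rnn, then the fused scores are averaged
    over all segments (the averaging is an assumption).
    """
    if not dnn_probs or len(dnn_probs) != len(rnn_probs):
        raise ValueError("both models must score the same non-empty set of segments")
    n_classes = len(dnn_probs[0])
    fused = [0.0] * n_classes
    for d, r in zip(dnn_probs, rnn_probs):
        for k in range(n_classes):
            fused[k] += alpha * d[k] + (1.0 - alpha) * r[k]
    return [v / len(dnn_probs) for v in fused]


def predict(fused: List[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, with two segments and two classes, `fuse_sample([[0.6, 0.4], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]])` gives scores close to `[0.35, 0.65]`, so `predict` selects class index 1. In the challenge setting there would be eleven classes, and the weight `alpha` would typically be tuned on held-out data.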
Pages: 2388-2392
Page count: 5