Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions

Cited by: 1
Authors
Ai, Haojun [1, 2]
Xia, Wuyang [1]
Zhang, Quanxin [3]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Minist Educ, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
Source
CYBERSPACE SAFETY AND SECURITY, PT II | 2019 / Vol. 11983
Funding
National Natural Science Foundation of China;
Keywords
Speaker recognition; Smart home; Transfer learning; SPEECH;
DOI
10.1007/978-3-030-37352-8_37
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
As smart home devices have advanced, they have gradually changed people's lifestyles, and speaker recognition is now available in almost all of them. Currently, mainstream speaker recognition services are provided by very deep neural networks trained on cloud servers. However, these deep networks are not suitable for deployment and operation on smart home devices. To address this problem, we propose a packet bottleneck method to improve SqueezeNet, which has been widely used in speaker recognition tasks, and design a lightweight structure named TrimNet. In addition, we adopt a model updating strategy based on transfer learning to keep the model from deteriorating on cold speech. Experimental results demonstrate that TrimNet is superior to SqueezeNet in classification accuracy, parameter count, and computational cost. Moreover, the model updating method increases the recognition rate for cold speech without degrading the recognition rate for other speakers.
Pages: 421-431
Page count: 11
Related papers
21 in total
[1]  
[Anonymous], 2009, 2009 16 INT C DIG SI
[2]  
[Anonymous], 2010, OD 2010 SPEAK LANG R
[3]  
Bahdanau D, 2016, INT CONF ACOUST SPEE, P4945, DOI 10.1109/ICASSP.2016.7472618
[4]   INTERPRETATION OF BIOMECHANICAL SIMULATIONS OF NORMAL AND CHAOTIC VOCAL FOLD OSCILLATIONS WITH EMPIRICAL EIGENFUNCTIONS [J].
BERRY, DA ;
HERZEL, H ;
TITZE, IR ;
KRISCHER, K .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (06) :3595-3604
[5]  
Cole R.A., 1998, 5 INT C SPOK LANG PR
[6]   Front-End Factor Analysis for Speaker Verification [J].
Dehak, Najim ;
Kenny, Patrick J. ;
Dehak, Reda ;
Dumouchel, Pierre ;
Ouellet, Pierre .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798
[7]  
Ghiurcau MV, 2011, INT CONF ACOUST SPEE, P4944
[8]   A nonlinear operator-based speech feature analysis method with application to vocal fold pathology assessment [J].
Hansen, JHL ;
Gavidia-Ceballos, L ;
Kaiser, JF .
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 1998, 45 (03) :300-313
[9]   Speaker Recognition by Machines and Humans [J].
Hansen, John H. L. ;
Hasan, Taufiq .
IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (06) :74-99
[10]   Characterization of Healthy and Pathological Voice Through Measures Based on Nonlinear Dynamics [J].
Henriquez, Patricia ;
Alonso, Jesus B. ;
Ferrer, Miguel A. ;
Travieso, Carlos M. ;
Godino-Llorente, Juan I. ;
Diaz-de-Maria, Fernando .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (06) :1186-1195