AutoSpeech: Neural Architecture Search for Speaker Recognition

Cited by: 14
Authors
Ding, Shaojin [1]
Chen, Tianlong [2]
Gong, Xinyu [1, 2]
Zha, Weiwei [3]
Wang, Zhangyang [2]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
[2] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[3] Univ Sci & Technol China, Sch Software Engn, Beijing, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
speaker recognition; neural architecture search
DOI
10.21437/Interspeech.2020-1258
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification, and therefore may not be naturally fit for speaker recognition. Due to the prohibitive complexity of manually exploring the design space, we propose the first neural architecture search approach for the speaker recognition tasks, named as AutoSpeech. Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell for multiple times. The final speaker recognition model can be obtained by training the derived CNN model through the standard scheme. To evaluate the proposed approach, we conduct experiments on both speaker identification and speaker verification tasks using the VoxCeleb1 dataset. Results demonstrate that the derived CNN architectures from the proposed approach significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
Pages: 916-920
Page count: 5