Combining Data Augmentations for CNN-Based Voice Command Recognition

Times cited: 8
Authors
Azarang, Arian [1]
Hansen, John [1]
Kehtarnavaz, Nasser [1]
Affiliation
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
Source
2019 12th International Conference on Human System Interaction (HSI) | 2019
Keywords
Combining data augmentation methods for voice command recognition; CNN-based voice command recognition; voice command human interaction systems; convolutional neural networks
DOI
10.1109/hsi47298.2019.8942638
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Subject classification code
0812
Abstract
This paper combines two data augmentation methods, speed perturbation and room impulse response (RIR) reverberation, to improve the generalization capability of convolutional neural networks (CNNs) used for voice command recognition. Speed perturbation generates variations of a voice command that mimic the shorter or longer durations with which different speakers say it. RIR reverberation generates variations caused by sound reflecting along multiple paths in a room. The combined augmentation is evaluated on a public-domain dataset of voice commands. Experimental results, measured by word error rate (WER), show that combining the two augmentation methods improves voice command recognition compared with using either method alone.
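The abstract names the two augmentations and the evaluation metric without spelling out their mechanics. The following is a minimal sketch, not the authors' implementation. It assumes mono 16 kHz NumPy arrays. It implements speed perturbation by linear-interpolation resampling, which changes duration and pitch together (a simplification of sox-style resampling with no anti-aliasing filter). It substitutes a toy exponentially decaying noise RIR for measured or simulated room responses. The speed factors 0.9/1.0/1.1, the RT60 range of 0.2 to 0.8 s, and all function names are hypothetical, illustrative choices. Word-level WER is included because it is the metric the abstract reports.

```python
import numpy as np


def speed_perturb(x: np.ndarray, factor: float) -> np.ndarray:
    """Resample so playback is `factor` times faster (factor > 1 shortens the command).

    Linear interpolation only; no anti-aliasing filter (illustrative simplification).
    """
    n_out = int(round(len(x) / factor))
    src_pos = np.arange(n_out) * factor  # positions in x read by each output sample
    return np.interp(src_pos, np.arange(len(x)), x)


def synthetic_rir(fs: int, rt60: float, length_s: float, rng: np.random.Generator) -> np.ndarray:
    """Toy RIR: white noise whose energy decays by 60 dB at t = rt60 (an assumption, not a room model)."""
    n = int(fs * length_s)
    t = np.arange(n) / fs
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude 1e-3 (i.e. -60 dB) at t = rt60
    h = rng.standard_normal(n) * decay
    h[0] = 1.0  # direct-path impulse
    return h / np.max(np.abs(h))


def reverberate(x: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with the RIR, keep the original length, and match the dry peak level."""
    y = np.convolve(x, rir)[: len(x)]
    peak = np.max(np.abs(y))
    return y * (np.max(np.abs(x)) / peak) if peak > 0 else y


def augment(x: np.ndarray, fs: int, factors=(0.9, 1.0, 1.1), n_rirs: int = 2, seed: int = 0) -> list:
    """Combine both methods: each speed variant, plus `n_rirs` reverberant copies of it."""
    rng = np.random.default_rng(seed)
    out = []
    for f in factors:
        xs = speed_perturb(x, f)
        out.append(xs)  # speed perturbation alone
        for _ in range(n_rirs):
            rir = synthetic_rir(fs, rt60=rng.uniform(0.2, 0.8), length_s=0.5, rng=rng)
            out.append(reverberate(xs, rir))  # speed perturbation + reverberation
    return out


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words (ref must be non-empty)."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))  # rolling row of the Levenshtein table
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                      # deletion
                       d[j - 1] + 1,                  # insertion
                       prev + (r[i - 1] != h[j - 1]))  # substitution or match
            prev = cur
    return d[len(h)] / len(r)
```

For a one-second 16 kHz signal, `augment(x, 16000)` returns nine waveforms: three speed variants, each with two reverberant copies. In this sketch, speed perturbation is applied before reverberation so that each speed variant gets its own room response. Applying it afterwards would also time-scale the reverberation tail.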
Pages: 17-21
Number of pages: 5
Related papers
20 in total
[1] [Anonymous], arXiv:1707.06265
[2] Azam G, 2015, Int C Phys Sustainab, V1, P81
[3] Bae HS, 2016, C Ind Elect Appl, P1542, DOI 10.1109/ICIEA.2016.7603830
[4] Bae J, Kim DS, 2018, End-to-End Speech Command Recognition with Capsule Network, 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies, P776-780
[5] Corradini A, 2002, P IEEE Int Joint C N
[6] Cui XD, 2015, Int Conf Acoust Spee, P4545, DOI 10.1109/ICASSP.2015.7178831
[7] D'Souza C, 2017, Elektron Elektrotech, V7, P60
[8] Habets EA, 2006, Tech Rep, V2, P1
[9] Ko T, 2017, Int Conf Acoust Spee, P5220, DOI 10.1109/ICASSP.2017.7953152
[10] Ko T, 2015, 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, P3586