PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES

Cited by: 0
Authors
Sim, Khe Chai [1 ]
Beaufays, Francoise [1 ]
Benard, Arnaud [1 ]
Guliani, Dhruv [1 ]
Kabel, Andreas [1 ]
Khare, Nikhil [1 ]
Lucassen, Tamar [1 ]
Zadrazil, Petr [1 ]
Zhang, Harry [1 ]
Johnson, Leif [1 ]
Motta, Giovanni [1 ]
Zhou, Lillian [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
personalization; vocabulary acquisition; on-device learning; speech recognition; FACTORIZED HIDDEN LAYER; NEURAL-NETWORK; ADAPTATION;
DOI
10.1109/asru46091.2019.9003775
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acquisition performance. We evaluate the algorithms on a dataset that we designed to contain names of persons that are difficult to recognize. Therefore, the baseline recall rate for proper names in this dataset is very low: 2.4%. A data synthesis approach we developed brings it to 48.6%, with no need for speech input from the user. With speech input, if the user corrects only the names, the name recall rate improves to 64.4%. If the user corrects all the recognition errors, we achieve the best recall of 73.5%. To eliminate the need to upload user data and store personalized models on a server, we focus on performing the entire personalization workflow on a mobile device.
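The abstract refers to keyword-dependent precision and recall for measuring vocabulary acquisition. The paper's exact metric definitions are not reproduced in this record; the sketch below is a rough, hypothetical Python illustration, assuming simple word-level matching of a fixed keyword list (e.g. the target proper names) against reference and hypothesis transcripts. The function name, matching scheme, and example data are illustrative assumptions, not the authors' implementation.

    from collections import Counter
    from typing import Iterable, List, Tuple

    def keyword_precision_recall(
        pairs: Iterable[Tuple[str, str]],
        keywords: List[str],
    ) -> Tuple[float, float]:
        """Illustrative sketch: precision/recall restricted to keyword
        occurrences, computed over (reference, hypothesis) transcript pairs."""
        kws = {k.lower() for k in keywords}
        tp = fp = fn = 0
        for ref, hyp in pairs:
            ref_counts = Counter(w for w in ref.lower().split() if w in kws)
            hyp_counts = Counter(w for w in hyp.lower().split() if w in kws)
            for k in kws:
                r, h = ref_counts[k], hyp_counts[k]
                tp += min(r, h)      # keyword occurrences correctly recognized
                fn += max(r - h, 0)  # keyword occurrences missed
                fp += max(h - r, 0)  # keyword occurrences falsely hypothesized
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    # Hypothetical example: "khe chai" is a hard-to-recognize name to track.
    pairs = [
        ("call khe chai now", "call k c now"),            # name missed
        ("text khe chai hello", "text khe chai hello"),   # name recognized
    ]
    print(keyword_precision_recall(pairs, ["khe", "chai"]))

Under these assumptions the example prints a precision of 1.0 and a recall of 0.5, mirroring how the paper reports name recall separately from overall recognition accuracy.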
Pages: 23-30
Number of pages: 8