IMPROVING VIETNAMESE ACCENT RECOGNITION USING ASR TRANSFER LEARNING

Cited by: 2
Authors
Ta, Bao Thang [1]
Dang, Xuan Vuong [1]
Duong, Quang Tien [1]
Le, Nhat Minh [1]
Do, Van Hai [2]
Affiliations
[1] Viettel Grp, Viettel Cyberspace Ctr, Hanoi, Vietnam
[2] Thuyloi Univ, Hanoi, Vietnam
Source
2022 25TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA 2022) | 2022
Keywords
accent recognition; speech recognition; conformer; Vietnamese speech processing
DOI
10.1109/O-COCOSDA202257103.2022.9997947
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Accent recognition (AR) is a critical task in voice-controlled systems. If the speaker's accent is known in advance, a voice-controlled system can switch to a suitable accent-specific mode to improve its performance and user experience. However, available accent datasets, especially for Vietnamese, are relatively small, which makes AR very challenging. To address this data scarcity, this paper proposes a transfer learning method that uses pretrained ASR models for Vietnamese accent recognition. This lets the system reuse available speech recognition systems and capture the implicit linguistic and phonetic information learned in ASR to improve its performance. Several experiments were conducted on a Vietnamese 8 kHz telephone call dataset, and the results showed a significant improvement of the proposed system over existing Vietnamese AR models.
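As a rough illustration of the transfer-learning idea in the abstract, the Python sketch below keeps a stand-in encoder frozen and trains only a small accent classifier on mean-pooled encoder outputs. Everything in it is an assumption made for illustration: this record does not describe the paper's architecture, so `make_frozen_encoder`, `AccentHead`, the three accent classes, and all dimensions are hypothetical, and the fixed random `tanh` projection merely stands in for a real pretrained ASR encoder.

```python
import math
import random

NUM_ACCENTS = 3  # assumption: three accent classes, chosen only for the demo
FEAT_DIM = 8     # assumption: size of each input acoustic frame
EMB_DIM = 4      # assumption: size of each encoder output frame


def make_frozen_encoder(seed=0):
    """Stand-in for a pretrained ASR encoder: a fixed tanh projection.

    The weights are created once and never updated, which mimics
    freezing a pretrained model during transfer learning.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(FEAT_DIM)]
               for _ in range(EMB_DIM)]

    def encode(frames):
        # frames: list of FEAT_DIM-long lists -> list of EMB_DIM-long lists
        return [[math.tanh(sum(w * x for w, x in zip(row, frame)))
                 for row in weights]
                for frame in frames]

    return encode


def mean_pool(embeddings):
    """Average frame-level embeddings into one utterance-level vector."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AccentHead:
    """Linear softmax classifier: the only trainable part of this sketch."""

    def __init__(self):
        self.w = [[0.0] * EMB_DIM for _ in range(NUM_ACCENTS)]
        self.b = [0.0] * NUM_ACCENTS

    def predict_proba(self, vec):
        logits = [sum(wi * v for wi, v in zip(row, vec)) + bias
                  for row, bias in zip(self.w, self.b)]
        return softmax(logits)

    def sgd_step(self, vec, label, lr=0.5):
        # Gradient of cross-entropy w.r.t. logits is (probs - one_hot).
        probs = self.predict_proba(vec)
        for k in range(NUM_ACCENTS):
            grad = probs[k] - (1.0 if k == label else 0.0)
            self.b[k] -= lr * grad
            for j in range(EMB_DIM):
                self.w[k][j] -= lr * grad * vec[j]


if __name__ == "__main__":
    encode = make_frozen_encoder()
    # Synthetic "utterance": five frames of made-up features.
    frames = [[0.1 * t] * FEAT_DIM for t in range(5)]
    utterance_vec = mean_pool(encode(frames))

    head = AccentHead()
    for _ in range(20):
        head.sgd_step(utterance_vec, label=2)
    print(head.predict_proba(utterance_vec))
```

In a realistic setup the stand-in encoder would be replaced by the encoder of an actual pretrained ASR model, and whether to keep it frozen or fine-tune it together with the classifier is a design choice this sketch does not address.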
Pages: 6