Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory

Cited by: 1
Authors
Li, Rui [1 ,6 ]
Xie, Zhiwei [1 ,6 ]
Xu, Haihua [2 ]
Peng, Yizhou [3 ]
Liu, Hexin [4 ]
Huang, Hao [1 ,5 ]
Chng, Eng Siong [4 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Bytedance, Beijing, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[5] Xinjiang Key Lab Multilingual Informat Technol, Urumqi, Peoples R China
[6] AISG NTU NUS Joint Speech Lab, Singapore, Singapore
Source
INTERSPEECH 2023 | 2023
Funding
National Key R&D Program of China;
Keywords
WavLM; Self-supervised learning; representation; accent recognition; persistent accent memory; Conformer;
DOI
10.21437/Interspeech.2023-1702
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Accent recognition (AR) is challenging due to the scarcity of training data and because accents are entangled with speaker and regional characteristics. This paper aims to improve AR performance from two perspectives. First, to alleviate the data-insufficiency problem, we employ self-supervised learning representations (SSLRs) extracted from a pre-trained model to build the AR models. With the help of SSLRs, the AR models achieve significant performance improvements over traditional acoustic features. Second, we propose a persistent accent memory (PAM) that serves as contextual knowledge to bias the AR models. Accent embeddings extracted from all training data by the encoder of the AR models are clustered to form an accent codebook, i.e., the PAM. In addition, we propose diverse attention mechanisms to investigate the optimal utilization of PAM. We observe that the best performance is obtained by selecting only the most relevant accent embeddings.
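The abstract's PAM pipeline, i.e., clustering training-set accent embeddings into a codebook and then attending over that codebook with a top-k relevance selection, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of plain k-means, dot-product attention, and all dimensions and hyperparameters are assumptions for clarity.

```python
import numpy as np


def build_pam_codebook(embeddings, n_clusters=8, n_iters=50, seed=0):
    """Cluster accent embeddings into a fixed codebook (PAM-style memory).

    `embeddings`: (N, D) array of per-utterance accent embeddings, standing in
    for those the paper extracts with the AR encoder over all training data.
    Plain k-means here; the paper does not specify the clustering algorithm.
    """
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each embedding to its nearest center.
        dists = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster went empty.
        for k in range(n_clusters):
            if (assign == k).any():
                centers[k] = embeddings[assign == k].mean(axis=0)
    return centers  # (n_clusters, D) accent codebook


def attend_to_pam(query, codebook, top_k=None):
    """Dot-product attention of one utterance embedding over the codebook.

    With `top_k` set, only the most relevant entries survive the softmax
    (illustrating the selection variant the abstract reports working best).
    Returns a contextual accent vector that could bias the AR model.
    """
    scores = codebook @ query  # (n_clusters,)
    if top_k is not None:
        masked = np.full_like(scores, -np.inf)
        keep = np.argsort(scores)[-top_k:]
        masked[keep] = scores[keep]
        scores = masked
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ codebook  # (D,)
```

Under the top-k mask, the softmax weights of the discarded entries are exactly zero, so the context vector is a convex combination of only the selected codebook rows.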
Pages: 1968-1972
Page count: 5