Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory

Cited: 1
Authors
Li, Rui [1 ,6 ]
Xie, Zhiwei [1 ,6 ]
Xu, Haihua [2 ]
Peng, Yizhou [3 ]
Liu, Hexin [4 ]
Huang, Hao [1 ,5 ]
Chng, Eng Siong [4 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Bytedance, Beijing, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[5] Xinjiang Key Lab Multilingual Informat Technol, Urumqi, Peoples R China
[6] AISG NTU NUS Joint Speech Lab, Singapore, Singapore
Source
INTERSPEECH 2023 | 2023
Funding
National Key R&D Program of China;
关键词
WavLM; Self-supervised learning; representation; accent recognition; persistent accent memory; Conformer;
DOI
10.21437/Interspeech.2023-1702
CLC number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Accent recognition (AR) is challenging due to the lack of training data and because accents are entangled with speaker and regional characteristics. This paper aims to improve AR performance from two perspectives. First, to alleviate the data-insufficiency problem, we employ self-supervised learning representations (SSLRs) extracted from a pre-trained model to build the AR models. With the help of SSLRs, the AR models gain significant performance improvements over traditional acoustic features. Second, we propose a persistent accent memory (PAM) that serves as contextual knowledge to bias the AR models. The accent embeddings extracted from all training data by the encoder of the AR models are clustered to form an accent codebook, i.e., the PAM. In addition, we propose diverse attention mechanisms to investigate the optimal utilization of PAM. We observe that the best performance is obtained by selecting the most relevant accent embeddings.
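The abstract's PAM idea — cluster the training set's accent embeddings into a codebook, then attend over it to bias the model — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the clustering method, codebook size, embedding dimension, and the top-k selection of "most relevant" codes are all assumptions here, and `build_pam` / `attend_to_pam` are hypothetical names.

```python
import numpy as np

def build_pam(embeddings, n_codes=64, n_iters=20, seed=0):
    """Cluster accent embeddings into a codebook (the "PAM").
    Plain k-means is used here as a stand-in clustering method."""
    rng = np.random.default_rng(seed)
    codebook = embeddings[rng.choice(len(embeddings), n_codes, replace=False)]
    for _ in range(n_iters):
        # Assign each embedding to its nearest code (Euclidean distance).
        dists = np.linalg.norm(embeddings[:, None] - codebook[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Move each code to the mean of its assigned embeddings.
        for k in range(n_codes):
            members = embeddings[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def attend_to_pam(query, codebook, top_k=None):
    """Dot-product attention over the codebook. With top_k set, only the
    most relevant codes are kept, mirroring the abstract's observation
    that selecting the most relevant accent embeddings works best."""
    scores = codebook @ query
    if top_k is not None:
        masked = np.full_like(scores, -np.inf)
        idx = np.argsort(scores)[-top_k:]
        masked[idx] = scores[idx]
        scores = masked
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of codes: the contextual bias vector fed to the AR model.
    return weights @ codebook
```

In an AR model, the returned bias vector would be combined with the encoder output (e.g., concatenated or added) before classification; the exact fusion used in the paper is not reproduced here.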
Pages: 1968 - 1972
Number of pages: 5