Spectral Salt-and-Pepper Patch Masking for Self-Supervised Speech Representation Learning

Authors
Kim, June-Woo [1]
Chung, Hoon [2]
Jung, Ho-Young [1]
Affiliations
[1] Kyungpook Natl Univ, Dept Artificial Intelligence, Daegu 41566, South Korea
[2] Elect & Telecommun Res Inst, Daejeon 34129, South Korea
Funding
National Research Foundation, Singapore;
Keywords
self-supervised learning; speech representation learning; salt-and-pepper masking; spectrum patch masking; NETWORK;
DOI
10.3390/math11153418
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
Recent advanced speech recognition systems use large Transformer neural networks pretrained on massive amounts of speech data. General deep learning methods are frequently shared across domains, and the Transformer model can be used effectively for both speech and images. In this paper, we introduce a novel masking method for self-supervised speech representation learning based on the salt-and-pepper (S&P) mask, which is commonly used in computer vision. The proposed scheme contaminates the input speech spectrum with consecutive quadrilateral-shaped S&P patches placed at random. Furthermore, we modify the standard S&P mask to make it appropriate for the speech domain. To validate the effect of the proposed spectral S&P patch masking on self-supervised representation learning, we conduct pretraining and downstream experiments in two languages, English and Korean. To this end, we pretrain the speech representation model on each dataset and evaluate the pretrained models for feature extraction and fine-tuning performance on various downstream tasks. The experimental results show that the proposed spectral S&P patch masking is effective for various downstream tasks when combined with conventional masking methods.
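The abstract does not give the masking parameters or the speech-specific modification, so the following is only a minimal sketch of the general idea: salt-and-pepper noise confined to randomly placed rectangular patches of a spectrogram. The function name `sp_patch_mask`, the patch sizes, the noise density, and the choice of the spectrogram minimum and maximum as "pepper" and "salt" values are all assumptions made for illustration, not the authors' settings.

```python
import numpy as np


def sp_patch_mask(spec, num_patches=3, patch_frames=8, patch_bins=16,
                  density=0.5, rng=None):
    """Corrupt random rectangular patches of `spec` with salt-and-pepper noise.

    spec: 2-D array shaped (frames, frequency bins).
    All default values are illustrative assumptions, not values from the paper.
    Returns the corrupted copy and a boolean mask of the altered cells.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_frames, n_bins = spec.shape
    lo, hi = spec.min(), spec.max()          # assumed "pepper" and "salt" values
    mask = np.zeros(spec.shape, dtype=bool)

    h = min(patch_frames, n_frames)          # patch height in frames
    w = min(patch_bins, n_bins)              # patch width in frequency bins
    for _ in range(num_patches):
        # Random top-left corner such that the patch fits inside the spectrogram.
        t0 = rng.integers(0, n_frames - h + 1)
        f0 = rng.integers(0, n_bins - w + 1)

        # Within the patch, about density/2 of the cells become salt
        # and about density/2 become pepper; the rest stay untouched.
        noise = rng.random((h, w))
        salt = noise < density / 2
        pepper = (noise >= density / 2) & (noise < density)

        patch = out[t0:t0 + h, f0:f0 + w]    # view into `out`
        patch[salt] = hi
        patch[pepper] = lo
        mask[t0:t0 + h, f0:f0 + w] |= salt | pepper
    return out, mask


# Usage on a synthetic "spectrogram" of 100 frames by 80 bins.
spec = np.random.default_rng(0).normal(size=(100, 80))
masked, mask = sp_patch_mask(spec, rng=np.random.default_rng(1))
```

In a reconstruction-style pretraining setup, the returned `mask` could mark the positions where the model is asked to recover the clean input; how the paper combines this with conventional masking is not stated in the abstract.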
Pages: 22