Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Cited by: 2
Authors
Chang, Xuankai [1 ]
Yan, Brian [1 ]
Fujita, Yuya [2 ]
Maekaku, Takashi [2 ]
Watanabe, Shinji [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Yahoo Japan Corp, Nagoya, Aichi, Japan
Source
INTERSPEECH 2023
Funding
National Science Foundation (USA);
Keywords
self-supervised learning; discrete tokens; discretized input; speech recognition;
DOI
10.21437/Interspeech.2023-2051
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued features for downstream tasks, there is potential in exploring alternative approaches that use discretized token sequences. This approach offers benefits such as lower storage requirements and the ability to apply techniques from natural language processing. In this paper, we propose a new protocol that utilizes discretized token sequences in ASR tasks, which includes de-duplication and sub-word modeling to enhance the input sequence. It reduces computational cost by decreasing the length of the sequence. Our experiments on the LibriSpeech dataset demonstrate that our proposed protocol performs competitively with conventional ASR systems using continuous input features, while reducing computational and storage costs.
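The sequence-shortening idea in the abstract can be illustrated with a minimal sketch: frame-level discrete tokens (e.g. k-means cluster IDs of SSL features) often repeat across adjacent frames, so collapsing consecutive duplicates already shortens the input before any sub-word modeling. The token values below are toy data, and the helper name `deduplicate` is an illustrative assumption, not the paper's actual pipeline.

```python
# Sketch of the input-shortening step described in the abstract:
# discrete SSL tokens -> de-duplication (sub-word modeling, e.g. BPE
# over the de-duplicated IDs, would shorten the sequence further).
from itertools import groupby

def deduplicate(tokens):
    """Collapse runs of consecutive identical tokens into one token."""
    return [t for t, _ in groupby(tokens)]

# Toy frame-level discrete tokens, e.g. k-means cluster IDs.
frames = [7, 7, 7, 12, 12, 3, 3, 3, 3, 12]
print(deduplicate(frames))  # [7, 12, 3, 12]
```

Note that de-duplication is lossless for the token identities but drops duration information, which is one reason it pairs naturally with CTC-style or attention-based end-to-end ASR rather than frame-synchronous decoding.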
Pages: 1399-1403
Page count: 5
Related papers
50 records in total
  • [41] End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation
    Chen, Jianjun
    Fang, Shancheng
    Xie, Hongtao
    Zha, Zhengjun
    Hu, Yue
    Tan, Jianlong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2381 - 2390
  • [42] Verifiably Safe Exploration for End-to-End Reinforcement Learning
    Hunt, Nathan
    Fulton, Nathan
    Magliacane, Sara
    Trong Nghia Hoang
    Das, Subhro
    Solar-Lezama, Armando
    HSCC 2021: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL (PART OF CPS-IOT WEEK), 2021
  • [43] Efficient end-to-end learning for quantizable representations
    Jeong, Yeonwoo
    Song, Hyun Oh
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [44] Self-Supervised Pre-Trained Speech Representation Based End-to-End Mispronunciation Detection and Diagnosis of Mandarin
    Shen, Yunfei
    Liu, Qingqing
    Fan, Zhixing
    Liu, Jiajun
    Wumaier, Aishan
    IEEE ACCESS, 2022, 10 : 106451 - 106462
  • [45] Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
    Williams, Jason D.
    Asadi, Kavosh
    Zweig, Geoffrey
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 665 - 677
  • [46] SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Wang, Runyu
    Fan, Lu
    Zhang, Zhengchen
    Chen, Meng
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1006 - 1010
  • [47] End-to-End Kernel Learning with Supervised Convolutional Kernel Networks
    Mairal, Julien
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [48] AN INVESTIGATION OF MULTILINGUAL ASR USING END-TO-END LF-MMI
    Tong, Sibo
    Garner, Philip N.
    Bourlard, Herve
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6061 - 6065
  • [49] Using Large Language Model for End-to-End Chinese ASR and NER
    Li, Yuang
    Yu, Jiawei
    Zhang, Min
    Ren, Mengxin
    Zhao, Yanqing
    Zhao, Xiaofeng
    Tao, Shimin
    Su, Jinsong
    Yang, Hao
    INTERSPEECH 2024, 2024, : 822 - 826
  • [50] META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR
    Lux, Florian
    Ngoc Thang Vu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5974 - 5978