Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Cited by: 2
Authors
Chang, Xuankai [1]
Yan, Brian [1]
Fujita, Yuya [2]
Maekaku, Takashi [2]
Watanabe, Shinji [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Yahoo Japan Corp, Nagoya, Aichi, Japan
Funding
U.S. National Science Foundation
Keywords
self-supervised learning; discrete tokens; discretized input; speech recognition
DOI
10.21437/Interspeech.2023-2051
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly automatic speech recognition (ASR). Most methods use the outputs of intermediate layers of the SSL model as real-valued features for downstream tasks, but an alternative is to use discretized token sequences instead. Discrete tokens require less storage and allow techniques from natural language processing to be applied. In this paper, we propose a new protocol for using discretized token sequences in ASR, which applies de-duplication and sub-word modeling to the input sequence. The protocol reduces computational cost by shortening the input sequence. Our experiments on the LibriSpeech dataset show that the proposed protocol performs competitively with conventional ASR systems that use continuous input features, while reducing computational and storage costs.
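The two input-shortening steps named in the abstract, de-duplication and sub-word modeling, can be illustrated with a minimal Python sketch. It assumes the discrete tokens are integer cluster IDs, one per SSL frame (obtained, for example, by clustering SSL features; that step is not shown), and it uses a toy byte-pair-encoding (BPE) learner as a stand-in for whatever sub-word tool the paper actually uses. All function names are hypothetical, and the frame sequence is made-up illustration data, not data from the paper.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple

# A sub-word unit is represented as a tuple of original cluster IDs (assumption).
Token = Tuple[int, ...]


def deduplicate(units: List[int]) -> List[int]:
    """Collapse each run of consecutive identical cluster IDs into one ID."""
    return [unit for unit, _ in groupby(units)]


def apply_merge(seq: List[Token], pair: Tuple[Token, Token]) -> List[Token]:
    """Replace every non-overlapping occurrence of `pair` with its concatenation."""
    out: List[Token] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus: List[List[int]], num_merges: int) -> List[Tuple[Token, Token]]:
    """Toy BPE: repeatedly merge the most frequent adjacent pair of units."""
    seqs = [[(u,) for u in seq] for seq in corpus]
    merges: List[Tuple[Token, Token]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Ties are broken by first occurrence (Counter.most_common ordering).
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges


def encode(units: List[int], merges: List[Tuple[Token, Token]]) -> List[Token]:
    """De-duplicate a frame-level unit sequence, then apply learned merges in order."""
    seq = [(u,) for u in deduplicate(units)]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


if __name__ == "__main__":
    frames = [5, 5, 5, 12, 12, 7, 7, 7, 7, 5, 5, 12, 7, 7]  # illustrative IDs
    dedup = deduplicate(frames)
    print(dedup)  # → [5, 12, 7, 5, 12, 7]
    merges = learn_bpe([dedup], num_merges=2)
    print(encode(frames, merges))  # → [(5, 12, 7), (5, 12, 7)]
```

On this toy input, de-duplication shortens 14 frames to 6 units and two BPE merges shorten them further to 2 sub-word tokens. In a real setup the merges would presumably be learned on a training corpus rather than on a single utterance; the point of the sketch is only that a shorter input sequence is what lowers the downstream encoder's cost.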
Pages: 1399-1403
Number of pages: 5