Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks

Cited by: 0
Authors
Chang, Kai-Wei [1]
Hsu, Ming-Hao [2]
Li, Shang-Wen [3]
Lee, Hung-yi [2]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
[2] Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan
[3] Meta AI, Menlo Pk, CA USA
Source
INTERSPEECH 2024 | 2024
Keywords
In-context learning; speech language model; prompt tuning; few-shot learning; speech classification
DOI
10.21437/Interspeech.2024-1932
Abstract
Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an essential role in utilizing large language models (LLMs). When utterance-label demonstrations are presented at the input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters. This enables the LM to perform various downstream tasks in a black-box manner. Despite the success of ICL in NLP, little work has explored the possibility of ICL in speech processing. This study is the first work exploring ICL for speech classification tasks with a textless speech LM. We first show that the current speech LM lacks the ICL capability. We then perform warmup training on the speech LM, equipping it with demonstration learning capability. This paper explores and proposes the first speech LM capable of performing unseen classification tasks in an ICL manner.
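As a rough illustration of what "utterance-label demonstrations at the input" can look like for a textless speech LM, the following minimal sketch concatenates discrete speech-unit sequences with their label tokens and appends an unlabeled query. It is not the authors' implementation: the token ids `SEP_ID` and `EOS_ID`, the label ids, and the function `build_icl_prompt` are hypothetical names chosen for this example, and the prompt layout used in the paper may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical special-token ids (assumptions for this sketch only);
# a real speech LM defines its own unit and label vocabulary.
SEP_ID = 1000  # separates an utterance from its label
EOS_ID = 1001  # closes one demonstration


def build_icl_prompt(
    demos: Sequence[Tuple[List[int], int]],
    query_units: List[int],
) -> List[int]:
    """Concatenate (utterance units, label) demonstrations, then the query.

    Each utterance is assumed to be already quantized into discrete unit ids.
    The prompt ends with SEP_ID so the LM's next predicted token is the label.
    """
    prompt: List[int] = []
    for units, label_id in demos:
        prompt.extend(units)      # utterance as discrete units
        prompt.append(SEP_ID)     # utterance/label boundary
        prompt.append(label_id)   # label token for this demonstration
        prompt.append(EOS_ID)     # end of demonstration
    prompt.extend(query_units)    # unlabeled query utterance
    prompt.append(SEP_ID)         # the LM is expected to continue with a label
    return prompt


# Two toy demonstrations with hypothetical label ids 2000 and 2001.
demos = [([12, 7, 7, 93], 2000), ([5, 41, 8], 2001)]
prompt = build_icl_prompt(demos, [12, 7, 90])
print(prompt)
```

No parameters are updated here: the demonstrations only change the input sequence, which is what lets the LM be used in a black-box manner.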
Pages: 4139-4143
Page count: 5