Conformer LLM - Convolution Augmented Large Language Models

Cited by: 0
Authors
Verma, Prateek [1]
Affiliation
[1] Stanford Univ, 450 Serra Mall, Stanford, CA 94305 USA
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Keywords
Conformers; Language modeling; Transformer; GPT
DOI
10.1007/978-3-031-78014-1_24
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This work brings together two popular neural architecture building blocks, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal Conformers are used ubiquitously in automatic speech recognition; this work adapts these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies across several modalities and form a core backbone of modern advances in machine learning. Convolutional architectures have been popular for feature extraction in domains such as raw 1-D signals, speech, and images. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated and adapted, in a causal setup, beyond speech applications for large-scale language modeling.
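To make the architecture described above concrete, here is a minimal sketch of one plausible way to assemble a causal Conformer-style block in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: causality is enforced by left-padding a depthwise convolution and by masking self-attention, and all names and hyperparameters (CausalConvModule, CausalConformerBlock, d_model, kernel_size) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvModule(nn.Module):
    """Depthwise 1-D convolution that only looks at past positions."""
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1          # left padding preserves causality
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                   # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)    # -> (batch, d_model, time)
        y = F.pad(y, (self.pad, 0))         # pad on the left only
        y = self.conv(y).transpose(1, 2)    # back to (batch, time, d_model)
        return x + y                        # residual connection

class CausalConformerBlock(nn.Module):
    """Causal self-attention (global) + causal convolution (local) + FFN."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = CausalConvModule(d_model)
        self.ff_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):                   # x: (batch, time, d_model)
        t = x.size(1)
        # Boolean upper-triangular mask: True entries are not attended to.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=x.device), 1)
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a                           # global dependencies
        x = self.conv(x)                    # local dependencies
        return x + self.ff(self.ff_norm(x))

# Example: one block over a batch of 2 sequences of length 16.
block = CausalConformerBlock()
out = block(torch.randn(2, 16, 256))        # -> shape (2, 16, 256)

Stacking blocks like this in place of plain Transformer decoder layers is the kind of convolution-augmented causal language model the abstract describes: attention supplies the global dependencies, while the left-padded convolution mixes local context without leaking future tokens.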
Pages: 326-333
Page count: 8
Related papers
50 records in total
  • [1] Antikatzidis, Angelos; Feidakis, Michalis; Marathaki, Konstantina; Toumanidis, Lazaros; Nikolaou, Grigoris; Patrikakis, Charalampos Z. Enrich Humanoids With Large Language Models (LLM). 2024 IEEE Global Engineering Education Conference (EDUCON 2024), 2024.
  • [2] Gulati, Anmol; Qin, James; Chiu, Chung-Cheng; Parmar, Niki; Zhang, Yu; Yu, Jiahui; Han, Wei; Wang, Shibo; Zhang, Zhengdong; Wu, Yonghui; Pang, Ruoming. Conformer: Convolution-augmented Transformer for Speech Recognition. Interspeech 2020, 2020: 5036-5040.
  • [3] Arachchige, Arosh S. Perera Molligoda S. Large language models (LLM) and ChatGPT: a medical student perspective. European Journal of Nuclear Medicine and Molecular Imaging, 2023, 50(8): 2248-2249.
  • [4] Ma, Xinyin; Fang, Gongfan; Wang, Xinchao. LLM-Pruner: On the Structural Pruning of Large Language Models. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [5] Alberts, Ian L.; Mercolli, Lorenzo; Pyka, Thomas; Prenosil, George; Shi, Kuangyu; Rominger, Axel; Afshar-Oromieh, Ali. Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be? European Journal of Nuclear Medicine and Molecular Imaging, 2023, 50(6): 1549-1552.
  • [6] Liu, Ruyang; Li, Chen; Tang, Haoran; Ge, Yixiao; Shan, Ying; Li, Ge. ST-LLM: Large Language Models Are Effective Temporal Learners. Computer Vision - ECCV 2024, Pt LVII, 2025, 15115: 1-18.
  • [7] Li, Qinbin; Hong, Junyuan; Xie, Chulin; Tan, Jeffrey; Xin, Rachel; Hou, Junyi; Yin, Xavier; Wang, Zhun; Hendrycks, Dan; Wang, Zhangyang; Li, Bo; He, Bingsheng; Song, Dawn. LLM-PBE: Assessing Data Privacy in Large Language Models. Proceedings of the VLDB Endowment, 2024, 17(11): 3201-3214.
  • [8] Maurya, Avinash; Underwood, Robert; Rafique, M. Mustafa; Cappello, Franck; Nicolae, Bogdan. DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2024), 2024.