Conformer LLM - Convolution Augmented Large Language Models

Cited by: 0
Authors
Verma, Prateek [1]
Affiliations
[1] Stanford Univ, 450 Serra Mall, Stanford, CA 94305 USA
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Keywords
Conformers; Language modeling; Transformer; GPT;
DOI
10.1007/978-3-031-78014-1_24
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
This work brings together two popular building blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning. Convolutional architectures have been popular for extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated/adapted in a causal setup beyond speech applications for large-scale language modeling.
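The key adaptation the abstract describes is making the conformer's convolution causal, so that each output frame depends only on current and past inputs, as autoregressive language modeling requires. A minimal NumPy sketch of that idea is below: causality is obtained by left-padding the sequence with `kernel_size - 1` zeros before convolving. The function name `causal_conv1d` is illustrative, not the paper's implementation.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: output[t] = sum_i kernel[i] * x[t - i],
    with x[t - i] treated as 0 for t - i < 0. Left-padding with
    kernel_size - 1 zeros guarantees output[t] never sees x[t+1:]."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    # Each window padded[t : t + k] is [x[t-k+1], ..., x[t]];
    # reversing the kernel aligns kernel[0] with the current frame x[t].
    return np.array([np.dot(padded[t:t + k], kernel[::-1])
                     for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
# A pure delay kernel: output[t] = x[t-1]
print(causal_conv1d(x, np.array([0.0, 1.0])))  # → [0. 1. 2. 3.]
```

In a conformer block this convolution sits between self-attention sublayers, supplying the local context that attention handles less efficiently; a causal attention mask plus the padding above keeps the whole block autoregressive.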
Pages: 326-333 (8 pages)
Related Papers
50 total
  • [21] TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
    Liu, Junyi
    Li, Liangzhi
    Xiang, Tong
    Wang, Bowen
    Qian, Yiming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9796 - 9810
  • [22] Large language models (LLM) in computational social science: prospects, current state, and challenges
    Thapa, Surendrabikram
    Shiwakoti, Shuvam
    Shah, Siddhant Bikram
    Adhikari, Surabhi
    Veeramani, Hariram
    Nasim, Mehwish
    Naseem, Usman
    SOCIAL NETWORK ANALYSIS AND MINING, 2025, 15 (01)
  • [23] REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
    Zhang, Ruisi
    Hussain, Shehzeen Samarah
    Neekhara, Paarth
    Koushanfar, Farinaz
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024, : 1813 - 1830
  • [24] D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
    Liao, Zihan
    Yu, Hang
    Li, Jianguo
    Wang, Jun
    Zhang, Wei
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 14798 - 14814
  • [25] LLM-controller: Dynamic robot control adaptation using large language models
    Zahedifar, Rasoul
    Baghshah, Mahdieh Soleymani
    Taheri, Alireza
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2025, 186
  • [26] Query Rewriting for Retrieval-Augmented Large Language Models
    Ma, Xinbei
    Gong, Yeyun
    He, Pengcheng
    Zhao, Hai
    Duan, Nan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5303 - 5315
  • [27] Benchmarking Large Language Models in Retrieval-Augmented Generation
    Chen, Jiawei
    Lin, Hongyu
    Han, Xianpei
    Sun, Le
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17754 - 17762
  • [28] Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data
    Tang, Dongting Chenchong
    Liu, Han
    ADVANCES IN NEURAL NETWORKS-ISNN 2024, 2024, 14827 : 133 - 142
  • [29] Data augmented large language models for medical record generation
    Zhang, Xuanyi
    Zhao, Genghong
    Ren, Yi
    Wang, Weiguang
    Cai, Wei
    Zhao, Yan
    Zhang, Xia
    Liu, Jiren
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [30] LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
    Kahng, Minsuk
    Tenney, Ian
    Pushkarna, Mahima
    Liu, Michael Xieyang
    Wexler, James
    Reif, Emily
    Kallarackal, Krystal
    Chang, Minsuk
    Terry, Michael
    Dixon, Lucas
    EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,