Conformer LLM - Convolution Augmented Large Language Models

Author
Verma, Prateek [1]
Affiliation
[1] Stanford Univ, 450 Serra Mall, Stanford, CA 94305 USA
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Keywords
Conformers; Language modeling; Transformer; GPT;
DOI
10.1007/978-3-031-78014-1_24
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This work brings together two popular building blocks of neural architectures, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies across several modalities and form a core backbone of modern advances in machine learning. Convolutional architectures have been popular for extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated or adapted in a causal setup beyond speech applications for large-scale language modeling.
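The idea of pairing a causal convolution (local context) with causally masked attention (global context) can be illustrated with a minimal sketch. This is not the paper's architecture: the scalar features, the toy attention with q = k = v and no learned projections, the convolution-then-attention ordering, and the names `causal_conv1d`, `causal_self_attention`, and `conformer_like_block` are all assumptions made for illustration.

```python
import math

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: output[t] depends only on x[0..t]."""
    k = len(kernel)
    # Left-pad with zeros so the filter never reads a future sample.
    padded = [0.0] * (k - 1) + list(x)
    # kernel[-1] weights the current sample, kernel[0] the oldest one.
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(x))]

def causal_self_attention(x):
    """Toy single-head attention over scalars with a causal mask (q = k = v = x)."""
    out = []
    for t in range(len(x)):
        # Only positions s <= t are scored, which is the causal mask.
        scores = [x[t] * x[s] for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        out.append(sum(w * x[s] for s, w in enumerate(weights)) / z)
    return out

def conformer_like_block(x, kernel):
    """Local mixing (causal conv), then global mixing (causal attention), each with a residual."""
    h = [a + b for a, b in zip(x, causal_conv1d(x, kernel))]
    return [a + b for a, b in zip(h, causal_self_attention(h))]

if __name__ == "__main__":
    a = conformer_like_block([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
    b = conformer_like_block([1.0, 2.0, -5.0, 7.0], [0.5, 0.5])
    # The first two outputs match because the inputs differ only from position 2 on.
    print(a[:2] == b[:2])
```

Because both sub-layers read only current and past positions, changing a later input leaves earlier outputs untouched, which is the property needed for autoregressive language modeling.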
Pages: 326-333
Page count: 8