Conformer LLM - Convolution Augmented Large Language Models

Cited by: 0
Authors
Verma, Prateek [1]
Affiliations
[1] Stanford Univ, 450 Serra Mall, Stanford, CA 94305 USA
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Keywords
Conformers; Language modeling; Transformer; GPT;
DOI
10.1007/978-3-031-78014-1_24
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
This work brings together two popular building blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning. Convolutional architectures have been popular for extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated/adapted in a causal setup beyond speech applications for large-scale language modeling.
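The key adaptation the abstract describes is making the conformer's convolution causal, so that each output frame depends only on current and past inputs, as autoregressive language modeling requires. A minimal NumPy sketch of that idea is below: causality is obtained by left-padding the sequence with `kernel_size - 1` zeros before convolving. The function name `causal_conv1d` is illustrative, not the paper's implementation.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: output[t] = sum_i kernel[i] * x[t - i],
    with x[t - i] treated as 0 for t - i < 0. Left-padding with
    kernel_size - 1 zeros guarantees output[t] never sees x[t+1:]."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    # Each window padded[t : t + k] is [x[t-k+1], ..., x[t]];
    # reversing the kernel aligns kernel[0] with the current frame x[t].
    return np.array([np.dot(padded[t:t + k], kernel[::-1])
                     for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
# A pure delay kernel: output[t] = x[t-1]
print(causal_conv1d(x, np.array([0.0, 1.0])))  # → [0. 1. 2. 3.]
```

In a conformer block this convolution sits between self-attention sublayers, supplying the local context that attention handles less efficiently; a causal attention mask plus the padding above keeps the whole block autoregressive.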
Pages: 326-333 (8 pages)
Related Papers
50 total
  • [21] TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
    Liu, Junyi
    Li, Liangzhi
    Xiang, Tong
    Wang, Bowen
    Qian, Yiming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9796 - 9810
  • [22] Large language models (LLM) in computational social science: prospects, current state, and challenges
    Thapa, Surendrabikram
    Shiwakoti, Shuvam
    Shah, Siddhant Bikram
    Adhikari, Surabhi
    Veeramani, Hariram
    Nasim, Mehwish
    Naseem, Usman
    SOCIAL NETWORK ANALYSIS AND MINING, 2025, 15 (01)
  • [23] REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
    Zhang, Ruisi
    Hussain, Shehzeen Samarah
    Neekhara, Paarth
    Koushanfar, Farinaz
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024, : 1813 - 1830
  • [24] D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
    Liao, Zihan
    Yu, Hang
    Li, Jianguo
    Wang, Jun
    Zhang, Wei
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 14798 - 14814
  • [25] LLM-controller: Dynamic robot control adaptation using large language models
    Zahedifar, Rasoul
    Baghshah, Mahdieh Soleymani
    Taheri, Alireza
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2025, 186
  • [26] Query Rewriting for Retrieval-Augmented Large Language Models
    Ma, Xinbei
    Gong, Yeyun
    He, Pengcheng
    Zhao, Hai
    Duan, Nan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5303 - 5315
  • [27] Benchmarking Large Language Models in Retrieval-Augmented Generation
    Chen, Jiawei
    Lin, Hongyu
    Han, Xianpei
    Sun, Le
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17754 - 17762
  • [28] Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data
    Tang, Dongting Chenchong
    Liu, Han
    ADVANCES IN NEURAL NETWORKS-ISNN 2024, 2024, 14827 : 133 - 142
  • [29] Data augmented large language models for medical record generation
    Zhang, Xuanyi
    Zhao, Genghong
    Ren, Yi
    Wang, Weiguang
    Cai, Wei
    Zhao, Yan
    Zhang, Xia
    Liu, Jiren
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [30] LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
    Kahng, Minsuk
    Tenney, Ian
    Pushkarna, Mahima
    Liu, Michael Xieyang
    Wexler, James
    Reif, Emily
    Kallarackal, Krystal
    Chang, Minsuk
    Terry, Michael
    Dixon, Lucas
    EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,