Conformer LLM - Convolution Augmented Large Language Models

Cited by: 0
Authors
Verma, Prateek [1]
Affiliation
[1] Stanford Univ, 450 Serra Mall, Stanford, CA 94305 USA
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Keywords
Conformers; Language modeling; Transformer; GPT
DOI
10.1007/978-3-031-78014-1_24
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This work brings together two popular neural architecture building blocks, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal Conformers are used ubiquitously in automatic speech recognition; this work adapts these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies across several modalities and form a core backbone of modern advances in machine learning. Convolutional architectures have been popular for feature extraction in domains such as raw 1-D signals, speech, and images. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated and adapted, in a causal setup, beyond speech applications for large-scale language modeling.
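To make the architecture described above concrete, here is a minimal sketch of one plausible way to assemble a causal Conformer-style block in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: causality is enforced by left-padding a depthwise convolution and by masking self-attention, and all names and hyperparameters (CausalConvModule, CausalConformerBlock, d_model, kernel_size) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvModule(nn.Module):
    """Depthwise 1-D convolution that only looks at past positions."""
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1          # left padding preserves causality
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                   # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)    # -> (batch, d_model, time)
        y = F.pad(y, (self.pad, 0))         # pad on the left only
        y = self.conv(y).transpose(1, 2)    # back to (batch, time, d_model)
        return x + y                        # residual connection

class CausalConformerBlock(nn.Module):
    """Causal self-attention (global) + causal convolution (local) + FFN."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = CausalConvModule(d_model)
        self.ff_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):                   # x: (batch, time, d_model)
        t = x.size(1)
        # Boolean upper-triangular mask: True entries are not attended to.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=x.device), 1)
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a                           # global dependencies
        x = self.conv(x)                    # local dependencies
        return x + self.ff(self.ff_norm(x))

# Example: one block over a batch of 2 sequences of length 16.
block = CausalConformerBlock()
out = block(torch.randn(2, 16, 256))        # -> shape (2, 16, 256)

Stacking blocks like this in place of plain Transformer decoder layers is the kind of convolution-augmented causal language model the abstract describes: attention supplies the global dependencies, while the left-padded convolution mixes local context without leaking future tokens.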
Pages: 326-333
Page count: 8
Related papers
50 records in total
  • [1] Antikatzidis, Angelos; Feidakis, Michalis; Marathaki, Konstantina; Toumanidis, Lazaros; Nikolaou, Grigoris; Patrikakis, Charalampos Z. Enrich Humanoids With Large Language Models (LLM). 2024 IEEE Global Engineering Education Conference (EDUCON 2024), 2024.
  • [2] Gulati, Anmol; Qin, James; Chiu, Chung-Cheng; Parmar, Niki; Zhang, Yu; Yu, Jiahui; Han, Wei; Wang, Shibo; Zhang, Zhengdong; Wu, Yonghui; Pang, Ruoming. Conformer: Convolution-augmented Transformer for Speech Recognition. Interspeech 2020, 2020: 5036-5040.
  • [3] Arachchige, Arosh S. Perera Molligoda S. Large language models (LLM) and ChatGPT: a medical student perspective. European Journal of Nuclear Medicine and Molecular Imaging, 2023, 50(8): 2248-2249.
  • [4] Ma, Xinyin; Fang, Gongfan; Wang, Xinchao. LLM-Pruner: On the Structural Pruning of Large Language Models. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [5] Alberts, Ian L.; Mercolli, Lorenzo; Pyka, Thomas; Prenosil, George; Shi, Kuangyu; Rominger, Axel; Afshar-Oromieh, Ali. Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be? European Journal of Nuclear Medicine and Molecular Imaging, 2023, 50(6): 1549-1552.
  • [6] Liu, Ruyang; Li, Chen; Tang, Haoran; Ge, Yixiao; Shan, Ying; Li, Ge. ST-LLM: Large Language Models Are Effective Temporal Learners. Computer Vision - ECCV 2024, Pt LVII, 2025, 15115: 1-18.
  • [7] Li, Qinbin; Hong, Junyuan; Xie, Chulin; Tan, Jeffrey; Xin, Rachel; Hou, Junyi; Yin, Xavier; Wang, Zhun; Hendrycks, Dan; Wang, Zhangyang; Li, Bo; He, Bingsheng; Song, Dawn. LLM-PBE: Assessing Data Privacy in Large Language Models. Proceedings of the VLDB Endowment, 2024, 17(11): 3201-3214.
  • [8] Maurya, Avinash; Underwood, Robert; Rafique, M. Mustafa; Cappello, Franck; Nicolae, Bogdan. DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2024), 2024.