Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers

Cited by: 0
Authors
Duseja, Tejas [1 ]
Annervaz, K. M. [1 ]
Duggani, Jeevithiesh [1 ]
Zacharia, Shyam [2 ]
Free, Michael [3 ]
Dukkipati, Ambedkar [1 ]
Affiliations
[1] Indian Institute of Science, Bengaluru, India
[2] British Telecom, Bengaluru, India
[3] British Telecom, London, England
Source
2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR 2024), 2024
Keywords
Multi-modal emotion recognition; sentiment analysis; pre-trained models
DOI
10.1109/MIPR62202.2024.00070
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer models, which revolutionized foundation models, are now ubiquitous. Consequently, there has been a surge in pre-trained transformers that can be fine-tuned for different downstream tasks. Most pre-trained transformers are trained on a single modality only, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), through which other modalities can be integrated into large pre-trained language transformers. The proposed SSIM gate obtains a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we establish benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks including sentiment and emotion analysis (CMU-MOSI, CMU-MOSEI), emotion recognition in conversations (IEMOCAP, MELD), and multimodal intent recognition (MIntRec), achieving results close to the state of the art.
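The abstract describes the SSIM gate only at a high level, so the sketch below is merely an illustration of what "soft-switching between multi-modal interactions" around a pre-trained text encoder can look like. It is not the authors' implementation: the module name SSIMGate, the parameters d_text, d_audio, d_visual, and the exact sigmoid-gated fusion are assumptions made for this example.

# Minimal sketch of a soft modality gate (assumed PyTorch setup).
# NOT the paper's implementation; names and the gating form are hypothetical.
import torch
import torch.nn as nn

class SSIMGate(nn.Module):
    def __init__(self, d_text: int, d_audio: int, d_visual: int):
        super().__init__()
        # Project non-text modalities into the text encoder's hidden size.
        self.audio_proj = nn.Linear(d_audio, d_text)
        self.visual_proj = nn.Linear(d_visual, d_text)
        # Gates decide, per token and per dimension, how much of each
        # modality to "switch on" (sigmoid near 1) or "switch off" (near 0).
        self.audio_gate = nn.Linear(2 * d_text, d_text)
        self.visual_gate = nn.Linear(2 * d_text, d_text)

    def forward(self, h_text, h_audio, h_visual):
        # h_text:   (batch, seq, d_text)   hidden states from e.g. BERT/XLNet/T5
        # h_audio:  (batch, seq, d_audio)  aligned audio features
        # h_visual: (batch, seq, d_visual) aligned visual features
        a = self.audio_proj(h_audio)
        v = self.visual_proj(h_visual)
        g_a = torch.sigmoid(self.audio_gate(torch.cat([h_text, a], dim=-1)))
        g_v = torch.sigmoid(self.visual_gate(torch.cat([h_text, v], dim=-1)))
        # Unified representation: text plus softly gated audio/visual signals.
        return h_text + g_a * a + g_v * v

A gate of this kind can, in principle, be attached on top of (or between layers of) a pre-trained language transformer, letting the sigmoid gates admit or suppress the extra modalities per token while the original text encoder remains usable as-is.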
Pages: 403-409
Page count: 7