Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers

Cited by: 0
Authors
Duseja, Tejas [1 ]
Annervaz, K. M. [1 ]
Duggani, Jeevithiesh [1 ]
Zacharia, Shyam [2 ]
Free, Michael [3 ]
Dukkipati, Ambedkar [1 ]
Affiliations
[1] Indian Institute of Science, Bengaluru, India
[2] British Telecom, Bengaluru, India
[3] British Telecom, London, England
Source
2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR 2024), 2024
Keywords
Multi-modal emotion recognition; sentiment analysis; pre-trained models
DOI
10.1109/MIPR62202.2024.00070
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer models, which revolutionized foundation models, are now ubiquitous. Consequently, there has been a surge in pre-trained transformers that can be fine-tuned for different downstream tasks. Most pre-trained transformers are trained on a single modality only, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), through which other modalities can be integrated into large pre-trained language transformers. The proposed SSIM gate obtains a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we establish benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks including sentiment and emotion analysis (CMU-MOSI, CMU-MOSEI), emotion recognition in conversations (IEMOCAP, MELD), and multimodal intent recognition (MIntRec), achieving results close to the state of the art.
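The abstract describes the SSIM gate only at a high level, so the sketch below is merely an illustration of what "soft-switching between multi-modal interactions" around a pre-trained text encoder can look like. It is not the authors' implementation: the module name SSIMGate, the parameters d_text, d_audio, d_visual, and the exact sigmoid-gated fusion are assumptions made for this example.

# Minimal sketch of a soft modality gate (assumed PyTorch setup).
# NOT the paper's implementation; names and the gating form are hypothetical.
import torch
import torch.nn as nn

class SSIMGate(nn.Module):
    def __init__(self, d_text: int, d_audio: int, d_visual: int):
        super().__init__()
        # Project non-text modalities into the text encoder's hidden size.
        self.audio_proj = nn.Linear(d_audio, d_text)
        self.visual_proj = nn.Linear(d_visual, d_text)
        # Gates decide, per token and per dimension, how much of each
        # modality to "switch on" (sigmoid near 1) or "switch off" (near 0).
        self.audio_gate = nn.Linear(2 * d_text, d_text)
        self.visual_gate = nn.Linear(2 * d_text, d_text)

    def forward(self, h_text, h_audio, h_visual):
        # h_text:   (batch, seq, d_text)   hidden states from e.g. BERT/XLNet/T5
        # h_audio:  (batch, seq, d_audio)  aligned audio features
        # h_visual: (batch, seq, d_visual) aligned visual features
        a = self.audio_proj(h_audio)
        v = self.visual_proj(h_visual)
        g_a = torch.sigmoid(self.audio_gate(torch.cat([h_text, a], dim=-1)))
        g_v = torch.sigmoid(self.visual_gate(torch.cat([h_text, v], dim=-1)))
        # Unified representation: text plus softly gated audio/visual signals.
        return h_text + g_a * a + g_v * v

A gate of this kind can, in principle, be attached on top of (or between layers of) a pre-trained language transformer, letting the sigmoid gates admit or suppress the extra modalities per token while the original text encoder remains usable as-is.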
Pages: 403-409
Page count: 7