Multi-modal sarcasm detection using ensemble net model

Cited by: 0

Authors:

Sukhavasi, Vidyullatha [1,2]

Dondeti, Venkatesulu [3]

Affiliations:

[1] Vignans Fdn Sci Technol & Res, Dept CSE, Guntur 522213, Andhra Pradesh, India

[2] BVRIT HYDERABAD Coll Engn Women, Dept CSE, Hyderabad 500090, Telangana, India

[3] Vignans Fdn Sci Technol & Res, Dept Adv CSE, Guntur 522213, Andhra Pradesh, India

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2025 / Vol. 67 / Issue 01

Keywords:

Sarcasm detection; Hybrid EnsembleNet; Weighted fusion modality; Softmax layer; Natural language processing; Deep learning approach;

DOI:

10.1007/s10115-024-02227-y

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sarcasm is generally expressed through a mix of verbal and non-verbal cues. Most existing work on sarcasm detection has used either text or video alone. With the rapid growth of social media and internet technology, people increasingly express their emotions and feelings using text. A multi-modal sarcasm detection task is therefore crucial to understanding people's real feelings and beliefs, yet detecting sarcasm from multi-modal features remains a challenge. This work presents a new hybrid ensemble deep learning approach for multi-modal sarcasm detection. The major goal of the research is to determine the different classes of sarcasm using a multi-modal dataset. Image-modality sarcasm detection is performed using a Deep Residual Net, from which the visual features are extracted. For the text modality, the text data are pre-processed with punctuation removal, and the textual features are extracted using Term Frequency-Inverse Average Document Frequency. The extracted textual features are used as input to a bidirectional long short-term memory (BiLSTM) model. The audio (acoustic) features are extracted to form the acoustic modality, which is subsequently passed to a Visual Geometry Group (VGG) network. A weighted fusion modality process then combines all of the extracted features, and a softmax layer acts as the classification layer for multi-modal sarcasm detection. The Tent chaotic snack optimization algorithm is employed to tune the hyperparameters and reduce the complexity of the proposed Hybrid EnsembleNet. The performance of the proposed classifier is evaluated in Python. The proposed Hybrid EnsembleNet is trained using two datasets: Memotion 7k and MUStARD.
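The fusion and classification steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the fusion formula, the feature dimensions, or the weight values, so the function names (`weighted_fusion`, `linear`, `softmax`), the three-element feature vectors, the modality weights, and the classifier parameters below are all assumptions chosen only for illustration. The sketch assumes each modality branch has already produced a feature vector of the same length, fuses the vectors as a normalized weighted sum, and passes the result through a linear layer followed by softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, weights):
    """Fuse equal-length per-modality feature vectors as a weighted sum.

    Assumption: weights are normalized to sum to 1 before fusing.
    """
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality feature vectors must share one length")
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * f[i] for w, f in zip(norm, features)) for i in range(dim)]


def linear(x, class_weights, bias):
    """One logit per class: dot(x, class_weights[k]) + bias[k]."""
    return [
        sum(xi * wi for xi, wi in zip(x, row)) + b
        for row, b in zip(class_weights, bias)
    ]


# Hypothetical feature vectors standing in for the outputs of the
# visual, textual, and acoustic branches (values are made up).
visual = [0.2, 0.8, 0.1]
textual = [0.6, 0.4, 0.3]
acoustic = [1.0, 0.0, 0.5]

# Hypothetical modality weights; in the paper these would be tuned.
fused = weighted_fusion([visual, textual, acoustic], [0.5, 0.3, 0.2])

# Hypothetical two-class classifier (e.g. not sarcastic vs. sarcastic).
class_weights = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
bias = [0.0, 0.0]

probs = softmax(linear(fused, class_weights, bias))
predicted = max(range(len(probs)), key=probs.__getitem__)
print(fused, probs, predicted)
```

A weighted sum requires the modality vectors to share a dimension; a real system would more likely project each branch's output to a common size first, or concatenate weighted vectors instead.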

Pages: 403-425

Page count: 23
