Multi-modality helps in crisis management: An attention-based deep learning approach of leveraging text for image classification

Cited by: 11
Authors
Ahmad, Zishan [1]
Jindal, Raghav [1]
Mukuntha, N. S. [1]
Ekbal, Asif [1]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Technol, Bihta 801106, Bihar, India
Keywords
Multi-modal classification; Deep learning; Attention; Disaster domain; Microblogs
DOI
10.1016/j.eswa.2022.116626
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Leveraging multi-modal information sources has attracted the attention of researchers and practitioners for developing resources and technologies in the broad areas of applied Artificial Intelligence (AI). During a natural disaster, people heavily use social media for communication, posting multimedia information in the form of texts and images. In such critical situations, it becomes imperative to use all modalities of information sources to better capture vital knowledge related to the crisis. In this paper, we propose an effective deep learning model that leverages multi-modal information sources in the form of both texts and images to disseminate useful information at the time of natural disasters. Our proposed model classifies tweets into seven critical and potentially actionable categories, such as reports of 'injured or dead people', 'infrastructure damage', etc. Experiments on a benchmark dataset show that fusing multi-modal information sources, viz. both text and image, is more effective than relying on a uni-modal source (i.e. either text or image) for extracting meaningful information generated during disaster situations. By using information from both modalities (text and image), we obtain a macro F1-score of 0.51, a significant improvement over baseline models that use only text or only image for classification. We supplement our results with a thorough analysis exploring the reasons for this phenomenon, thus further demonstrating the utility of exploiting multiple modalities. The primary contribution of this paper lies in developing an attentive deep learning model that uses social media text and images to classify images into crucial classes for the disaster domain. The major finding of our research is that using textual features while classifying the corresponding images improves classification performance. We also explore different methods of fusing multi-modal features and conclude that fusion through an attention mechanism works best for image classification in the disaster domain.
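The abstract describes an attentive fusion of textual and visual features for classifying disaster tweets into seven classes. As a concrete illustration, below is a minimal PyTorch sketch of such an attention-based fusion head. The encoder dimensions (768-d pooled text features, 2048-d pooled image features), the hidden size, and the soft-attention formulation over the two modality vectors are illustrative assumptions, not the architecture reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusionClassifier(nn.Module):
    """Hypothetical sketch: attention-weighted fusion of two modality vectors."""
    def __init__(self, text_dim=768, img_dim=2048, hidden=256, n_classes=7):
        super().__init__()
        # Project both modalities into a shared hidden space (assumed dims).
        self.text_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        # One scalar attention score per modality vector.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, text_feat, img_feat):
        # text_feat: (B, text_dim), e.g. pooled output of a text encoder
        # img_feat:  (B, img_dim),  e.g. pooled CNN features of the image
        modalities = torch.stack(
            [torch.tanh(self.text_proj(text_feat)),
             torch.tanh(self.img_proj(img_feat))], dim=1)   # (B, 2, hidden)
        # Softmax over the modality axis yields per-example fusion weights.
        weights = F.softmax(self.attn(modalities), dim=1)    # (B, 2, 1)
        fused = (weights * modalities).sum(dim=1)            # (B, hidden)
        return self.classifier(fused)                        # (B, 7) logits

# Toy usage with random tensors standing in for encoder outputs.
model = AttentiveFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 7])

Soft attention lets the model learn, per example, how much to weight each modality before classification, which is consistent with the abstract's conclusion that attention-based fusion outperforms simpler fusion schemes.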
Pages: 11
Related Papers (50 total)
  • [1] Attention-based Interactions Network for Breast Tumor Classification with Multi-modality Images
    Yang, Xiao
    Xi, Xiaoming
    Xu, Chuanzhen
    Sun, Liangyun
    Meng, Lingzhao
    Nie, Xiushan
    2022 15TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2022,
  • [2] Attention-Based Deep Learning Approach for Breast Cancer Histopathological Image Multi-Classification
    Aldakhil, Lama A.
    Alhasson, Haifa F.
    Alharbi, Shuaa S.
    DIAGNOSTICS, 2024, 14 (13)
  • [3] Code-free deep learning for multi-modality medical image classification
    Korot, Edward
    Guan, Zeyu
    Ferraz, Daniel
    Wagner, Siegfried K.
    Zhang, Gongyu
    Liu, Xiaoxuan
    Faes, Livia
    Pontikos, Nikolas
    Finlayson, Samuel G.
    Khalid, Hagar
    Moraes, Gabriella
    Balaskas, Konstantinos
    Denniston, Alastair K.
    Keane, Pearse A.
    NATURE MACHINE INTELLIGENCE, 2021, 3 (04) : 288 - 298
  • [4] Deep Attention-Based Imbalanced Image Classification
    Wang, Lituan
    Zhang, Lei
    Qi, Xiaofeng
    Yi, Zhang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3320 - 3330
  • [5] Gait Activity Classification Using Multi-Modality Sensor Fusion: A Deep Learning Approach
    Yunas, Syed U.
    Ozanyan, Krikor B.
    IEEE SENSORS JOURNAL, 2021, 21 (15) : 16870 - 16879
  • [6] Leveraging attention-based visual clue extraction for image classification
    Cui, Yunbo
    Du, Youtian
    Wang, Xue
    Wang, Hang
    Su, Chang
    IET IMAGE PROCESSING, 2021, 15 (12) : 2937 - 2947
  • [7] Learning based Multi-modality Image and Video Compression
    Lu, Guo
    Zhong, Tianxiong
    Geng, Jing
    Hu, Qiang
    Xu, Dong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 6073 - 6082
  • [8] Framework for Deep Learning Based Multi-Modality Image Registration of Snapshot and Pathology Images
    Schoop, Ryan A. L.
    de Roode, Lotte M.
    de Boer, Lisanne L.
    Dashtbozorg, Behdad
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (11) : 6699 - 6711
  • [9] Automated Medical Diagnosis System Based on Multi-modality Image Fusion and Deep Learning
    Algarni, Abeer D.
    WIRELESS PERSONAL COMMUNICATIONS, 2020, 111 (02) : 1033 - 1058