Multi-modality helps in crisis management: An attention-based deep learning approach of leveraging text for image classification

Cited by: 11
Authors
Ahmad, Zishan [1]
Jindal, Raghav [1]
Mukuntha, N. S. [1]
Ekbal, Asif [1]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Technol, Bihta 801106, Bihar, India
Keywords
Multi-modal classification; Deep learning; Attention; Disaster domain; Microblogs
DOI
10.1016/j.eswa.2022.116626
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Leveraging multi-modal information sources has attracted the attention of researchers and practitioners for developing resources and technologies in the broad areas of applied Artificial Intelligence (AI). During a natural disaster, people heavily use social media for communication, posting multimedia information in the form of texts and images. In such critical situations, it becomes imperative to use all modalities of information sources to better capture vital knowledge related to the crisis. In this paper, we propose an effective deep learning model that leverages multi-modal information sources in the form of both texts and images to disseminate useful information at the time of natural disasters. Our proposed model classifies tweets into seven critical and potentially actionable categories, such as reports of 'injured or dead people', 'infrastructure damage', etc. Experiments on a benchmark dataset show that fusing multi-modal information sources, viz. both text and image, is more effective than relying on a uni-modal source (i.e. either text or image) for extracting meaningful information generated during disaster situations. By using information from both modalities (text and image), we obtain a macro F1-score of 0.51, a significant improvement over baseline models that use only text or only image for classification. We supplement our results with a thorough analysis exploring the reasons for this phenomenon, thus further demonstrating the utility of exploiting multiple modalities. The primary contribution of this paper lies in developing an attentive deep learning model that uses social media text and images to classify images into crucial classes for the disaster domain. The major finding of our research is that using textual features while classifying the corresponding images improves classification performance. We also explore different methods of fusing multi-modal features and conclude that fusion through an attention mechanism works best for image classification in the disaster domain.
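The abstract describes an attentive fusion of textual and visual features for classifying disaster tweets into seven classes. As a concrete illustration, below is a minimal PyTorch sketch of such an attention-based fusion head. The encoder dimensions (768-d pooled text features, 2048-d pooled image features), the hidden size, and the soft-attention formulation over the two modality vectors are illustrative assumptions, not the architecture reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusionClassifier(nn.Module):
    """Hypothetical sketch: attention-weighted fusion of two modality vectors."""
    def __init__(self, text_dim=768, img_dim=2048, hidden=256, n_classes=7):
        super().__init__()
        # Project both modalities into a shared hidden space (assumed dims).
        self.text_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        # One scalar attention score per modality vector.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, text_feat, img_feat):
        # text_feat: (B, text_dim), e.g. pooled output of a text encoder
        # img_feat:  (B, img_dim),  e.g. pooled CNN features of the image
        modalities = torch.stack(
            [torch.tanh(self.text_proj(text_feat)),
             torch.tanh(self.img_proj(img_feat))], dim=1)   # (B, 2, hidden)
        # Softmax over the modality axis yields per-example fusion weights.
        weights = F.softmax(self.attn(modalities), dim=1)    # (B, 2, 1)
        fused = (weights * modalities).sum(dim=1)            # (B, hidden)
        return self.classifier(fused)                        # (B, 7) logits

# Toy usage with random tensors standing in for encoder outputs.
model = AttentiveFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 7])

Soft attention lets the model learn, per example, how much to weight each modality before classification, which is consistent with the abstract's conclusion that attention-based fusion outperforms simpler fusion schemes.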
Pages: 11
Related Papers (50 total)
  • [1] Attention-based Interactions Network for Breast Tumor Classification with Multi-modality Images
    Yang, Xiao
    Xi, Xiaoming
    Xu, Chuanzhen
    Sun, Liangyun
    Meng, Lingzhao
    Nie, Xiushan
    2022 15TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2022,
  • [2] Attention-Based Deep Learning Approach for Breast Cancer Histopathological Image Multi-Classification
    Aldakhil, Lama A.
    Alhasson, Haifa F.
    Alharbi, Shuaa S.
    DIAGNOSTICS, 2024, 14 (13)
  • [3] Code-free deep learning for multi-modality medical image classification
    Korot, Edward
    Guan, Zeyu
    Ferraz, Daniel
    Wagner, Siegfried K.
    Zhang, Gongyu
    Liu, Xiaoxuan
    Faes, Livia
    Pontikos, Nikolas
    Finlayson, Samuel G.
    Khalid, Hagar
    Moraes, Gabriella
    Balaskas, Konstantinos
    Denniston, Alastair K.
    Keane, Pearse A.
    NATURE MACHINE INTELLIGENCE, 2021, 3 (04) : 288 - 298
  • [4] Deep Attention-Based Imbalanced Image Classification
    Wang, Lituan
    Zhang, Lei
    Qi, Xiaofeng
    Yi, Zhang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3320 - 3330
  • [5] Gait Activity Classification Using Multi-Modality Sensor Fusion: A Deep Learning Approach
    Yunas, Syed U.
    Ozanyan, Krikor B.
    IEEE SENSORS JOURNAL, 2021, 21 (15) : 16870 - 16879
  • [6] Leveraging attention-based visual clue extraction for image classification
    Cui, Yunbo
    Du, Youtian
    Wang, Xue
    Wang, Hang
    Su, Chang
    IET IMAGE PROCESSING, 2021, 15 (12) : 2937 - 2947
  • [7] Learning based Multi-modality Image and Video Compression
    Lu, Guo
    Zhong, Tianxiong
    Geng, Jing
    Hu, Qiang
    Xu, Dong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 6073 - 6082
  • [8] Framework for Deep Learning Based Multi-Modality Image Registration of Snapshot and Pathology Images
    Schoop, Ryan A. L.
    de Roode, Lotte M.
    de Boer, Lisanne L.
    Dashtbozorg, Behdad
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (11) : 6699 - 6711
  • [9] Automated Medical Diagnosis System Based on Multi-modality Image Fusion and Deep Learning
    Algarni, Abeer D.
    WIRELESS PERSONAL COMMUNICATIONS, 2020, 111 (02) : 1033 - 1058