Multimodal Categorization of Crisis Events in Social Media

Cited by: 56
Authors
Abavisani, Mahdi [1 ]
Wu, Liwei [1 ,2 ]
Hu, Shengli [1 ]
Tetreault, Joel [1 ]
Jaimes, Alejandro [1 ]
Affiliations
[1] Dataminr Inc, New York, NY 10016 USA
[2] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01469
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample-by-sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training, both to better regularize the learning process and to deal with limited training data by constructing new matched pairs from different samples. We show that our method outperforms unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
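The per-sample gating idea described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's actual architecture: the projection matrices, sigmoid gate form, and embedding dimension are all placeholders, with random weights standing in for learned parameters. Each modality produces a gate over the other modality's embedding, so uninformative components of a weak modality can be suppressed before fusion.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (illustrative choice)

# Hypothetical "learned" projections; random placeholders for illustration.
W_img = rng.standard_normal((D, D)) * 0.1
W_txt = rng.standard_normal((D, D)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention_fuse(img_emb, txt_emb):
    """Each modality gates the other's embedding dimensions, computed
    per sample, before the gated embeddings are summed into a joint
    representation."""
    gate_on_img = sigmoid(txt_emb @ W_txt)  # text decides which image components to keep
    gate_on_txt = sigmoid(img_emb @ W_img)  # image decides which text components to keep
    return gate_on_img * img_emb + gate_on_txt * txt_emb

img_emb = rng.standard_normal(D)
txt_emb = rng.standard_normal(D)
fused = cross_attention_fuse(img_emb, txt_emb)
```

Because the gates are recomputed from each input pair, a misleading modality in one sample (e.g. an off-topic image attached to an informative tweet) can be down-weighted without affecting other samples.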
Pages: 14667-14677
Page count: 11
References
63 entries in total
[1]  
Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
[2]   Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition with Multimodal Training [J].
Abavisani, Mahdi ;
Joze, Hamid Reza Vaezi ;
Patel, Vishal M. .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :1165-1174
[3]   Deep Multimodal Subspace Clustering Networks [J].
Abavisani, Mahdi ;
Patel, Vishal M. .
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2018, 12 (06) :1601-1614
[4]  
Abebe R., 2019, Proc Thirteen Int AAAI Conf Web Social Media, V13, P3, DOI 10.48550/arxiv.1806.0574
[5]   JORD - A System for Collecting Information and Monitoring Natural Disasters by Linking Social Media with Satellite Imagery [J].
Ahmad, Kashif ;
Riegler, Michael ;
Pogorelov, Konstantin ;
Conci, Nicola ;
Halvorsen, Pal ;
De Natale, Francesco .
PROCEEDINGS OF THE 15TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2017,
[6]  
[Anonymous], 2020, INT C LEARN RE UNPUB
[7]   VQA: Visual Question Answering [J].
Antol, Stanislaw ;
Agrawal, Aishwarya ;
Lu, Jiasen ;
Mitchell, Margaret ;
Batra, Dhruv ;
Zitnick, C. Lawrence ;
Parikh, Devi .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2425-2433
[8]   Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures [J].
Bernardi, Raffaella ;
Cakici, Ruket ;
Elliott, Desmond ;
Erdem, Aykut ;
Erdem, Erkut ;
Ikizler-Cinbis, Nazli ;
Keller, Frank ;
Muscat, Adrian ;
Plank, Barbara .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 55 :409-442
[9]  
Blevins Terra, 2016, P COLING 2016 26 INT, P2196
[10]  
Buechel Sven, 2018, ARXIV