Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Cited by: 24
Authors
Hu, Di [1]
Li, Xuhong [1,2]
Mou, Lichao [2,3]
Jin, Pu
Chen, Dong [4]
Jing, Liping
Zhu, Xiaoxiang [2,3]
Dou, Dejing [1]
Affiliations
[1] Baidu Res, Big Data Lab, Beijing, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] German Aerosp Ctr, Cologne, Germany
[4] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XXIV | 2020 / Vol. 12369
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Cross-task transfer; Aerial scene classification; Geotagged sound; Multimodal learning; Remote sensing; CLASSIFICATION;
DOI
10.1007/978-3-030-58586-0_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images with powerful models and efficient algorithms yields considerable performance on scene recognition, it still suffers from the variation of ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognition science, in this paper, for improving the performance on the aerial scene recognition, we explore a novel audiovisual aerial scene recognition task using both images and sounds as input. Based on an observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit the knowledge from the sound events to improve the performance on the aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring the sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting the audio information for the aerial scene recognition. The source code is publicly available for reproducibility purposes (https://github.com/DTaoo/Multimodal-Aerial-Scene-Recognition).
Pages: 68-84
Page count: 17