Multimodal Deep Learning-based Feature Fusion for Object Detection in Remote Sensing Images

Cited by: 0
Authors
Yin, Shoulin [1]
Wang, Qunming [2]
Wang, Liguo [3]
Ivanovic, Mirjana [4]
Li, Hang [5]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Tongji Univ, Coll Surveying & Geoinformat, Shanghai, Peoples R China
[3] Dalian Minzu Univ, Coll Informat & Commun Engn, Dalian 116600, Peoples R China
[4] Univ Novi Sad, Fac Sci, Novi Sad, Serbia
[5] Shenyang Normal Univ, Software Coll, Shenyang 110034, Peoples R China
Keywords
Object detection; remote sensing image; multimodal deep learning; feature fusion; tracking; video
DOI
10.2298/CSIS241110011Y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Object detection is an important computer vision task that grew out of image classification. Rather than assigning a single label to an image, it classifies and localizes all of the objects that may be present in the image at the same time. Classification means assigning a category label to each object, and localization means determining the vertex coordinates of the object's bounding rectangle. Object detection is therefore more challenging than classification and has broader application prospects, such as autonomous driving, face recognition, pedestrian detection, and medical detection. It also serves as the basis for more complex computer vision tasks such as image segmentation, image captioning, object tracking, and action recognition. Traditional object detection makes poor use of the available features and is easily affected by environmental factors. Hence, this paper proposes a multimodal deep learning-based feature fusion method for object detection in remote sensing images. In the new model, Cascade R-CNN is the backbone network, and a parallel Cascade R-CNN network is used for feature fusion to strengthen feature representation. To address the problem of varying segmentation shapes and sizes, the central part of the network adopts multi-coefficient cascaded dilated (atrous) convolution, which obtains features at multiple receptive fields without pooling and thus preserves image information. Meanwhile, an improved strategy that combines self-attention with receptive fields is used to obtain both low-level features carrying edge details and high-level features carrying global semantics. Finally, we conduct ablation and comparison experiments on the DOTA dataset. The experimental results show that mean Average Precision (mAP) and other metrics improve substantially and that the model outperforms state-of-the-art detection algorithms. The method therefore has good application prospects for object detection in remote sensing images.
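The abstract's point that cascaded dilated convolution yields several receptive fields without pooling can be illustrated with a short calculation. The sketch below is not taken from the paper: the 3x3 kernel size, the dilation rates 1, 2, 4, 8, and the helper names `receptive_field` and `same_padding` are all assumptions chosen for illustration. It assumes stride-1 convolutions stacked in series, where each layer widens the receptive field by (kernel_size - 1) * dilation.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, along one axis) of stride-1 dilated
    convolutions applied one after another with the given dilation rates."""
    rf = 1
    for d in dilations:
        # Each stride-1 layer adds (k - 1) * d pixels to the receptive field.
        rf += (kernel_size - 1) * d
    return rf


def same_padding(kernel_size, dilation):
    """Padding per side that keeps the spatial size unchanged for an
    odd kernel at stride 1, so no downsampling or pooling is needed."""
    return dilation * (kernel_size - 1) // 2


# Hypothetical dilation rates; the paper's actual coefficients are not
# given in the abstract.
rates = [1, 2, 4, 8]

# The output after each stage of the cascade sees a different receptive
# field, which is how one branch can supply multi-receptive-field features.
for depth in range(1, len(rates) + 1):
    stage = rates[:depth]
    print(stage, "->", receptive_field(3, stage), "pixels")
```

Under these assumptions the receptive field grows from 3 to 31 pixels across the four stages while the feature map keeps its resolution, since each layer can be padded by `same_padding(3, d)` instead of being followed by pooling.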
Pages: 327-344
Number of pages: 18
Related papers
32 in total
[1]   Cascade R-CNN: High Quality Object Detection and Instance Segmentation [J].
Cai, Zhaowei ;
Vasconcelos, Nuno .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (05) :1483-1498
[2]  
Cao D., 2022, Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science), V15, P1239
[3]   Cross-Scale Feature Fusion for Object Detection in Optical Remote Sensing Images [J].
Cheng, Gong ;
Si, Yongjie ;
Hong, Hailong ;
Yao, Xiwen ;
Guo, Lei .
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (03) :431-435
[4]   An Efficient Multiple Object Detection and Tracking Framework for Automatic Counting and Video Surveillance Applications [J].
del-Blanco, Carlos R. ;
Jaureguizar, Fernando ;
Garcia, Narciso .
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2012, 58 (03) :857-862
[5]   Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges [J].
Ding, Jian ;
Xue, Nan ;
Xia, Gui-Song ;
Bai, Xiang ;
Yang, Wen ;
Yang, Michael Ying ;
Belongie, Serge ;
Luo, Jiebo ;
Datcu, Mihai ;
Pelillo, Marcello ;
Zhang, Liangpei .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (11) :7778-7796
[6]   Object detection using YOLO: challenges, architectural successors, datasets and applications [J].
Diwan, Tausif ;
Anirudh, G. ;
Tembhurne, Jitendra, V .
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (06) :9243-9275
[7]   Multiscale Deformable Attention and Multilevel Features Aggregation for Remote Sensing Object Detection [J].
Dong, Xiaohu ;
Qin, Yao ;
Fu, Ruigang ;
Gao, Yinghui ;
Liu, Songlin ;
Ye, Yuanxin ;
Li, Biao .
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[8]   Remote Sensing Object Detection Based on Receptive Field Expansion Block [J].
Dong, Xiaohu ;
Fu, Ruigang ;
Gao, Yinghui ;
Qin, Yao ;
Ye, Yuanxin ;
Li, Biao .
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[9]   Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework [J].
Hu, Qichang ;
Paisitkriangkrai, Sakrapee ;
Shen, Chunhua ;
van den Hengel, Anton ;
Porikli, Fatih .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2016, 17 (04) :1002-1014
[10]   Semantic segmentation for multiscale target based on object recognition using the improved Faster-RCNN model [J].
Jiang, Du ;
Li, Gongfa ;
Tan, Chong ;
Huang, Li ;
Sun, Ying ;
Kong, Jianyi .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 123 :94-104