Differential Feature Awareness Network Within Antagonistic Learning for Infrared-Visible Object Detection

Cited by: 70
Authors
Zhang, Ruiheng [1,2]
Li, Lu [1,2]
Zhang, Qi [3]
Zhang, Jin [1,2]
Xu, Lixin [1,2]
Zhang, Baomin [4]
Wang, Binglu [5]
Affiliations
[1] Beijing Inst Technol, Sch Mechatron Engn, State Key Lab Electromech Dynam Control, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Sch Mechatron Engn, Beijing 100081, Peoples R China
[3] Tech Univ Munich, Sch Engn & Design, D-80333 Munich, Germany
[4] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[5] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Feature extraction; Object detection; Videos; Data mining; Semantics; Pedestrians; Image color analysis; Infrared-visible object detection; multi-modal feature fusion; FUSION NETWORK; MULTISCALE; IMAGES;
DOI
10.1109/TCSVT.2023.3289142
Chinese Library Classification (CLC)
TM [electrical technology]; TN [electronic technology, communication technology];
Discipline Classification Code
0808; 0809;
Abstract
Combining infrared and visible videos aims to gather more comprehensive feature information from multiple sources and to achieve results on practical tasks, such as detection and segmentation, that surpass those of a single modality. However, most existing dual-modality object detection algorithms ignore modal differences and fail to consider the correlation between feature extraction and fusion, which leads to incomplete extraction and inadequate fusion of dual-modality features. This raises the question of how to preserve the unique features of each modality while fully exploiting the complementary infrared and visible information. To address these challenges, we propose a novel Differential Feature Awareness Network (DFANet) within antagonistic learning for infrared and visible object detection. The proposed model consists of an Antagonistic Feature Extraction with Divergence (AFED) module, which extracts differential infrared and visible features carrying unique information, and an Attention-based Differential Feature Fusion (ADFF) module, which fully fuses the extracted differential features. We compare DFANet with existing state-of-the-art models on two benchmark datasets to demonstrate its robustness and superiority, and conduct extensive ablation experiments to illustrate its effectiveness.
Pages: 6735-6748
Page count: 14
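
The abstract describes the ADFF module only at the level of "attention-based fusion of differential features." As a rough illustration of what such a fusion step can look like, the sketch below combines an infrared and a visible feature map with channel attention. Everything here (the AttentionFusion class, the squeeze-and-excitation-style attention branch, the 1x1 projection) is an assumption made for illustration; the record does not specify the authors' actual implementation.

```python
# Minimal sketch of attention-based fusion of infrared and visible feature
# maps, loosely in the spirit of the ADFF module described in the abstract.
# All module names and design details are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse two single-modality feature maps with channel attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel-attention branch (squeeze-and-excitation style) that
        # weighs the concatenated infrared/visible channels jointly.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        # 1x1 conv projects the reweighted stack back to `channels`.
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_ir: torch.Tensor, f_vis: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_ir, f_vis], dim=1)  # stack the two modalities
        x = x * self.attn(x)                 # reweight channels per sample
        return self.proj(x)                  # fused feature map

# Usage: fuse two 256-channel feature maps of the same spatial size.
fuse = AttentionFusion(channels=256)
fused = fuse(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64))
print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

In this sketch the attention weights are computed over the stacked modalities, so channels of one modality can be emphasized or suppressed based on the other before a 1x1 convolution projects the result back to the original channel count.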