Joint Spatial and Temporal Feature Enhancement Network for Disturbed Object Detection

Cited by: 0
Authors
Zhang, Fan [1, 2]
Ji, Hongbing [1, 2]
Zhang, Yongquan [1, 2]
Zhu, Zhigang [1, 2]
Affiliations
[1] XIDIAN UNIV, Xian Key Lab Intelligent Spectrum Sensing & Inform, Xian 710071, Peoples R China
[2] XIDIAN UNIV, Shaanxi Union Res Ctr Univ & Enterprise Intelligen, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Object detection; Semantics; Aggregates; Detectors; Proposals; Correlation; Video object detection; local-global context; deformable temporal sampling; temporal attention;
DOI
10.1109/TCSVT.2024.3432900
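Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809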
Abstract
Video object detection remains a challenging task due to appearance degradation in certain frames. Existing studies usually aggregate temporal information from multiple frames to enhance the object's appearance representation. Although this approach has achieved strong detection performance, two shortcomings remain: (1) the spatial context within each frame, which can provide additional decision support when objects are corrupted, is not fully exploited; (2) in the feature alignment phase, traditional methods tend to employ one-to-one or one-to-global temporal alignment strategies, overlooking the local temporal correlation of objects. To address these issues, we propose a Joint Spatial and Temporal Feature Enhancement Network (JSTFE-Net) for video object detection, which jointly utilizes spatial and temporal information. First, we present a novel local-global context enhancement module to effectively encode intra-frame spatial context information. This module enhances the learning of both local details and global semantic information of objects, thereby facilitating accurate object perception within the spatial domain. Second, we develop a deformable temporal sampling module, which adaptively samples correlated temporal information according to the motion information between frames. In addition, to improve the aggregation of temporally correlated sampled features from multiple frames, we devise an attention-based temporal aggregation block, which dynamically fuses these feature points based on their temporal similarity with the corresponding object feature point. Note that JSTFE-Net can be readily plugged into image object detectors and state-of-the-art video object detectors. Extensive experiments on the ImageNet VID dataset show that the proposed JSTFE-Net consistently and significantly improves performance, demonstrating its effectiveness in video object detection.
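The similarity-weighted fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `temporal_attention_aggregate`, the use of cosine similarity, and the softmax weighting are all assumptions chosen for illustration, and the abstract does not specify how JSTFE-Net computes similarity or normalizes the weights.

```python
import numpy as np

def temporal_attention_aggregate(query, sampled):
    """Fuse sampled temporal features by their similarity to a query feature.

    Hypothetical helper (not from the paper).
    query:   (C,)   feature of the object point in the current frame.
    sampled: (N, C) feature points sampled from supporting frames.
    Returns the fused feature (C,) and the attention weights (N,).
    """
    # Assumption: cosine similarity is used as the temporal similarity.
    q = query / (np.linalg.norm(query) + 1e-8)
    s = sampled / (np.linalg.norm(sampled, axis=1, keepdims=True) + 1e-8)
    sim = s @ q                       # (N,) similarity per sampled point

    # Assumption: softmax turns similarities into fusion weights.
    w = np.exp(sim - sim.max())       # subtract max for numerical stability
    w = w / w.sum()

    fused = w @ sampled               # weighted sum over sampled points
    return fused, w

# Toy example: the first sampled point matches the query direction,
# so it receives the larger weight.
query = np.array([1.0, 0.0])
sampled = np.array([[1.0, 0.0], [0.0, 1.0]])
fused, weights = temporal_attention_aggregate(query, sampled)
```

In this sketch, sampled points that resemble the current-frame feature dominate the fused result, which is the general behavior the abstract attributes to the attention-based temporal aggregation block.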
Pages: 12258 - 12273
Page count: 16
Related papers
50 in total
  • [21] Feature enhancement modules applied to a feature pyramid network for object detection
    Liu, Min
    Lin, Kun
    Huo, Wujie
    Hu, Lanlan
    He, Zhizi
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (02) : 617 - 629
  • [22] Temporal Speciation Network for Few-Shot Object Detection
    Zhao, Xiaowei
    Liu, Xianglong
    Ma, Yuqing
    Bai, Shihao
    Shen, Yifan
    Hao, Zeyu
    Liu, Aishan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8267 - 8278
  • [23] Fine-Grained Feature Enhancement for Object Detection in Remote Sensing Images
    Zhou, Yong
    Wang, Sifan
    Zhao, Jiaqi
    Zhu, Hancheng
    Yao, Rui
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [24] Representative Feature Alignment for Adaptive Object Detection
    Xu, Shan
    Zhang, Huaidong
    Xu, Xuemiao
    Hu, Xiaowei
    Xu, Yangyang
    Dai, Liangui
    Choi, Kup-Sze
    Heng, Pheng-Ann
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 689 - 700
  • [25] An Efficient Feature Pyramid Network for Object Detection in Remote Sensing Imagery
    Fang Qingyun
    Zhang Lin
    Wang Zhaokui
    IEEE ACCESS, 2020, 8 : 93058 - 93068
  • [26] Small Object Detection Network Based on Feature Information Enhancement
    Luo, Huilan
    Wang, Pei
    Chen, Hongkun
    Kowelo, Vladimir Peter
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [27] Enhancement-fusion feature pyramid network for object detection
    Dong, Shifeng
    Wang, Rujing
    Du, Jianming
    Jiao, Lin
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [28] Hierarchical Feature Fusion Network for Salient Object Detection
    Li, Xuelong
    Song, Dawei
    Dong, Yongsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 9165 - 9175
  • [29] SalienDet: A Saliency-Based Feature Enhancement Algorithm for Object Detection for Autonomous Driving
    Ding, Ning
    Zhang, Ce
    Eskandarian, Azim
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01) : 2624 - 2635
  • [30] Self-supervised spatial-temporal feature enhancement for one-shot video object detection
    Yao, Xudong
    Yang, Xiaoshan
    NEUROCOMPUTING, 2024, 601