Joint Spatial and Temporal Feature Enhancement Network for Disturbed Object Detection

Cited by: 0
Authors
Zhang, Fan [1, 2]
Ji, Hongbing [1, 2]
Zhang, Yongquan [1, 2]
Zhu, Zhigang [1, 2]
Affiliations
[1] XIDIAN UNIV, Xian Key Lab Intelligent Spectrum Sensing & Inform, Xian 710071, Peoples R China
[2] XIDIAN UNIV, Shaanxi Union Res Ctr Univ & Enterprise Intelligen, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Object detection; Semantics; Aggregates; Detectors; Proposals; Correlation; Video object detection; local-global context; deformable temporal sampling; temporal attention;
DOI
10.1109/TCSVT.2024.3432900
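Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809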
Abstract
Video object detection remains challenging because object appearance degrades in certain frames. Existing studies usually aggregate temporal information from multiple frames to enhance the object's appearance representation. Although these methods have achieved significant detection performance, two shortcomings remain: (1) the spatial context information within each frame, which can provide additional decision support when objects are corrupted, is not fully exploited; (2) in the feature alignment phase, traditional methods tend to employ one-to-one or one-to-global temporal alignment strategies, overlooking the local temporal correlation of objects. To address these issues, we propose a Joint Spatial and Temporal Feature Enhancement Network (JSTFE-Net) for video object detection, which jointly exploits spatial and temporal information. First, we present a novel local-global context enhancement module that effectively encodes intra-frame spatial context information. This module enhances the learning of both local details and global semantic information of objects, thereby facilitating accurate object perception in the spatial domain. Second, we develop a deformable temporal sampling module, which adaptively samples correlated temporal information according to the motion information between frames. In addition, to improve the aggregation of the temporally correlated features sampled from multiple frames, we devise an attention-based temporal aggregation block, which dynamically fuses these feature points according to their temporal similarity with the corresponding object feature point. Note that JSTFE-Net can be readily plugged into both image object detectors and state-of-the-art video object detectors. Extensive experiments on the ImageNet VID dataset show that the proposed JSTFE-Net consistently and significantly improves performance, demonstrating its effectiveness for video object detection.
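The abstract describes deformable temporal sampling and attention-based temporal aggregation only at a high level. The sketch below illustrates the general idea in plain Python; it is not the authors' implementation, and several choices are our own assumptions: the per-frame (dy, dx) offsets are supplied by hand (in the network they would be predicted from inter-frame motion), samples are read with bilinear interpolation and zero padding, and the similarity score is cosine similarity followed by a temperature-scaled softmax. All function names (`bilinear_sample`, `deformable_temporal_sample`, `temporal_attention_aggregate`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A feature map is indexed as feat[y][x][channel].
FeatureMap = List[List[List[float]]]


def bilinear_sample(feat: FeatureMap, y: float, x: float) -> List[float]:
    """Bilinearly interpolate the C-dim feature at fractional (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    y0, x0 = math.floor(y), math.floor(x)
    out = [0.0] * c
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                for k in range(c):
                    out[k] += weight * feat[yy][xx][k]
    return out


def deformable_temporal_sample(
    support: Sequence[FeatureMap],
    y: int,
    x: int,
    offsets: Sequence[Sequence[Tuple[float, float]]],
) -> List[List[float]]:
    """Collect offset-shifted samples around (y, x) from every support frame.

    offsets[t] lists the (dy, dx) displacements used for support frame t.
    In a trained network these would be predicted from inter-frame motion;
    here they are a hypothetical, hand-supplied input.
    """
    samples = []
    for feat, frame_offsets in zip(support, offsets):
        for dy, dx in frame_offsets:
            samples.append(bilinear_sample(feat, y + dy, x + dx))
    return samples


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an all-zero vector is treated as unrelated (0.0)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def temporal_attention_aggregate(
    query: Sequence[float],
    samples: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Fuse sampled features, weighting each by its similarity to the query.

    Weights are a softmax over cosine similarities (an assumed scoring rule).
    Returns the fused feature and the attention weights.
    """
    scores = [cosine_similarity(query, s) / temperature for s in samples]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * s[k] for w, s in zip(weights, samples)) for k in range(len(query))
    ]
    return fused, weights


# Usage: a 2x2 map with 2 channels, reused as both support frames.
ref = [[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 1.0], [0.0, 0.0]]]
query = ref[0][0]  # reference-frame feature at the object location (0, 0)
samples = deformable_temporal_sample(
    [ref, ref], 0, 0, [[(0.0, 0.0), (0.0, 1.0)], [(0.5, 0.0)]]
)
fused, weights = temporal_attention_aggregate(query, samples)
```

Sampling a handful of offset locations per support frame sits between the one-to-one (a single aligned point) and one-to-global (every location) strategies the abstract contrasts, which is how we read "local temporal correlation"; the softmax then lets samples that closely match the reference feature dominate the fused result.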
Pages: 12258-12273
Number of pages: 16