Joint Spatial and Temporal Feature Enhancement Network for Disturbed Object Detection

Cited by: 0
Authors
Zhang, Fan [1 ,2 ]
Ji, Hongbing [1 ,2 ]
Zhang, Yongquan [1 ,2 ]
Zhu, Zhigang [1 ,2 ]
Affiliations
[1] XIDIAN UNIV, Xian Key Lab Intelligent Spectrum Sensing & Inform, Xian 710071, Peoples R China
[2] XIDIAN UNIV, Shaanxi Union Res Ctr Univ & Enterprise Intelligen, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Object detection; Semantics; Aggregates; Detectors; Proposals; Correlation; Video object detection; local-global context; deformable temporal sampling; temporal attention;
DOI
10.1109/TCSVT.2024.3432900
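Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes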
0808 ; 0809 ;
Abstract
Video object detection remains challenging because object appearance degrades in certain frames. Existing studies usually aggregate temporal information from multiple frames to enhance the object's appearance representation. Although this approach has achieved strong detection performance, two shortcomings remain: (1) the spatial context within each frame, which can provide additional decision support when objects are corrupted, is not fully exploited; (2) in the feature alignment phase, traditional methods tend to employ one-to-one or one-to-global temporal alignment strategies, overlooking the local temporal correlation of objects. To address these issues, we propose a Joint Spatial and Temporal Feature Enhancement Network (JSTFE-Net) for video object detection, which jointly exploits spatial and temporal information. First, we present a novel local-global context enhancement module that encodes intra-frame spatial context. This module enhances the learning of both local details and global semantic information of objects, thereby facilitating accurate object perception in the spatial domain. Second, we develop a deformable temporal sampling module, which adaptively samples correlated temporal information according to the motion between frames. In addition, to improve the aggregation of the temporally correlated features sampled from multiple frames, we devise an attention-based temporal aggregation block, which dynamically fuses these feature points based on their temporal similarity to the corresponding object feature point. JSTFE-Net can be plugged into both image object detectors and state-of-the-art video object detectors. Extensive experiments on the ImageNet VID dataset show that the proposed JSTFE-Net consistently and significantly improves performance, demonstrating its effectiveness in video object detection.
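The similarity-weighted fusion mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the block's exact formulation, so the cosine similarity, the softmax weighting, the `temperature` parameter, and the function names below are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(reference, sampled, temperature=1.0):
    """Fuse sampled feature vectors, weighting each by its similarity
    to the reference feature. Hypothetical helper, not from the paper."""
    weights = softmax([cosine(reference, s) / temperature for s in sampled])
    dim = len(reference)
    fused = [sum(w * s[i] for w, s in zip(weights, sampled)) for i in range(dim)]
    return fused, weights


# Toy example: one reference feature and three features sampled from other frames.
reference = [1.0, 0.0]
sampled = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = aggregate(reference, sampled)
```

In this toy example the sampled feature most similar to the reference receives the largest weight, so dissimilar samples contribute less to the fused feature.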
Pages: 12258-12273
Number of pages: 16
Related papers
50 in total
  • [41] FEDet: Feature Enhancement Object Detection with Panoramic Images
    Chang, Qingling
    Zhang, Taijie
    Liu, Wenhao
    Cui, Yan
    2024 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND ARTIFICIAL INTELLIGENCE, SEAI 2024, 2024, : 92 - 98
  • [42] Joint Image and Feature Enhancement for Object Detection under Adverse Weather Conditions
    Yin, Mengyu
    Ling, Mingyang
    Chang, Kan
    Yuan, Zijian
    Qin, Qingpao
    Chen, Boning
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,
  • [43] Feature Enhancement SSD for Object Detection
    Tan H.
    Li S.
    Liu B.
    Liu X.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (04): : 573 - 579
  • [44] Foreground Feature Enhancement for Object Detection
    Jiang, Shenwang
    Xu, Tingfa
    Li, Jianan
    Shen, Ziyi
    Guo, Jie
    IEEE ACCESS, 2019, 7 : 49223 - 49231
  • [45] Feature Transform Correlation Network for Object Detection
    Wan, Shouhong
    Li, Xiaoting
    Jin, Peiquan
    Xie, Jia
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 1312 - 1319
  • [46] Temporal-Spatial Feature Interaction Network for Multi-Drone Multi-Object Tracking
    Wu, Han
    Sun, Hao
    Ji, Kefeng
    Kuang, Gangyao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1165 - 1179
  • [47] Object Detection Method Based on Shallow Feature Fusion and Semantic Information Enhancement
    Luo, Huilan
    Wang, Pei
    Chen, Hongkun
    Xu, Min
    IEEE SENSORS JOURNAL, 2021, 21 (19) : 21839 - 21851
  • [48] Multipatch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images
    Shamsolmoali, Pourya
    Chanussot, Jocelyn
    Zareapoor, Masoumeh
    Zhou, Huiyu
    Yang, Jie
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [49] Structure-Guided Feature Transform Hybrid Residual Network for Remote Sensing Object Detection
    Li, Jiaojiao
    Zhang, Huanqing
    Song, Rui
    Xie, Weiying
    Li, Yunsong
    Du, Qian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [50] Rotation Equivariant Feature Image Pyramid Network for Object Detection in Optical Remote Sensing Imagery
    Shamsolmoali, Pourya
    Zareapoor, Masoumeh
    Chanussot, Jocelyn
    Zhou, Huiyu
    Yang, Jie
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60