Enhancing Representation Learning With Spatial Transformation and Early Convolution for Reinforcement Learning-Based Small Object Detection

Cited by: 5

Authors:

Fang, Fen [1]

Liang, Wenyu [1]

Cheng, Yi [1]

Xu, Qianli [1]

Lim, Joo-Hwee [1,2]

Affiliations:

[1] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore

[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 1

Keywords:

Detectors; Training; Object detection; Feature extraction; Costs; Transformers; Prediction algorithms; Small object detection; reinforcement learning; coarse-to-fine framework;

DOI:

10.1109/TCSVT.2023.3284453

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Although object detection has made significant progress over the past decade, the detection of small objects remains far from satisfactory because of the high variability of object scales and complex backgrounds. A common way to improve small object detection is to use high-resolution (HR) images. However, this approach incurs a huge computational cost that grows quadratically with image resolution. To achieve both accuracy and efficiency, we propose a novel reinforcement learning (RL) framework that employs an efficient policy network consisting of a Spatial Transformation Network to enhance state representation learning and a Transformer model with early convolution to improve feature extraction. Our method has two main steps: (1) coarse location query (CLQ), where an RL agent is trained to predict the locations of small objects on low-resolution (LR) images (down-sampled versions of the HR images); (2) context-sensitive object detection, where HR image patches are used to detect objects at the selected coarse locations and LR image patches are used on background areas (containing no small objects). In this way, we obtain high detection performance on small objects while avoiding unnecessary computation on background areas. The proposed method has been tested and benchmarked on various datasets. On the Caltech Pedestrians Detection and Web Pedestrians datasets, the proposed method improves the detection accuracy by 2% while reducing the number of processed pixels. On the Vision meets Drone object detection dataset and the Oil and Gas Storage Tank dataset, the proposed method outperforms the state-of-the-art (SotA) methods. On the MS COCO mini-val set, our method outperforms SotA methods on small object detection while achieving comparable performance on medium and large objects.
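The two-step idea in the abstract can be illustrated with a rough pixel-cost calculation. The sketch below is not the authors' implementation: the function names, the score threshold standing in for the RL policy, the patch grid, and the down-sampling factor are all assumptions made for illustration. It also ignores the cost of running the coarse stage itself on the LR image.

```python
import numpy as np

def coarse_location_query(scores, threshold=0.5):
    """Hypothetical stand-in for the RL agent: threshold per-patch scores.

    The paper trains a policy network for this step; a fixed threshold is
    used here only so the example is self-contained.
    """
    return np.asarray(scores) > threshold

def processed_pixel_fraction(select_mask, downsample):
    """Fraction of full-HR pixels processed by the fine detection stage.

    select_mask: 2-D boolean array, True where a patch is processed at HR.
    downsample:  assumed linear down-sampling factor between HR and LR.
    """
    mask = np.asarray(select_mask, dtype=bool)
    hr_share = mask.mean()        # share of patches processed at HR
    lr_share = 1.0 - hr_share     # remaining patches processed at LR
    # An LR patch holds 1/downsample**2 as many pixels as its HR counterpart,
    # which is the quadratic growth in cost mentioned in the abstract.
    return hr_share + lr_share / downsample**2

# Example: a 2x2 patch grid in which one patch is flagged for HR processing.
scores = np.array([[0.9, 0.1], [0.2, 0.3]])
mask = coarse_location_query(scores)
fraction = processed_pixel_fraction(mask, downsample=4)
```

Under these assumptions, the fewer patches the coarse stage selects and the larger the down-sampling factor, the smaller the share of HR pixels the detector has to process.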


Pages: 315-328

Page count: 14
