AODGCN: Adaptive object detection with attention-guided dynamic graph convolutional network

Times Cited: 0
Authors
Zhang, Meng [1 ]
Guo, Yina [1 ]
Wang, Haidong [1 ]
Shangguan, Hong [1 ]
Affiliations
[1] Taiyuan Univ Sci & Technol, Sch Elect & Informat Engn, Taiyuan, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Image classification; Graphs; Attention mechanism; Graph convolutional network;
DOI
10.1016/j.cviu.2025.104386
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Various classifiers based on convolutional neural networks have been successfully applied to image classification in object detection. However, object detection is much more sophisticated than image classification, and most classifiers used in this context exhibit limitations in capturing contextual information, particularly in scenarios with complex backgrounds or occlusions. Additionally, they lack spatial awareness, resulting in the loss of spatial structure and inadequate modeling of object details and context. In this paper, we propose an adaptive object detection approach using an attention-guided dynamic graph convolutional network (AODGCN). AODGCN represents images as graphs, enabling the capture of object properties such as connectivity, proximity, and hierarchical relationships. Attention mechanisms guide the model to focus on informative regions, highlighting relevant features while suppressing background information. This attention-guided approach enhances the model's ability to capture discriminative features. Furthermore, the dynamic graph convolutional network (D-GCN) adjusts the receptive field size and weight coefficients based on object characteristics, enabling adaptive detection of objects with varying sizes. Experimental results on the MS-COCO 2017 dataset demonstrate the effectiveness of AODGCN, with an improvement of 1.6% in mean average precision (mAP) over state-of-the-art algorithms.
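
The abstract gives no implementation details; as a rough illustration of the kind of attention-guided graph convolution it describes, the following is a minimal PyTorch sketch in which image regions are graph nodes, an adjacency matrix encodes their connectivity, and a learned attention gate re-weights node features before neighborhood aggregation. The class name, the gating formulation, and the normalization choice are illustrative assumptions, not the authors' AODGCN implementation.

```python
# Illustrative sketch only; not the authors' AODGCN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGuidedGraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # feature transform
        self.attn = nn.Linear(in_dim, 1)           # per-node attention score

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, in_dim) node features (e.g. region descriptors)
        # adj: (N, N) adjacency matrix encoding connectivity / proximity
        gate = torch.sigmoid(self.attn(x))         # (N, 1) attention gate
        x = x * gate                               # suppress uninformative nodes
        # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
        adj_hat = adj + torch.eye(adj.size(0), device=adj.device)
        deg = adj_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-6).pow(-0.5))
        adj_norm = d_inv_sqrt @ adj_hat @ d_inv_sqrt
        return F.relu(adj_norm @ self.linear(x))   # aggregate neighbors, transform


if __name__ == "__main__":
    nodes = torch.randn(6, 32)                     # 6 regions, 32-d features
    adj = (torch.rand(6, 6) > 0.5).float()
    adj = ((adj + adj.t()) > 0).float()            # symmetric adjacency
    layer = AttentionGuidedGraphConv(32, 64)
    print(layer(nodes, adj).shape)                 # torch.Size([6, 64])
```

A "dynamic" variant in the spirit of D-GCN would additionally condition the receptive field (e.g. which neighbors enter adj) and the layer weights on object scale; that conditioning is omitted here for brevity.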
Pages: 12