Reasonable object detection guided by knowledge of global context and category relationship

Cited: 5
Authors
Ji, Haoqin
Ye, Kai
Wan, Qi
Shen, Linlin [1 ]
Affiliations
[1] Shenzhen Univ, Sch Comp Sci & Software Engn, Comp Vis Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Prior knowledge; Graph Convolutional Network;
DOI
10.1016/j.eswa.2022.118285
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Mainstream object detectors usually treat each region separately, overlooking important global context information and the associations between object categories. Existing methods model global context via attention mechanisms, which require ad hoc design and prior knowledge. Some works combine CNN features with label dependencies learned from a pre-defined graph and word embeddings, but they ignore the gap between visual features and textual corpora and are usually task-specific (depending on RoIPool/RoIAlign). To move beyond these specific settings and enable different types of detectors to refine their results with the help of prior knowledge, we propose KROD (Knowledge-guided Reasonable Object Detection), which consists of a GKM (Global Category Knowledge Mining) module and a CRM (Category Relationship Knowledge Mining) module, and which improves detection performance by mimicking the process of human reasoning. For a given image, GKM introduces global category knowledge into the detector by simply attaching a multi-label image classification branch to the backbone. Meanwhile, CRM feeds the raw detection outputs into a knowledge graph built from object category co-occurrences to further refine the original results, with the help of a GCN (Graph Convolutional Network). We also propose a novel loss-aware module that distinctively corrects the classification probabilities of different detected boxes. Without bells and whistles, extensive experiments show that the proposed KROD improves different baseline models (both anchor-based and anchor-free) by a large margin (1.2%~1.8% higher AP) with a marginal loss of efficiency on MS COCO.
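The CRM module described above propagates class evidence along a category co-occurrence graph via a GCN. The following minimal sketch illustrates that general idea; the adjacency normalization is the standard symmetric form used in GCNs, while the residual fusion, the function names, and the fixed weight matrix are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize a co-occurrence matrix with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN normalization."""
    A_loop = A + np.eye(A.shape[0])
    d = A_loop.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_loop @ D_inv_sqrt

def gcn_refine(scores, cooccurrence, W):
    """One GCN propagation step over the category graph (a sketch).

    scores:        (num_boxes, num_classes) raw per-box class scores
    cooccurrence:  (num_classes, num_classes) category co-occurrence counts
    W:             (num_classes, num_classes) weight matrix (learned in
                   practice; fixed here for illustration)
    """
    A_norm = normalize_adjacency(cooccurrence)
    # Spread each box's class evidence to co-occurring categories,
    # then fuse with the original scores (residual fusion is an assumption).
    propagated = scores @ A_norm @ W
    return scores + propagated
```

A detector predicting "keyboard" with high confidence would, under such a scheme, raise the score of frequently co-occurring classes like "mouse" for boxes in the same image.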
Pages: 11