Cross-Modal Attentional Context Learning for RGB-D Object Detection

Cited by: 55
Authors
Li, Guanbin [1 ]
Gan, Yukang [1 ]
Wu, Hejun [1 ]
Xiao, Nong [1 ]
Lin, Liang [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D object detection; attentional context modeling; cross modal feature; convolutional neural network; RECOGNITION;
DOI
10.1109/TIP.2018.2878956
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing objects from simultaneously sensed photometric (RGB) and depth channels is a fundamental yet practical problem in many machine vision applications, such as robot grasping and autonomous driving. In this paper, we address this problem by developing a cross-modal attentional context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data. Compared to existing RGB-D object detection frameworks, our approach has several appealing properties. First, it consists of an attention-based global context model for exploiting adaptive contextual information and incorporating this information into a region-based CNN (e.g., fast RCNN) framework to achieve improved object detection performance. Second, our CMAC framework further contains a fine-grained object part attention module to harness multiple discriminative object parts inside each possible object region for superior local feature representation. While greatly improving the accuracy of RGB-D object detection, the effective cross-modal information fusion as well as attentional context modeling in our proposed model provide an interpretable visualization scheme. Experimental results demonstrate that the proposed method significantly improves upon the state of the art on all public benchmarks.
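The abstract describes two attention mechanisms: adaptive attention pooling of contextual features and a gated fusion of the RGB and depth streams. As a rough illustration only, the sketch below shows how attention-weighted context pooling and a sigmoid-gated cross-modal combination can work in principle; all function names, the scoring vectors, and the scalar gate are our own simplification for exposition, not the authors' exact CMAC formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentional_context(feat, w):
    """Attention-pool an (N, C) feature map into one (C,) context vector.

    feat: N spatial locations with C channels; w: a (C,) scoring vector.
    The attention weights adaptively emphasize informative locations,
    loosely mirroring the "attention-based global context model" idea."""
    scores = feat @ w        # (N,) relevance score per spatial location
    alpha = softmax(scores)  # attention distribution over locations
    return alpha @ feat      # weighted sum -> (C,) context vector

def cross_modal_fusion(rgb_feat, depth_feat, w_rgb, w_depth, gate):
    # Pool each modality into a context vector, then gate-combine them.
    c_rgb = attentional_context(rgb_feat, w_rgb)
    c_depth = attentional_context(depth_feat, w_depth)
    g = 1.0 / (1.0 + np.exp(-gate))  # sigmoid gate in (0, 1)
    return g * c_rgb + (1.0 - g) * c_depth

rng = np.random.default_rng(0)
N, C = 49, 8  # e.g. a 7x7 region feature map with 8 channels
rgb = rng.normal(size=(N, C))
depth = rng.normal(size=(N, C))
fused = cross_modal_fusion(rgb, depth, rng.normal(size=C), rng.normal(size=C), 0.5)
print(fused.shape)  # prints (8,)
```

In the actual model the scoring and gating would be learned inside a region-based CNN rather than fixed vectors; the sketch only conveys why the attention weights form a distribution over locations, which is what makes the fusion interpretable as a visualization.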
Pages: 1591-1601
Page count: 11
Related Papers
50 records total
  • [41] Bidirectional Attentional Interaction Networks for RGB-D salient object detection
    Wei, Weiyi
    Xu, Mengyu
    Wang, Jian
    Luo, Xuzhe
    IMAGE AND VISION COMPUTING, 2023, 138
  • [42] Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection
    Chen, Hao
    Li, Youfu
    Su, Dan
    PATTERN RECOGNITION, 2019, 86 : 376 - 385
  • [43] RGB-D Saliency Detection with 3D Cross-modal Fusion and Mid-level Integration
    Liu, Taoqi
    Li, Bo
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 1328 - 1335
  • [44] CCANet: A Collaborative Cross-Modal Attention Network for RGB-D Crowd Counting
    Liu, Yanbo
    Cao, Guo
    Shi, Boshan
    Hu, Yingxiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 154 - 165
  • [45] Cross-Modal Transformer for RGB-D semantic segmentation of production workshop objects
    Ru, Qingjun
    Chen, Guangzhu
    Zuo, Tingyu
    Liao, Xiaojuan
    PATTERN RECOGNITION, 2023, 144
  • [46] ECW-EGNet: Exploring Cross-Modal Weighting and Edge-Guided Decoder Network for RGB-D Salient Object Detection
    Xia, Chenxing
    Yang, Feng
    Duan, Songsong
    Gao, Xiuju
    Ge, Bin
    Li, Kuan-Ching
    Fang, Xianjin
    Zhang, Yan
    Yang, Ke
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2024, 21 (03)
  • [47] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [48] RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion
    Cui Z.
    Feng Z.
    Wang F.
    Liu Q.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (06): : 893 - 902
  • [49] Three-stream RGB-D salient object detection network based on cross-level and cross-modal dual-attention fusion
    Meng, Lingbing
    Yuan, Mengya
    Shi, Xuehan
    Liu, Qingqing
    Cheng, Fei
    Li, Lingli
    IET IMAGE PROCESSING, 2023, 17 (11) : 3292 - 3308
  • [50] Cross-Level Multi-Modal Features Learning With Transformer for RGB-D Object Recognition
    Zhang, Ying
    Yin, Maoliang
    Wang, Heyong
    Hua, Changchun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7121 - 7130