MoADNet: Mobile Asymmetric Dual-Stream Networks for Real-Time and Lightweight RGB-D Salient Object Detection

Cited by: 81
Authors
Jin, Xiao [1 ]
Yi, Kang [1 ]
Xu, Jing [1 ]
Affiliations
[1] Nankai Univ, Coll Artificial Intelligence, Tianjin 300350, Peoples R China
Keywords
RGB-D SOD; lightweight; dual-stream; cross-modality fusion;
DOI
10.1109/TCSVT.2022.3180274
CLC classification
TM (Electrical engineering); TN (Electronic and communication technology)
Discipline codes
0808; 0809
Abstract
RGB-D Salient Object Detection (RGB-D SOD) aims to detect salient objects by exploiting complementary information from RGB images and depth cues. Although many strong prior works have been proposed for RGB-D SOD, most of them focus on performance enhancement while paying little attention to practical deployment on mobile devices. In this paper, we propose mobile asymmetric dual-stream networks (MoADNet) for real-time and lightweight RGB-D SOD. First, motivated by the intrinsic discrepancy between the RGB and depth modalities, we observe that depth maps can be represented with fewer channels than RGB images. We therefore design asymmetric dual-stream encoders based on MobileNetV3. Second, we develop an inverted bottleneck cross-modality fusion (IBCMF) module to fuse multimodality features; it adopts an inverted bottleneck structure to compensate for the information loss incurred by the lightweight backbones. Third, we present an adaptive atrous spatial pyramid (A2SP) module that speeds up inference while maintaining performance by appropriately selecting multiscale features in the decoder. Extensive experiments compare our method with 15 state-of-the-art approaches. MoADNet obtains competitive results on five benchmark datasets under four evaluation metrics, and in the efficiency analysis it outperforms the other baselines by a large margin. MoADNet contains only 5.03 M parameters and runs at 80 FPS when processing a 256 x 256 image on a single NVIDIA 2080Ti GPU.
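The abstract's first claim, that depth maps can be represented with fewer channels than RGB images, translates directly into a parameter saving for the depth stream of a dual-stream encoder. The back-of-the-envelope sketch below illustrates this with plain 3x3 convolutions; the stage widths and the 4x depth-channel reduction are illustrative assumptions for this sketch, not figures taken from the paper.

```python
def conv_params(c_in, c_out, k=3):
    """Parameter count of a single k x k convolution (weights + biases)."""
    return c_in * c_out * k * k + c_out


def stream_params(widths, in_ch):
    """Total parameters of a simple conv stack with the given output widths."""
    total, c = 0, in_ch
    for w in widths:
        total += conv_params(c, w)
        c = w
    return total


# Hypothetical stage widths for the RGB stream (3 input channels).
rgb_widths = [16, 24, 40, 112, 160]

# Symmetric baseline: the depth stream (1 input channel) copies the RGB widths.
symmetric = stream_params(rgb_widths, 3) + stream_params(rgb_widths, 1)

# Asymmetric variant: the depth stream uses a quarter of the channels,
# reflecting the observation that depth maps need fewer channels.
depth_widths = [w // 4 for w in rgb_widths]
asymmetric = stream_params(rgb_widths, 3) + stream_params(depth_widths, 1)

print(f"symmetric encoder params:  {symmetric}")
print(f"asymmetric encoder params: {asymmetric}")
```

Because convolution parameters scale with the product of input and output channels, narrowing the depth stream by 4x shrinks its per-layer cost by roughly 16x, which is why the asymmetric design helps MoADNet stay lightweight.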
Pages: 7632-7645
Page count: 14