LFTransNet: Light Field Salient Object Detection via a Learnable Weight Descriptor

Cited by: 7
Authors
Liu, Zhengyi [1 ]
He, Qian [1 ]
Wang, Linbo [1 ]
Fang, Xianyong [1 ]
Tang, Bin [2 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
[2] Hefei Univ, Sch Artificial Intelligence & Big data, Hefei 230601, Peoples R China
Keywords
Light field; salient object detection; multimodal; segmentation; transformer decoder; NETWORK;
DOI
10.1109/TCSVT.2023.3281465
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology & Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Light Field Salient Object Detection (LF SOD) aims to segment visually distinctive objects out of their surroundings. Since light field images provide a multi-focus stack (many focal slices at different depth levels) and an all-focus image of the same scene, they record comprehensive but redundant information. Existing methods exploit this cue via long short-term memory with attention mechanisms, 3D convolution, or graph learning. However, the relative importance of intra-slice and inter-slice information in the focal stack has not been well investigated. In this paper, we propose a learnable weight descriptor that simultaneously exploits weights along the slice, spatial-region, and channel dimensions, and build an LF SOD method upon it. The method extracts slice features and all-focus features from a weight-shared backbone and a separate backbone, respectively. A transformer decoder learns the weight descriptor, which both emphasizes the importance of each slice (inter-slice) and discriminates the spatial and channel importance within each slice (intra-slice). The learnt descriptor serves as a weight that makes slice features attend to important slices, regions, and channels. Furthermore, we propose a hierarchical multi-modal fusion which aggregates high-layer features by modelling long-range dependencies to fully excavate common salient semantics, and combines low-layer features under a spatial constraint to eliminate the blurring effect of slice features. Experimental results exceed the state-of-the-art methods by at least 25% in terms of the mean absolute error metric, demonstrating a significant improvement in LF SOD performance from the designed learnable weight descriptor. Code: https://github.com/liuzywen/LFTransNet
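The weighting scheme described in the abstract can be illustrated schematically: a descriptor supplies softmax-normalized importance scores over slices (inter-slice) and over the spatial and channel dimensions of each slice (intra-slice), which then re-weight and fuse the focal-stack features. The sketch below is not the authors' implementation; the array shapes, the use of numpy, and the random stand-ins for the transformer-decoder outputs are all assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=None):
    # numerically stable softmax along the given axis
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, C, H, W = 4, 8, 16, 16                    # slices, channels, height, width
slices = rng.standard_normal((N, C, H, W))   # focal-slice features

# Hypothetical descriptor outputs (in the paper these would come from the
# transformer decoder; random values here stand in for learned logits):
slice_logits   = rng.standard_normal(N)          # inter-slice importance
channel_logits = rng.standard_normal((N, C))     # per-slice channel importance
spatial_logits = rng.standard_normal((N, H, W))  # per-slice spatial importance

w_slice   = softmax(slice_logits)                               # sums to 1 over slices
w_channel = softmax(channel_logits, axis=1)[:, :, None, None]   # (N, C, 1, 1)
w_spatial = softmax(spatial_logits.reshape(N, -1), axis=1).reshape(N, 1, H, W)

# Intra-slice re-weighting (channel and spatial), then inter-slice fusion
weighted = slices * w_channel * w_spatial
fused = (w_slice[:, None, None, None] * weighted).sum(axis=0)   # (C, H, W)
print(fused.shape)  # (8, 16, 16)
```

The three softmaxes keep each weight set normalized independently, so slice, spatial, and channel importance can be learned and applied jointly without one dominating the others by scale.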
Pages: 7764-7773 (10 pages)