LFTransNet: Light Field Salient Object Detection via a Learnable Weight Descriptor

Cited by: 7
Authors
Liu, Zhengyi [1 ]
He, Qian [1 ]
Wang, Linbo [1 ]
Fang, Xianyong [1 ]
Tang, Bin [2 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
[2] Hefei Univ, Sch Artificial Intelligence & Big data, Hefei 230601, Peoples R China
Keywords
Light field; salient object detection; multimodal; segmentation; transformer decoder; NETWORK;
DOI
10.1109/TCSVT.2023.3281465
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology & Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Light Field Salient Object Detection (LF SOD) aims to segment visually distinctive objects out of their surroundings. Since light field images provide a multi-focus stack (many focal slices at different depth levels) and an all-focus image of the same scene, they record comprehensive but redundant information. Existing methods exploit this cue via long short-term memory with attention mechanisms, 3D convolution, or graph learning. However, the relative importance of intra-slice and inter-slice information in the focal stack has not been well investigated. In this paper, we propose a learnable weight descriptor that simultaneously exploits weights along the slice, spatial-region, and channel dimensions, and build an LF SOD method upon it. The method extracts slice features and all-focus features from a weight-shared backbone and a separate backbone, respectively. A transformer decoder learns the weight descriptor, which both emphasizes the importance of each slice (inter-slice) and discriminates the spatial and channel importance within each slice (intra-slice). The learnt descriptor serves as a weight that makes slice features attend to important slices, regions, and channels. Furthermore, we propose a hierarchical multi-modal fusion which aggregates high-layer features by modelling long-range dependencies to fully excavate common salient semantics, and combines low-layer features under a spatial constraint to eliminate the blurring effect of slice features. Experimental results exceed the state-of-the-art methods by at least 25% in terms of the mean absolute error metric, demonstrating a significant improvement in LF SOD performance from the designed learnable weight descriptor. Code: https://github.com/liuzywen/LFTransNet
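The weighting scheme described in the abstract can be illustrated schematically: a descriptor supplies softmax-normalized importance scores over slices (inter-slice) and over the spatial and channel dimensions of each slice (intra-slice), which then re-weight and fuse the focal-stack features. The sketch below is not the authors' implementation; the array shapes, the use of numpy, and the random stand-ins for the transformer-decoder outputs are all assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=None):
    # numerically stable softmax along the given axis
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, C, H, W = 4, 8, 16, 16                    # slices, channels, height, width
slices = rng.standard_normal((N, C, H, W))   # focal-slice features

# Hypothetical descriptor outputs (in the paper these would come from the
# transformer decoder; random values here stand in for learned logits):
slice_logits   = rng.standard_normal(N)          # inter-slice importance
channel_logits = rng.standard_normal((N, C))     # per-slice channel importance
spatial_logits = rng.standard_normal((N, H, W))  # per-slice spatial importance

w_slice   = softmax(slice_logits)                               # sums to 1 over slices
w_channel = softmax(channel_logits, axis=1)[:, :, None, None]   # (N, C, 1, 1)
w_spatial = softmax(spatial_logits.reshape(N, -1), axis=1).reshape(N, 1, H, W)

# Intra-slice re-weighting (channel and spatial), then inter-slice fusion
weighted = slices * w_channel * w_spatial
fused = (w_slice[:, None, None, None] * weighted).sum(axis=0)   # (C, H, W)
print(fused.shape)  # (8, 16, 16)
```

The three softmaxes keep each weight set normalized independently, so slice, spatial, and channel importance can be learned and applied jointly without one dominating the others by scale.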
Pages: 7764-7773 (10 pages)