Focal stack based light field salient object detection via 3D-2D convolution hybrid network

Cited by: 0

Authors:

Wang, Xin [1,2]

Xiong, Gaomin [1]

Zhang, Yong [1]

Affiliations:

[1] Hefei Univ Technol, Sch Comp & Informat, Hefei 230601, Anhui, Peoples R China

[2] Intelligent Interconnected Syst Lab Anhui Prov, Hefei 230601, Anhui, Peoples R China

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2024 / Vol. 18 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Light field; SOD; Focal stack; Hybrid network; End-to-end;

DOI:

10.1007/s11760-023-02700-1

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Because it captures both the spatial and angular information of a scene, light field imaging provides abundant cues. Over the last decade, light field salient object detection (SOD) methods have exploited various forms of data, such as the focal stack, all-in-focus image, depth map, sub-aperture image, center-view image, and micro-lens image array. In this study, we introduce a novel 3D-2D convolution hybrid network called HFSNet, which uses the focal stack as its only input for SOD. The encoder network is built on 3D convolution to extract and preserve the continuously changing focus cues within the focal stack. To reduce the computational burden of 3D convolution, we incorporate 3D max-pooling layers, channel reduction modules, and focal stack feature fusing modules, which reduce the data dimension. The decoder network, in contrast, is built on 2D convolution to generate coarse saliency maps, which a refine module then refines into the final saliency map. We conduct experiments on five benchmark light field SOD datasets; the results show that our method outperforms other models on DUTLF-V2 and DUTLF-FS and achieves competitive results on Lytro Illum, HFUT-Lytro, and LFSD.
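
The dimension reduction described in the abstract can be illustrated with a small shape trace. This is only a sketch under stated assumptions, not the HFSNet architecture: the input size (12 focal slices of 256 x 256 RGB), the two stages, the channel widths (64, 128), the size-preserving 3 x 3 x 3 convolutions, and the 2 x 2 x 2 pooling windows are all hypothetical values chosen for illustration, and the helper names are invented here. The sketch shows how 3D max-pooling shrinks the focal-slice axis along with the spatial axes, and how collapsing the focal axis leaves a feature map that a 2D decoder could consume.

```python
# Shape-only sketch of a 3D encoder feeding a 2D decoder.
# Shapes are (channels, focal_slices, height, width).
# ASSUMPTION: all sizes and layer choices below are illustrative,
# not taken from the HFSNet paper.

def conv3d_shape(shape, out_channels):
    """Shape after a 3x3x3 convolution with stride 1 and padding 1.

    Such a convolution keeps slices/height/width and only changes channels.
    """
    _, n, h, w = shape
    return (out_channels, n, h, w)


def maxpool3d_shape(shape, kernel=(2, 2, 2)):
    """Shape after non-overlapping 3D max-pooling (stride equals kernel)."""
    c, n, h, w = shape
    kn, kh, kw = kernel
    return (c, n // kn, h // kh, w // kw)


def fuse_focal_shape(shape):
    """Shape after collapsing the focal axis into a single 2D feature map.

    Stands in for any fusion step that spans all remaining focal slices.
    """
    c, _, h, w = shape
    return (c, h, w)


def element_count(shape):
    """Number of values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def trace_encoder(input_shape, stage_channels):
    """Return the list of shapes seen while passing through the encoder."""
    shapes = [input_shape]
    shape = input_shape
    for out_channels in stage_channels:
        shape = conv3d_shape(shape, out_channels)
        shapes.append(shape)
        shape = maxpool3d_shape(shape)
        shapes.append(shape)
    shapes.append(fuse_focal_shape(shape))
    return shapes


if __name__ == "__main__":
    # Hypothetical focal stack: 3 colour channels, 12 slices, 256x256 pixels.
    for s in trace_encoder((3, 12, 256, 256), (64, 128)):
        print(s, element_count(s))
```

In this trace, each pooling stage divides the number of feature values per channel by eight, which is the kind of saving that makes 3D convolution affordable in the deeper layers; the final fused shape has no focal axis, so ordinary 2D convolutions apply from that point on.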


Pages: 109-118

Page count: 10

