Light Field Salient Object Detection With Sparse Views via Complementary and Discriminative Interaction Network

Cited by: 5
Authors
Chen, Yilei [1 ]
Li, Gongyang [1 ]
An, Ping [1 ]
Liu, Zhi [1 ]
Huang, Xinpeng [1 ]
Wu, Qiang [2 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[2] Univ Technol Sydney, Global Big Data Technol Ctr, Sch Elect & Data Engn, Ultimo, NSW 2007, Australia
Funding
National Natural Science Foundation of China;
Keywords
Light fields; Feature extraction; Arrays; Object detection; Cameras; Streaming media; Three-dimensional displays; Light field; salient object detection; sparse views; complementary and discriminative interaction; COMPRESSION; GEOMETRY; DEPTH;
DOI
10.1109/TCSVT.2023.3290600
CLC number
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline codes
0808; 0809;
Abstract
4D light field data record a scene from multiple views, implicitly providing a beneficial depth cue for salient object detection in challenging scenes. Existing light field salient object detection (LF SOD) methods usually use a large number of views to improve detection accuracy. However, using so many views for LF SOD hinders practical applications. Considering that adjacent views in a light field actually have very similar content, in this work, we propose a more efficient pattern of input views, i.e., key sparse views, and design a network that effectively explores the depth cue from sparse views for LF SOD. Specifically, we first apply a low rank-based statistical analysis to the existing LF SOD datasets, which allows us to derive a fixed yet universal pattern for our key sparse views, including the number and positions of views. These views retain a sufficient depth cue but greatly reduce the number of views to be captured and processed, facilitating practical applications. Then, we propose an effective solution for LF SOD from key sparse views, named CDINet, built around a key Complementary and Discriminative Interaction Module (CDIM). CDINet follows a two-stream structure that extracts the depth cue from the light field stream (i.e., sparse views) and the appearance cue from the RGB stream (i.e., center view), generating features and an initial saliency map for each stream. The CDIM is tailored for inter-stream interaction of both these features and saliency maps, using the depth cue to complement missing salient regions in the RGB stream and to discriminate against background distractions, further enhancing the final saliency map. Extensive experiments on three LF multi-view datasets demonstrate that our CDINet not only outperforms state-of-the-art 2D methods but also achieves performance competitive with state-of-the-art 3D and 4D methods.
The code and results of our method are available at https://github.com/GilbertRC/LFSOD-CDINet.
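The abstract's key observation is that adjacent light field views are highly redundant, which a low rank-based analysis can expose. The paper's actual analysis procedure is not detailed in this record, so the sketch below is only a toy illustration of the general idea (hypothetical data and function name, not the authors' implementation): stack the flattened views as rows of a matrix and check how much of the total energy a few leading singular values capture. If the cumulative fraction saturates quickly, most views are redundant and a small set of key sparse views suffices.

```python
# Toy illustration (not the paper's method): measure view redundancy
# via the singular value energy profile of the stacked view matrix.
import numpy as np

def view_energy_profile(views):
    """Return the cumulative fraction of energy captured by the
    leading singular values of the stacked, flattened views."""
    m = np.stack([v.ravel() for v in views])   # shape: (num_views, H*W)
    s = np.linalg.svd(m, compute_uv=False)     # singular values, descending
    return np.cumsum(s**2) / np.sum(s**2)

# Synthetic 9x9 "light field": every view is a mix of two shared scene
# components plus mild per-view noise, so the view matrix is near rank 2.
rng = np.random.default_rng(0)
basis = rng.random((2, 32, 32))
views = [w0 * basis[0] + w1 * basis[1] + 0.01 * rng.random((32, 32))
         for w0 in np.linspace(0.5, 1.5, 9)
         for w1 in np.linspace(0.5, 1.5, 9)]

energy = view_energy_profile(views)
print(f"energy captured by top 2 components of {len(views)} views: {energy[1]:.4f}")
```

For this synthetic array, two components already account for nearly all of the energy, mirroring the intuition that a fixed handful of well-placed views can stand in for the full 9x9 grid.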
Pages: 1070-1085
Number of pages: 16