Fusion-Embedding Siamese Network for Light Field Salient Object Detection

Cited by: 16
Authors
Chen, Geng [1 ]
Fu, Huazhu [2 ]
Zhou, Tao [3 ]
Xiao, Guobao [4 ]
Fu, Keren [5 ]
Xia, Yong [1 ]
Zhang, Yanning [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Natl Engn Lab Integrated Aerosp Ground Ocean Big Data Applicat Technol, Xian 710072, Peoples R China
[2] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[3] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens Informat, PCA Lab, Minist Educ, Nanjing 210094, Peoples R China
[4] Minjiang Univ, Coll Comp & Control Engn, Fuzhou 350108, Peoples R China
[5] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Light field; multi-modal learning; salient object detection; Siamese network; Transformer
DOI
10.1109/TMM.2023.3274933
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Light field salient object detection (SOD) has shown remarkable success and gained considerable attention from the computer vision community. Existing methods usually employ a single- or two-stream network to detect saliency. However, these methods can handle at most two modalities at a time, which prevents them from fully exploiting the rich information in the multi-modal data derived from light fields. To address this, we propose the first joint multi-modal learning framework for light field SOD, called FES-Net, which accepts rich inputs that are not limited to two modalities. Specifically, we propose an attention-aware adaptation module that first transforms the multi-modal inputs for use in our joint learning framework. The transformed inputs are then fed into a Siamese network, equipped with multiple embedded feature fusion modules, to extract informative multi-modal features. Finally, a saliency decoder module predicts saliency maps from the extracted high-level features. Our joint multi-modal learning framework effectively overcomes the limitations of existing methods, providing efficient and effective multi-modal learning that fully exploits the valuable information in light field data for accurate saliency detection. Furthermore, we improve performance by adopting a Transformer as the backbone network. To the best of our knowledge, this improved version of our model, called FES-Trans, is the first attempt to address the challenging light field SOD task with the powerful Transformer technique. Extensive experiments on benchmark datasets demonstrate that our models are superior light field SOD approaches that remarkably outperform cutting-edge models.
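The abstract describes a three-stage pipeline: per-modality attention-aware adaptation, a weight-shared (Siamese) encoder with embedded feature fusion, and a saliency decoder. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the module names, channel sizes, toy encoder, and mean-based fusion are assumptions for illustration only (the paper's actual backbone is a CNN or Transformer and its fusion modules are learned).

```python
# Minimal sketch (not the authors' code) of joint multi-modal light field SOD:
# adapt each modality to a common input space, run one shared encoder over all
# modalities, fuse the features, and decode a single saliency map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAdaptation(nn.Module):
    """Maps one modality (arbitrary channel count) to a common 3-channel
    representation, gated by channel attention. A stand-in for the paper's
    attention-aware adaptation module."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, 3, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3, 3, 1), nn.Sigmoid()
        )

    def forward(self, x):
        x = self.proj(x)
        return x * self.gate(x)


class FusionEmbeddingSiamese(nn.Module):
    """One weight-shared encoder over N adapted modalities (Siamese sharing),
    element-wise mean fusion, and a tiny decoder that upsamples the fused
    high-level features to a 1-channel saliency map."""
    def __init__(self, modality_channels):
        super().__init__()
        self.adapters = nn.ModuleList(
            AttentionAdaptation(c) for c in modality_channels
        )
        # Toy shared encoder; the paper uses CNN/Transformer backbones instead.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Conv2d(64, 1, kernel_size=3, padding=1)

    def forward(self, inputs):
        # The same encoder weights process every modality.
        feats = [self.encoder(a(x)) for a, x in zip(self.adapters, inputs)]
        fused = torch.stack(feats, dim=0).mean(dim=0)  # simple fusion stand-in
        logits = self.decoder(fused)
        return F.interpolate(logits, scale_factor=4, mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    # Three hypothetical light-field-derived modalities:
    # all-in-focus RGB (3 ch), depth (1 ch), stacked focal slices (12 ch).
    model = FusionEmbeddingSiamese(modality_channels=[3, 1, 12])
    rgb = torch.randn(2, 3, 256, 256)
    depth = torch.randn(2, 1, 256, 256)
    focal = torch.randn(2, 12, 256, 256)
    saliency = torch.sigmoid(model([rgb, depth, focal]))
    print(saliency.shape)  # torch.Size([2, 1, 256, 256])
```

Because the encoder is shared rather than duplicated per stream, adding a modality costs only a small adapter, which is what lets this design scale beyond the two-modality limit of single-/two-stream networks.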
Pages: 984-994 (11 pages)