Image De-occlusion via Event-enhanced Multi-modal Fusion Hybrid Network

Cited by: 7
Authors
Li, Si-Qi [1,2,3,4]
Gao, Yue [1,2,3,4]
Dai, Qiong-Hai [1,2,3,5]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Inst Brain & Cognit Sci, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Beijing Lab Brain & Cognit Intelligence, Beijing Municipal Educ Commiss, Beijing 100084, Peoples R China
[4] Tsinghua Univ, Sch Software, Key Lab Informat Syst Secur, Beijing 100084, Peoples R China
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Event camera; multi-modal fusion; image de-occlusion; spiking neural network (SNN); image reconstruction;
DOI
10.1007/s11633-022-1350-3
CLC number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Seeing through dense occlusions and reconstructing the occluded scene is an important but challenging task. Traditional frame-based image de-occlusion methods can fail severely under extremely dense occlusions because the limited occluded input frames contain too little valid scene information. Event cameras are bio-inspired vision sensors that asynchronously record brightness changes at each pixel with high temporal resolution. However, synthesizing images solely from event streams is ill-posed: an event stream records only brightness changes, so the initial absolute brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently, and a comparison loss is proposed to produce clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our method achieves state-of-the-art performance.
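For intuition about the sensing model behind the abstract: an event camera emits an event (x, y, t, p) with polarity p in {-1, +1} whenever the log-intensity at a pixel changes by more than a contrast threshold, which is why absolute brightness cannot be recovered from events alone. The sketch below illustrates, under simplifying assumptions, how a leaky integrate-and-fire (LIF) layer, the basic neuron model of SNN encoders such as the one the abstract mentions, can convert a sparse event stream into a dense spike tensor for a downstream network. The function lif_encode, its parameter values, and the fixed-count time binning are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lif_encode(events, shape, n_steps=10, tau=0.8, v_th=1.0):
    """Integrate an event stream with leaky integrate-and-fire (LIF)
    neurons, producing a spike tensor of shape (n_steps, H, W).

    events: iterable of (t, x, y, p) with timestamps t normalized to
            [0, 1) and polarity p in {-1, +1}.
    tau:    membrane leak factor per time step (0 < tau < 1).
    v_th:   firing threshold; a neuron spikes and resets on crossing it.
    """
    h, w = shape
    v = np.zeros((h, w))                     # membrane potentials
    spikes = np.zeros((n_steps, h, w))
    bins = [[] for _ in range(n_steps)]      # slice events into n_steps windows
    for t, x, y, p in events:
        bins[min(int(t * n_steps), n_steps - 1)].append((x, y, p))
    for step in range(n_steps):
        v *= tau                             # leak toward rest
        for x, y, p in bins[step]:
            v[y, x] += p                     # integrate event "current"
        fired = v >= v_th                    # threshold crossing
        spikes[step][fired] = 1.0
        v[fired] = 0.0                       # hard reset after a spike
    return spikes

# Example: a tiny synthetic stream on an 8x8 sensor. Repeated positive
# events at one pixel accumulate and fire; the lone negative event decays
# without firing, hinting at why an LIF layer can also suppress noise.
events = [(0.05, 1, 2, +1), (0.12, 1, 2, +1), (0.40, 5, 5, -1)]
print(lif_encode(events, shape=(8, 8), v_th=1.5).shape)  # (10, 8, 8)
```

The leak-integrate-fire loop is why such an encoder doubles as a denoiser: isolated spurious events are forgotten by the leak, while temporally correlated events from real scene structure push the membrane past threshold.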
Pages: 307-318
Number of pages: 12