End-to-End Learning of Representations for Asynchronous Event-Based Data

Cited by: 248
Authors
Gehrig, Daniel [1 ,2 ,3 ]
Loquercio, Antonio [1 ,2 ,3 ]
Derpanis, Konstantinos G. [4 ,5 ]
Scaramuzza, Davide [1 ,2 ,3 ]
Affiliations
[1] Univ Zurich, Dept Informat, Robot & Percept Grp, Zurich, Switzerland
[2] Univ Zurich, Dept Neuroinformat, Robot & Percept Grp, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] Ryerson Univ, Toronto, ON, Canada
[5] Samsung AI Ctr Toronto, Toronto, ON, Canada
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Funding
Swiss National Science Foundation; Natural Sciences and Engineering Research Council of Canada;
Keywords
VISION; FEATURES;
DOI
10.1109/ICCV.2019.00573
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
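To illustrate the core idea of the abstract (a grid-based event representation built from differentiable operations, so the representation itself can be trained jointly with a downstream CNN), the following is a minimal PyTorch sketch. It assumes events arrive as (x, y, t, p) tuples; the class name LearnableEventGrid, the choice of a small MLP as the temporal kernel, and the number of bins are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

class LearnableEventGrid(nn.Module):
    """Illustrative sketch: convert an asynchronous event stream (x, y, t, p)
    into a dense num_bins x H x W grid by scatter-adding a learnable temporal
    kernel response per event. Gradients flow from the downstream task loss
    back into the kernel parameters, i.e., the representation is learned
    end-to-end. Not the authors' exact model."""

    def __init__(self, height, width, num_bins=9):
        super().__init__()
        self.height, self.width, self.num_bins = height, width, num_bins
        # Small MLP acting as a learnable kernel over the normalized time
        # offset between an event and each temporal bin centre (assumption).
        self.kernel = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, events):
        # events: (N, 4) float tensor with columns (x, y, t, p), p in {-1, +1}
        x, y = events[:, 0].long(), events[:, 1].long()
        t, p = events[:, 2], events[:, 3]
        t = (t - t.min()) / (t.max() - t.min() + 1e-9)  # normalize time to [0, 1]
        bin_centres = torch.linspace(0, 1, self.num_bins, device=events.device)
        # Evaluate the kernel on every event-to-bin time offset.
        delta = (t[:, None] - bin_centres[None, :]).reshape(-1, 1)    # (N * B, 1)
        weights = self.kernel(delta).reshape(-1, self.num_bins)       # (N, B)
        grid = torch.zeros(self.num_bins, self.height, self.width, device=events.device)
        flat_idx = y * self.width + x                                 # pixel index per event
        # Scatter-add polarity-weighted kernel responses into each temporal bin.
        for b in range(self.num_bins):
            grid[b].view(-1).index_add_(0, flat_idx, p * weights[:, b])
        return grid  # dense tensor, can be fed to any standard CNN

Usage would be, for example, grid = LearnableEventGrid(180, 240)(events) for a 180x240 sensor (sizes are placeholders); because every operation above is differentiable, the kernel MLP is updated by the same optimizer as the task network.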
Pages: 5632-5642
Page count: 11