Action Recognition and Benchmark Using Event Cameras

Cited by: 6
Authors
Gao, Yue [1]
Lu, Jiaxuan [1]
Li, Siqi [1]
Ma, Nan [2]
Du, Shaoyi [3,4]
Li, Yipeng [5]
Dai, Qionghai [5]
Affiliations
[1] Tsinghua Univ, Sch Software, BNRist, THUIBCS, KLISS, BLBCI, Beijing 100084, Peoples R China
[2] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
[3] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intel, Xian 710049, Peoples R China
[4] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
[5] Tsinghua Univ, BNRist, THUIBCS, Dept Automat, BLBCI, Beijing 100084, Peoples R China
Keywords
Action recognition; dynamic vision sensor; event camera; event representation; vision
DOI
10.1109/TPAMI.2023.3300741
CLC classification number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed remarkable achievements in video-based action recognition. Unlike traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only pixel-wise brightness changes rather than absolute brightness values. However, little effort has been devoted to event-based action recognition, and large-scale public datasets are nearly unavailable. In this paper, we propose an event-based action recognition framework called EV-ACT. We first propose the Learnable Multi-Fused Representation (LMFR) to integrate multiple types of event information in a learnable manner. The LMFR, built with dual temporal granularity, is fed into an event-based slow-fast network to fuse appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability for action recognition. To promote research in this direction, we have collected the largest event-based action recognition benchmark, named THUE-ACT-50, together with the accompanying THUE-ACT-50-CHL dataset captured under challenging environments; the two comprise over 12,830 recordings across 50 action categories, more than 4 times the size of the previous largest dataset. Experimental results show that the proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.
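As a rough illustration of the event data such a framework operates on, the sketch below accumulates a stream of (x, y, timestamp, polarity) events into a simple two-channel count image. This is only a generic baseline representation commonly used with event cameras, not the learnable LMFR proposed in the paper; the events_to_frame helper, the synthetic random events, and the 346 x 260 sensor resolution are assumptions made purely for the example.

import numpy as np

def events_to_frame(events, height, width):
    # Accumulate (x, y, t, polarity) events into a two-channel count frame:
    # channel 0 counts positive-polarity events, channel 1 counts negative ones.
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _, p in events:
        channel = 0 if p > 0 else 1
        frame[channel, int(y), int(x)] += 1.0
    return frame

# Synthetic example: 1000 random events on an assumed 346 x 260 sensor.
rng = np.random.default_rng(0)
n = 1000
events = np.stack([
    rng.integers(0, 346, n),   # x coordinates
    rng.integers(0, 260, n),   # y coordinates
    np.sort(rng.random(n)),    # normalized timestamps
    rng.choice([-1, 1], n),    # polarity
], axis=1)
frame = events_to_frame(events, height=260, width=346)
print(frame.shape)  # (2, 260, 346)

In practice, a stack of such frames (or a learned representation like the LMFR) over a recording would be the input to the downstream recognition network.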
Pages: 14081-14097
Number of pages: 17
Related References
58 records in total
[1] Almatrafi, Mohammed; Baldwin, Raymond; Aizawa, Kiyoharu; Hirakawa, Keigo. Distance Surface for Event-Based Optical Flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(7): 1547-1556.
[2] Almatrafi, Mohammed; Hirakawa, Keigo. DAViS Camera Optical Flow. IEEE Transactions on Computational Imaging, 2020, 6: 396-407.
[3] Baldwin, R. Wes; Liu, Ruixu; Almatrafi, Mohammed; Asari, Vijayan; Hirakawa, Keigo. Time-Ordered Recent Event (TORE) Volumes for Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2519-2532.
[4] Benosman, Ryad; Clercq, Charles; Lagorce, Xavier; Ieng, Sio-Hoi; Bartolozzi, Chiara. Event-Based Visual Flow. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(2): 407-417.
[5] Berner, Raphael. 2013 Symposium on VLSI Circuits, 2013: C186.
[6] Brandli, Christian; Berner, Raphael; Yang, Minhao; Liu, Shih-Chii; Delbruck, Tobi. A 240 x 180 130 dB 3 μs Latency Global Shutter Spatiotemporal Vision Sensor. IEEE Journal of Solid-State Circuits, 2014, 49(10): 2333-2341.
[7] Brandli, C. IEEE International Symposium on Circuits and Systems, 2014: 686. DOI: 10.1109/ISCAS.2014.6865228.
[8] Calabrese, Enrico; Taverni, Gemma; Easthope, Christopher Awai; Skriabine, Sophie; Corradi, Federico; Longinotti, Luca; Eng, Kynan; Delbruck, Tobi. DHP19: Dynamic Vision Sensor 3D Human Pose Dataset. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019: 1695-1704.
[9] Carreira, Joao; Zisserman, Andrew. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 4724-4733.
[10] Chen, Shoushun; Guo, Menghan. Live Demonstration: CeleX-V: A 1M Pixel Multi-Mode Event-Based Sensor. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019: 1682-1683.