Hypergraph-Based Multi-View Action Recognition Using Event Cameras

Cited by: 2
Authors
Gao, Yue [1]
Lu, Jiaxuan [2]
Li, Siqi [1]
Li, Yipeng [3]
Du, Shaoyi [4, 5, 6]
Affiliations
[1] Tsinghua Univ, Sch Software, BNRist, THUIBCS,BLBCI, Beijing 100084, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] Tsinghua Univ, Dept Automat, BNRist, THUIBCS,BLBCI, Beijing, Peoples R China
[4] Xi An Jiao Tong Univ, Affiliated Hosp 2, Dept Ultrasound, Xian 710006, Peoples R China
[5] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Peoples R China
[6] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
Keywords
Cameras; Feature extraction; Neural networks; Vision sensors; Task analysis; Semantics; Robot vision systems; Multi-view action recognition; event camera; dynamic vision sensor; hypergraph neural network; NETWORK;
DOI
10.1109/TPAMI.2024.3382117
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition from video data is a cornerstone task with wide-ranging applications. Single-view action recognition is limited by its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly in addressing challenges such as information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges with rule-based and KNN-based strategies, HyperMV builds a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features. A vertex attention hypergraph propagation mechanism is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset, THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which is more than ten times larger than existing datasets. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds state-of-the-art methods in frame-based multi-view action recognition.
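The pipeline summarized in the abstract (events to frame-like tensors, segments as vertices, rule-based and KNN-based hyperedges, hypergraph propagation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `(t, x, y, polarity)` event layout, and the reading of "rule-based" as "same view" and "same time segment" groupings are all assumptions made for illustration. The propagation step is the standard normalized hypergraph convolution with unit hyperedge weights, not the vertex attention variant introduced in the paper, and the shared convolutional feature extractor is omitted.

```python
import numpy as np

def events_to_frames(events, height, width, num_bins):
    """Accumulate events (t, x, y, polarity) into num_bins two-channel count frames.
    The event layout is an assumption, not taken from the paper."""
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = (events[:, 3] > 0).astype(int)  # channel 1 holds positive-polarity events
    span = max(t.max() - t.min(), 1e-9)
    # Map each timestamp to a temporal bin in [0, num_bins - 1].
    b = np.minimum(((t - t.min()) / span * num_bins).astype(int), num_bins - 1)
    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    np.add.at(frames, (b, p, y, x), 1.0)  # unbuffered add, so repeated pixels count
    return frames

def rule_hyperedges(num_views, num_segments):
    """Incidence matrix with one hyperedge per view and one per time segment
    (an assumed interpretation of 'rule-based')."""
    n = num_views * num_segments
    H = np.zeros((n, num_views + num_segments), dtype=np.float32)
    for v in range(num_views):
        for s in range(num_segments):
            idx = v * num_segments + s
            H[idx, v] = 1.0              # all segments of the same view
            H[idx, num_views + s] = 1.0  # all views at the same time segment
    return H

def knn_hyperedges(features, k):
    """Incidence matrix with one hyperedge per vertex, joining that vertex
    to its nearest neighbours in feature space."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    n = features.shape[0]
    H = np.zeros((n, n), dtype=np.float32)
    for e in range(n):
        H[nearest[e], e] = 1.0
        H[e, e] = 1.0  # make sure the centre vertex belongs to its own hyperedge
    return H

def hypergraph_conv(X, H, theta):
    """Standard normalized hypergraph convolution with unit hyperedge weights:
    ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X theta)."""
    dv_inv_sqrt = 1.0 / np.sqrt(H.sum(axis=1))  # vertex degrees
    de_inv = 1.0 / H.sum(axis=0)                # hyperedge degrees
    left = dv_inv_sqrt[:, None] * H * de_inv[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return np.maximum(left @ right @ X @ theta, 0.0)
```

To use the sketch, concatenate the two incidence matrices along the hyperedge axis with `np.concatenate([rule_hyperedges(V, T), knn_hyperedges(X, k)], axis=1)` and pass the result to `hypergraph_conv` together with the per-segment feature matrix `X` and a weight matrix `theta`.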
Pages: 6610-6622
Number of pages: 13