View-invariant representation and learning of human action

Cited by: 6
Authors:
Rao, C [1]
Shah, M [1]
Affiliation:
[1] Univ Cent Florida, Sch Elect Engn & Comp Sci, Comp Vis Lab, Orlando, FL 32816 USA
Source:
IEEE WORKSHOP ON DETECTION AND RECOGNITION OF EVENTS IN VIDEO, PROCEEDINGS | 2001
Keywords:
video understanding; action recognition; view-invariant representation; spatiotemporal curvature; events; activities
DOI:
10.1109/EVENT.2001.938867
Chinese Library Classification (CLC):
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Automatically understanding human actions from video sequences is a very challenging problem. It involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of the visual information for the purpose of recognition and learning. In this paper, we first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using the spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions and computes a view-invariant representation for each action. The system is able to incrementally learn different actions starting with no model, and to discover different instances of the same action performed by different people and from different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people from different viewpoints. Our system performed impressively, correctly interpreting most actions.
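The abstract's core idea is to treat a trajectory as a 3D curve (x(t), y(t), t) and mark "dynamic instants" at peaks of its spatiotemporal curvature. A minimal sketch of that computation, assuming unit-spaced samples; the function names and the peak threshold are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def spatiotemporal_curvature(x, y):
    """Curvature of the 3D curve (x(t), y(t), t), sampled at unit time steps.

    Time is the third coordinate, so t' = 1 and t'' = 0, giving
    kappa = |r' x r''| / |r'|^3 with r' = (x', y', 1), r'' = (x'', y'', 0).
    """
    x1, y1 = np.gradient(x), np.gradient(y)    # first derivatives
    x2, y2 = np.gradient(x1), np.gradient(y1)  # second derivatives
    cross = np.sqrt(y2**2 + x2**2 + (x1 * y2 - y1 * x2) ** 2)
    return cross / (x1**2 + y1**2 + 1.0) ** 1.5

def dynamic_instants(kappa, threshold=0.05):
    """Indices of local curvature maxima above a threshold (candidate instants)."""
    return [i for i in range(1, len(kappa) - 1)
            if kappa[i] > threshold
            and kappa[i] >= kappa[i - 1] and kappa[i] >= kappa[i + 1]]
```

A straight constant-speed trajectory yields zero curvature everywhere, while an abrupt change of direction (e.g. a hand reversing at the top of a reach) produces a sharp peak; the instants between peaks are the "intervals" of the representation.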
Pages: 55-63
Page count: 9
Related papers (items 31-40 of 50):
[31] Li, Linguo; Wang, Minsi; Ni, Bingbing; Wang, Hang; Yang, Jiancheng; Zhang, Wenjun. 3D Human Action Representation Learning via Cross-View Consistency Pursuit. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 4739-4748.
[32] Xue, Zihui; Grauman, Kristen. Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
[33] Wang, Xiao; Lu, Yang; Yu, Wanchuan; Pang, Yanwei; Wang, Hanzi. Few-Shot Action Recognition via Multi-View Representation Learning. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(9): 8522-8535.
[34] Holte, M. B.; Moeslund, T. B.; Fihl, P. View-invariant gesture recognition using 3D optical flow and harmonic motion context. Computer Vision and Image Understanding, 2010, 114(12): 1353-1361.
[35] Cao, Yuanyuan; Huang, Feiyue; Tao, Linmi; Xu, Guangyou. View and Scale Insensitive Action Representation and Recognition. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010: 1130-1133.
[36] Ashraf, Nazim; Sun, Chuan; Foroosh, Hassan. View invariant action recognition using projective depth. Computer Vision and Image Understanding, 2014, 123: 41-52.
[37] Wang, Ruoshi; Liu, Zhigang; Yin, Ziyang. Jointly Learning Multi-view Features for Human Action Recognition. Proceedings of the 32nd 2020 Chinese Control and Decision Conference (CCDC 2020), 2020: 4858-4861.
[38] Sun, Bin; Kong, Dehui; Wang, Shaofan; Wang, Lichun; Yin, Baocai. Joint Transferable Dictionary Learning and View Adaptation for Multi-view Human Action Recognition. ACM Transactions on Knowledge Discovery from Data, 2021, 15(2).
[39] Jiang, Y.; Lu, L.; Xu, J. Enhanced view-independent representation method for skeleton-based human action recognition. International Journal of Information and Communication Technology, 2021, 19(2): 201-218.
[40] Ashraf, Nazim; Shen, Yuping; Cao, Xiaochun; Foroosh, Hassan. View invariant action recognition using weighted fundamental ratios. Computer Vision and Image Understanding, 2013, 117(6): 587-602.