Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

Cited by: 122
Authors
Baradel, Fabien [1 ]
Wolf, Christian [1 ,2 ]
Mille, Julien [3 ]
Taylor, Graham W. [4 ,5 ]
Affiliations
[1] Univ Lyon, INSA Lyon, CNRS, LIRIS, F-69621 Villeurbanne, France
[2] INRIA, CITI Lab, Villeurbanne, France
[3] Univ Tours, Lab Informat, INSA Ctr Val Loire, F-41034 Blois, France
[4] Univ Guelph, Sch Engn, Guelph, ON, Canada
[5] Vector Inst, Toronto, ON, Canada
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
10.1109/CVPR.2018.00056
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a method for human activity recognition from RGB data that does not rely on any pose information during test time, and does not explicitly calculate pose information internally. Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities. No spatial coherence is forced on the glimpse locations, which gives the attention module liberty to explore different points at each frame and better optimize the process of scrutinizing visual information. Tracking and sequentially integrating this kind of unstructured data is a challenge, which we address by separating the set of glimpses from a set of recurrent tracking/recognition workers. These workers receive glimpses, jointly performing subsequent motion tracking and activity prediction. The glimpses are soft-assigned to the workers, optimizing coherence of the assignments in space, time and feature space using an external memory module. No hard decisions are taken, i.e. each glimpse point is assigned to all existing workers, albeit with different importance. Our methods outperform the state of the art on the largest human activity recognition dataset available to date, NTU RGB+D, and on the Northwestern-UCLA Multiview Action 3D Dataset.
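The soft-assignment the abstract describes — every glimpse contributing to every worker, with different importance — can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: it uses plain dot-product attention, and the names `soft_assign`, `glimpses`, and `worker_states` are hypothetical. The recurrent workers, temporal integration, and external memory module of the actual model are omitted.

```python
import numpy as np

def soft_assign(glimpses, worker_states):
    """Soft-assign glimpses to workers via attention weights.

    glimpses:      (G, D) array of glimpse feature vectors.
    worker_states: (W, D) array of worker hidden states used as queries.
    Returns a (W, D) array: one attention-weighted glimpse summary
    per worker. Illustrative sketch only, not the paper's model.
    """
    # Dot-product similarity between each worker and each glimpse.
    scores = worker_states @ glimpses.T             # shape (W, G)
    # Softmax over glimpses: no hard decisions are taken; every
    # glimpse is assigned to every worker with some importance.
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    return weights @ glimpses                       # shape (W, D)
```

With strongly matching worker states, each worker's summary is dominated by its closest glimpse, while still receiving a small contribution from all others.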
Pages: 469-478
Page count: 10