Egocentric Action Recognition by Automatic Relation Modeling

Cited by: 8
Authors
Li, Haoxin [1]
Zheng, Wei-Shi [2,3,4]
Zhang, Jianguo [5,6]
Hu, Haifeng [1]
Lu, Jiwen [7]
Lai, Jian-Huang [2,8]
Affiliations
[1] Sun Yat-sen Univ, Sch Elect & Informat Technol, Guangzhou 510275, Peoples R China
[2] Sun Yat-sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518005, Peoples R China
[4] Sun Yat-sen Univ, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510275, Peoples R China
[5] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Guangdong, Peoples R China
[6] Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen 518055, Peoples R China
[7] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol (BNRist), Dept Automat, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[8] Guangdong Prov Key Lab Informat Secur, Shenzhen 518040, Guangdong, Peoples R China
Keywords
Egocentric action recognition; human-object interaction recognition; histograms; network
DOI
10.1109/TPAMI.2022.3148790
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in popular applications such as life logging, health monitoring, and virtual reality. A fundamental problem in egocentric vision is egocentric action recognition, which aims to recognize the actions of the camera wearer from egocentric videos. Relation modeling is important for this task, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons, and they require prior knowledge or auxiliary data to localize those persons. In this work, we model these relations in a weakly supervised manner, i.e., without annotations or prior knowledge about the interacting persons or objects. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a set of keypoints directly from video data that localize the action-relevant regions using only action labels and some constraints on the keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections that capture the complex relations in egocentric videos, such as temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. Extensive experiments on egocentric video datasets demonstrate the effectiveness of our method.
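The differentiable connection search described in the abstract follows the general recipe of softmax-relaxed architecture search (as in DARTS). No implementation is given on this record page, so the sketch below is only a minimal illustration of that recipe under assumed names: an LSTM cell whose input is a learnable softmax-weighted mixture of candidate relation streams (e.g., wearer, interactor, and context features). All class and variable names here are hypothetical, not the authors' code.

```python
# Minimal sketch (not the authors' implementation): a DARTS-style softmax
# relaxation over candidate relation connections feeding an LSTM cell.
import torch
import torch.nn as nn

class RelationalCellSketch(nn.Module):
    """LSTM cell whose input is a learnable softmax-weighted mixture of
    candidate relation streams (a hypothetical stand-in for one connection
    choice inside an ego-relational LSTM)."""
    def __init__(self, feat_dim: int, hidden_dim: int, num_candidates: int):
        super().__init__()
        # One projection per candidate stream (e.g., wearer/interactor/context).
        self.proj = nn.ModuleList(
            [nn.Linear(feat_dim, hidden_dim) for _ in range(num_candidates)])
        # Architecture parameters: one logit per candidate connection,
        # optimized jointly (or bi-level, as in DARTS) with the model weights.
        self.alpha = nn.Parameter(torch.zeros(num_candidates))
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, candidates, state):
        # candidates: list of (batch, feat_dim) tensors, one per stream.
        weights = torch.softmax(self.alpha, dim=0)
        mixed = sum(w * p(x) for w, p, x in zip(weights, self.proj, candidates))
        return self.cell(mixed, state)

# Toy usage: three candidate streams over four timesteps.
batch, feat_dim, hidden_dim = 2, 16, 32
cell = RelationalCellSketch(feat_dim, hidden_dim, num_candidates=3)
h = torch.zeros(batch, hidden_dim)
c = torch.zeros(batch, hidden_dim)
for _ in range(4):
    streams = [torch.randn(batch, feat_dim) for _ in range(3)]
    h, c = cell(streams, (h, c))
print(h.shape)  # torch.Size([2, 32])
```

After the search converges, the standard discretization step would keep the connection(s) with the largest entries of alpha, yielding a fixed network structure for final training.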
Pages: 489-507
Page count: 19
Related Papers
50 in total
• [41] Li, Ce; Zhang, Baochang; Chen, Chen; Ye, Qixiang; Han, Jungong; Guo, Guodong; Ji, Rongrong. Deep Manifold Structure Transfer for Action Recognition. IEEE Transactions on Image Processing, 2019, 28(9): 4646-4658.
• [42] Gao, Chenqiang; Du, Yinhe; Liu, Jiang; Yang, Luyu; Meng, Deyu. A New Dataset and Evaluation for Infrared Action Recognition. Computer Vision, CCCV 2015, Part II, 2015, 547: 302-312.
• [43] Jargalsaikhan, Iveel; Little, Suzanne; Direkoglu, Cem; O'Connor, Noel E. Action Recognition Based on Sparse Motion Trajectories. 2013 20th IEEE International Conference on Image Processing (ICIP 2013), 2013: 3982-3985.
• [44] Moghaddam, Hamid Abrishami; Zare, Amin. Spatiotemporal Wavelet Correlogram for Human Action Recognition. International Journal of Multimedia Information Retrieval, 2019, 8(3): 167-180.
• [45] Alhersh, Taha; Stuckenschmidt, Heiner. On the Combination of IMU and Optical Flow for Action Recognition. 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2019: 17-21.
• [46] Zhang, Zhong; Wang, Chunheng; Xiao, Baihua; Zhou, Wen; Liu, Shuang. Attribute Regularization Based Human Action Recognition. IEEE Transactions on Information Forensics and Security, 2013, 8(10): 1600-1609.
• [47] Liu, Jinfu; Ding, Runwei; Wen, Yuhang; Dai, Nan; Meng, Fanyang; Zhang, Fang-Lue; Zhao, Shen; Liu, Mengyuan. Explore Human Parsing Modality for Action Recognition. CAAI Transactions on Intelligence Technology, 2024: 1623-1633.
• [48] Matsui, Kenji; Tamaki, Toru; Raytchev, Bisser; Kaneda, Kazufumi. Trajectory-Set Feature for Action Recognition. IEICE Transactions on Information and Systems, 2017, E100D(8): 1922-1924.
• [49] Li, Xiaoqiang; Xie, Miao; Zhang, Yin; Ding, Guangtai; Tong, Weiqin. Dual Attention Convolutional Network for Action Recognition. IET Image Processing, 2020, 14(6): 1059-1065.
• [50] Zhang, Zhong; Wang, Chunheng; Xiao, Baihua; Zhou, Wen; Liu, Shuang. Robust Relative Attributes for Human Action Recognition. Pattern Analysis and Applications, 2015, 18(1): 157-171.