Egocentric Action Recognition by Automatic Relation Modeling

Cited by: 8
Authors
Li, Haoxin [1 ]
Zheng, Wei-Shi [2 ,3 ,4 ]
Zhang, Jianguo [5 ,6 ]
Hu, Haifeng [1 ]
Lu, Jiwen [7 ]
Lai, Jian-Huang [2 ,8 ]
Affiliations
[1] Sun Yat-sen Univ, Sch Elect & Informat Technol, Guangzhou 510275, Peoples R China
[2] Sun Yat-sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518005, Peoples R China
[4] Sun Yat-sen Univ, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510275, Peoples R China
[5] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Guangdong, Peoples R China
[6] Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen 518055, Peoples R China
[7] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol (BNRist), Dept Automat, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[8] Guangdong Prov Key Lab Informat Secur, Shenzhen 518040, Guangdong, Peoples R China
Keywords
Egocentric action recognition; human-object interaction recognition; histograms; network
DOI
10.1109/TPAMI.2022.3148790
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in popular applications such as life logging, health monitoring, and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. Relation modeling is important for this task because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons, and they require prior knowledge or auxiliary data to localize those persons. In this work, we model the relations in a weakly supervised manner, i.e., without annotations or prior knowledge about the interacting persons or objects. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling to achieve automatic relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a set of keypoints directly from video data; the keypoints localize the action-relevant regions using only action labels and a few constraints on the keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections that capture the complex relations in egocentric videos, such as temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model the different relations for egocentric action recognition. Extensive experiments on egocentric video datasets demonstrate the effectiveness of our method.
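To make the first step concrete: localizing interactors without location labels can be realized with a differentiable keypoint mechanism, where each keypoint is the expected image coordinate under a learned attention map, trained end to end from the action loss plus a constraint that keeps each map peaked. The PyTorch sketch below is a minimal illustration under these assumptions; `KeypointLocalizer` and the entropy penalty are hypothetical stand-ins, not the paper's actual architecture or constraints.

```python
# Illustrative sketch only: weakly supervised keypoints via a spatial
# softmax (soft-argmax) over CNN feature maps. Hypothetical names; the
# paper's actual localizer and keypoint constraints may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointLocalizer(nn.Module):
    """Predicts K keypoints from a feature map with no keypoint labels:
    each keypoint is the expected (x, y) under a softmax attention map."""

    def __init__(self, in_channels: int, num_keypoints: int):
        super().__init__()
        # One attention heatmap per keypoint.
        self.heatmap = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)

    def forward(self, feats):
        # feats: (batch, C, H, W) backbone features of one frame.
        b, _, h, w = feats.shape
        logits = self.heatmap(feats).flatten(2)           # (b, K, H*W)
        attn = F.softmax(logits, dim=-1)                  # per-keypoint map
        ys = torch.linspace(0, 1, h, device=feats.device)
        xs = torch.linspace(0, 1, w, device=feats.device)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([grid_x.flatten(), grid_y.flatten()], dim=-1)  # (H*W, 2)
        keypoints = attn @ coords                          # (b, K, 2) in [0, 1]
        # Concentration penalty: encourage peaked attention maps, one
        # possible "constraint on keypoints" trainable with action labels only.
        entropy = -(attn * attn.clamp_min(1e-8).log()).sum(-1).mean()
        return keypoints, entropy
```

Features pooled around the predicted keypoints would then serve as the interactor representations fed to the relation model.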
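For the second step, the differentiable architecture search over candidate connections can be understood as a DARTS-style continuous relaxation: each candidate connection is weighted by a softmax over learnable architecture parameters, so the discrete choice of connections becomes differentiable and is optimized jointly with the network weights; after the search, the strongest connections are kept to form the discrete ego-relational LSTM. The sketch below shows one such relaxed recurrent step; `CandidateConnectionSearch` and the three candidate streams are hypothetical names, not the authors' code.

```python
# Illustrative sketch only: DARTS-style soft selection over candidate
# input connections feeding an LSTM cell. Hypothetical names throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CandidateConnectionSearch(nn.Module):
    """Softly combines candidate connections (e.g., features of the camera
    wearer's hands, the interactors, and scene context) before an LSTM step,
    with learnable architecture weights alpha."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_candidates: int = 3):
        super().__init__()
        # One linear transform per candidate connection.
        self.transforms = nn.ModuleList(
            [nn.Linear(feat_dim, hidden_dim) for _ in range(num_candidates)]
        )
        # Architecture parameters: one scalar per candidate, relaxed via softmax.
        self.alpha = nn.Parameter(torch.zeros(num_candidates))
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, candidates, state):
        # candidates: list of (batch, feat_dim) tensors, one per relation type.
        weights = F.softmax(self.alpha, dim=0)            # soft connection choice
        mixed = sum(w * t(x) for w, t, x in zip(weights, self.transforms, candidates))
        h, c = self.cell(mixed, state)                    # one recurrent step
        return h, c

# Usage: after the search phase, the connections with the largest alpha
# values are retained to derive the final discrete structure.
batch, feat_dim, hidden = 4, 256, 128
searcher = CandidateConnectionSearch(feat_dim, hidden)
feats = [torch.randn(batch, feat_dim) for _ in range(3)]  # hand/interactor/context
state = (torch.zeros(batch, hidden), torch.zeros(batch, hidden))
h, c = searcher(feats, state)
```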
Pages: 489-507
Page count: 19
Related papers
50 records in total
• [1] Zhang, Zehua; Crandall, David; Proulx, Michael J.; Talathi, Sachin S.; Sharma, Abhishek. Can Gaze Inform Egocentric Action Recognition? 2022 ACM SYMPOSIUM ON EYE TRACKING RESEARCH AND APPLICATIONS, ETRA 2022, 2022.
• [2] Lu, Minlong; Li, Ze-Nian; Wang, Yueming; Pan, Gang. Deep Attention Network for Egocentric Action Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28(8): 3703-3713.
• [3] Planamente, Mirco; Plizzari, Chiara; Caputo, Barbara. Test-Time Adaptation for Egocentric Action Recognition. IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT III, 2022, 13233: 206-218.
• [4] Wang, Haoran; Cheng, Qinghua; Yu, Baosheng; Zhan, Yibing; Tao, Dapeng; Ding, Liang; Ling, Haibin. Free-Form Composition Networks for Egocentric Action Recognition. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(10): 9967-9978.
• [5] Kapidis, Georgios; Poppe, Ronald; van Dam, Elsbeth; Noldus, Lucas P. J. J.; Veltkamp, Remco C. Egocentric Hand Track and Object-based Human Action Recognition. 2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019: 922-929.
• [6] Wang, Haoran; Yang, Jiahao; Yu, Baosheng; Zhan, Yibing; Tao, Dapeng; Ling, Haibin. Distilling interaction knowledge for semi-supervised egocentric action recognition. PATTERN RECOGNITION, 2025, 157.
• [7] Dai, Guangzhao; Shu, Xiangbo; Yan, Rui; Huang, Peng; Tang, Jinhui. Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 7549-7558.
• [8] Zuo, Zheming; Yang, Longzhi; Peng, Yonghong; Chao, Fei; Qu, Yanpeng. Gaze-Informed Egocentric Action Recognition for Memory Aid Systems. IEEE ACCESS, 2018, 6: 12894-12904.
• [9] Zhang, Mingfang; Huang, Yifei; Liu, Ruicong; Sato, Yoichi. Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition. COMPUTER VISION - ECCV 2024, PT XVIII, 2025, 15076: 312-330.
• [10] Truong, Thanh-Dat; Luu, Khoa. Cross-view action recognition understanding from exocentric to egocentric perspective. NEUROCOMPUTING, 2025, 614.