Symbiotic Attention for Egocentric Action Recognition With Object-Centric Alignment

Cited by: 56
Authors
Wang, Xiaohan [1 ,2 ]
Zhu, Linchao [2 ]
Wu, Yu [1 ,2 ]
Yang, Yi [2 ]
Affiliations
[1] Baidu Res, Beijing 100193, Peoples R China
[2] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Sydney, NSW 2007, Australia
Keywords
Feature extraction; Cognition; Three-dimensional displays; Symbiosis; Task analysis; Two dimensional displays; Solid modeling; Egocentric video analysis; action recognition; deep learning; symbiotic attention;
DOI
10.1109/TPAMI.2020.3015894
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose to tackle egocentric action recognition by suppressing background distractors and enhancing action-relevant interactions. Existing approaches usually employ two independent branches to recognize egocentric actions, i.e., a verb branch and a noun branch, but they lack a mechanism to suppress distracting objects and exploit local human-object correlations. To this end, we introduce two extra sources of information, i.e., the candidate objects' spatial locations and their discriminative features, to enable the model to concentrate on the occurring interactions. We design a Symbiotic Attention with Object-centric feature Alignment (SAOA) framework to provide fine-grained reasoning between the actor and the environment. First, we introduce an object-centric feature alignment method that injects local object features into the verb branch and the noun branch. Second, we propose a symbiotic attention mechanism that encourages mutual interaction between the two branches and selects the most action-relevant candidates for classification. The framework thus benefits from communication among the verb branch, the noun branch, and the local object information. Experiments with different backbones and modalities demonstrate the effectiveness of our method. Notably, our framework achieves state-of-the-art performance on the largest egocentric video dataset.
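The following is a minimal, illustrative sketch (in PyTorch, not the authors' released code) of the two ideas described in the abstract: object-centric alignment that injects detected-object features into the verb and noun branches, and a symbiotic attention step in which each branch uses the other branch's global feature to weight the candidate objects. All module names, tensor shapes, the simple dot-product attention, and the verb/noun class counts are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SymbioticAttentionSketch(nn.Module):
    """Hypothetical sketch of SAOA-style object alignment and cross-branch attention."""

    def __init__(self, video_dim=1024, obj_dim=256, num_verbs=125, num_nouns=352):
        super().__init__()
        # Object-centric alignment: project detected-object features into each branch's space.
        self.obj_to_verb = nn.Linear(obj_dim, video_dim)
        self.obj_to_noun = nn.Linear(obj_dim, video_dim)
        self.verb_head = nn.Linear(video_dim, num_verbs)
        self.noun_head = nn.Linear(video_dim, num_nouns)

    def _attend(self, query, candidates):
        # Dot-product attention of a global branch feature (B, D) over candidates (B, N, D),
        # used here to select the most action-relevant objects.
        scores = torch.einsum('bd,bnd->bn', query, candidates)
        weights = F.softmax(scores, dim=-1)              # (B, N)
        return torch.einsum('bn,bnd->bd', weights, candidates)

    def forward(self, verb_feat, noun_feat, obj_feat):
        # verb_feat, noun_feat: (B, D) global features from the two video branches.
        # obj_feat: (B, N, obj_dim) features of N candidate objects per clip.
        verb_objs = self.obj_to_verb(obj_feat)
        noun_objs = self.obj_to_noun(obj_feat)
        # Symbiotic step: each branch's object selection is guided by the other branch.
        verb_ctx = self._attend(noun_feat, verb_objs)
        noun_ctx = self._attend(verb_feat, noun_objs)
        verb_logits = self.verb_head(verb_feat + verb_ctx)
        noun_logits = self.noun_head(noun_feat + noun_ctx)
        return verb_logits, noun_logits


if __name__ == "__main__":
    model = SymbioticAttentionSketch()
    v = torch.randn(2, 1024)
    n = torch.randn(2, 1024)
    o = torch.randn(2, 8, 256)    # 8 candidate objects per clip
    verb_logits, noun_logits = model(v, n, o)
    print(verb_logits.shape, noun_logits.shape)   # torch.Size([2, 125]) torch.Size([2, 352])

The cross-branch queries are what make the attention "symbiotic" in spirit here: the noun feature guides object selection for verb classification and vice versa; the paper's actual fusion and normalization details should be taken from the original text.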
Pages: 6605-6617
Number of pages: 13
Related papers
50 records in total
  • [41] Human Action Recognition Combined With Object Detection
    Zhou B.
    Li J.-F.
Zidonghua Xuebao/Acta Automatica Sinica, 2020, 46 (09): 1961 - 1970
  • [42] Global and Local Knowledge-Aware Attention Network for Action Recognition
    Zheng, Zhenxing
    An, Gaoyun
    Wu, Dapeng
    Ruan, Qiuqi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 334 - 347
  • [43] Disentangling What and Where for 3D Object-Centric Representations Through Active Inference
    Van de Maele, Toon
    Verbelen, Tim
    Catal, Ozan
    Dhoedt, Bart
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021, PT I, 2021, 1524 : 701 - 714
  • [44] Hierarchical Graph Attention Based Multi-View Convolutional Neural Network for 3D Object Recognition
    Zeng, Hui
    Zhao, Tianmeng
    Cheng, Ruting
    Wang, Fuzhou
    Liu, Jiwei
IEEE ACCESS, 2021, 9 (09): 33323 - 33335
  • [45] Free-Form Composition Networks for Egocentric Action Recognition
    Wang, Haoran
    Cheng, Qinghua
    Yu, Baosheng
    Zhan, Yibing
    Tao, Dapeng
    Ding, Liang
    Ling, Haibin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9967 - 9978
  • [46] A Multimode Two-Stream Network for Egocentric Action Recognition
    Li, Ying
    Shen, Jie
    Xiong, Xin
    He, Wei
    Li, Peng
    Yan, Wenjie
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I, 2021, 12891 : 357 - 368
  • [47] Category-Level Multi-Object 9D State Tracking Using Object-Centric Multi-Scale Transformer in Point Cloud Stream
    Sun, Jingtao
    Wang, Yaonan
    Feng, Mingtao
    Guo, Xiaofeng
    Lu, Huimin
    Chen, Xieyuanli
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1072 - 1085
  • [48] Egocentric zone-aware action recognition across environments
    Peirone, Simone Alberto
    Goletto, Gabriele
    Planamente, Mirco
    Bottino, Andrea
    Caputo, Barbara
    Averta, Giuseppe
    PATTERN RECOGNITION LETTERS, 2025, 188 : 140 - 147
  • [49] Action Recognition from Egocentric Videos Using Random Walks
    Sahu, Abhimanyu
    Bhattacharya, Rajit
    Bhura, Pallabh
    Chowdhury, Ananda S.
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE PROCESSING, CVIP 2018, VOL 2, 2020, 1024 : 389 - 402
  • [50] Visual Event-Based Egocentric Human Action Recognition
    Moreno-Rodriguez, Francisco J.
    Javier Traver, V
    Barranco, Francisco
    Dimiccoli, Mariella
    Pla, Filiberto
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022), 2022, 13256 : 402 - 414