Intent Prediction in Human-Human Interactions

Cited by: 5
Authors
Baruah, Murchana [1, 2]
Banerjee, Bonny [1, 2]
Nagar, Atulya K. [3]
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
[2] Univ Memphis, Dept Elect & Comp Engn, Memphis, TN 38152 USA
[3] Liverpool Hope Univ, Sch Math Comp Sci & Engn, Hope Pk, Liverpool L16 9JD, England
Keywords
Agent; attention; intent prediction; interaction recognition and generation; perception; proprioception; fusion
DOI
10.1109/THMS.2023.3239648
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The human ability to infer others' intent is innate and crucial to development. Machines ought to acquire this ability for seamless interaction with humans. In this article, we propose an agent model for predicting the intent of actors in human-human interactions. This requires simultaneous generation and recognition of an interaction at any time, for which end-to-end models are scarce. The proposed agent actively samples its environment via a sequence of glimpses. At each sampling instant, the model infers the observation class and completes the partially observed body motion. It learns the sequence of body locations to sample by jointly minimizing the classification and generation errors. The model is evaluated on videos of two-skeleton interactions under two settings: (first person) one skeleton is the modeled agent and the other skeleton's joint movements constitute its visual observation, and (third person) an audience is the modeled agent and the two interacting skeletons' joint movements constitute its visual observation. Three methods for implementing the attention mechanism are analyzed using benchmark datasets. One of them, where attention is driven by sensory prediction error, achieves the highest classification accuracy in both settings by sampling less than 50% of the skeleton joints, while also being the most efficient in terms of model size. This is the first known attention-based agent to learn end-to-end from two-person interactions for intent prediction, with high accuracy and efficiency.
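The abstract's idea of attention driven by sensory prediction error can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `fraction` and `weight` parameters, and the use of plain NumPy arrays are all assumptions made for illustration. The sketch also assumes the full observation is available when computing per-joint errors, which simplifies how a real agent would estimate them. It picks the joints whose predicted positions deviate most from the observed ones, completes the rest of the pose from the prediction, and combines a classification error and a generation error into one loss.

```python
import numpy as np

def select_glimpse(predicted, observed, fraction=0.4):
    """Return sorted indices of the joints with the largest prediction error.

    predicted, observed: arrays of shape (num_joints, 3).
    fraction: hypothetical share of joints to sample (kept below 0.5 here).
    """
    errors = np.linalg.norm(observed - predicted, axis=1)  # per-joint error
    k = max(1, int(fraction * len(errors)))
    top_k = np.argsort(errors)[::-1][:k]  # largest errors first
    return np.sort(top_k)

def complete_pose(predicted, observed, sampled):
    """Use observed values at sampled joints and predictions elsewhere."""
    completed = predicted.copy()
    completed[sampled] = observed[sampled]
    return completed

def joint_loss(class_probs, true_class, completed, observed, weight=1.0):
    """Cross-entropy classification error plus weighted mean squared generation error."""
    classification = -np.log(class_probs[true_class] + 1e-12)
    generation = np.mean((completed - observed) ** 2)
    return classification + weight * generation
```

With a 5-joint pose and `fraction=0.4`, `select_glimpse` samples 2 joints, which stays under the 50% figure the abstract reports. The equal default weighting of the two error terms is an arbitrary choice for this sketch.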
Pages: 458-463
Page count: 6