Understanding human activity with uncertainty measure for novelty in graph convolutional networks

Cited: 0
Authors
Xing, Hao [1 ]
Burschka, Darius [1 ]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Machine Vis & Percept Grp, Boltzmannstr 3, D-85748 Garching, Germany
Keywords
Uncertainty quantification; human activity recognition; activity segmentation; human-object interaction
DOI
10.1177/02783649241287800
CLC Classification
TP24 [Robotics]
Discipline Codes
080202; 1405
Abstract
Understanding human activity is a crucial aspect of developing intelligent robots, particularly for human-robot collaboration. Existing systems, however, suffer from over-segmentation caused by errors in the decoder's up-sampling process. In response, we introduce the Temporal Fusion Graph Convolutional Network, which corrects the inadequate boundary estimation of individual actions within an activity stream and mitigates over-segmentation along the temporal dimension. Moreover, systems that use human activity recognition for decision-making need more than action labels: they require a confidence value indicating how certain the correspondence is between an observation and the training examples. This is crucial to prevent overconfident responses to unforeseen scenarios that were absent from the training data and may be mismatched by weak similarity measures within the system. To address this, we propose a Spectral Normalized Residual connection that enables efficient estimation of novelty in observations. By constraining the maximum gradients of weight updates, it preserves input distances within the feature space, promoting more robust handling of novel situations and mitigating the risks of overconfidence. We then use a Gaussian process to quantify distance in this feature space. The final model is evaluated on two challenging public human-object interaction datasets, Bimanual Actions and IKEA Assembly, and outperforms popular existing methods in action recognition and segmentation accuracy as well as out-of-distribution detection.
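The distance-preservation idea behind the Spectral Normalized Residual connection can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the layer shape, the `tanh` nonlinearity, and the bound `c` are illustrative assumptions. The key property is that rescaling the residual branch's weight matrix so its spectral norm is at most `c < 1` makes the map `x -> x + f(x)` bi-Lipschitz, so distances between inputs are neither collapsed nor blown up in feature space — which is what lets a downstream distance-based measure (such as a Gaussian process) detect novel inputs.

```python
import numpy as np

def spectral_norm(W, n_iter=50):
    """Estimate the largest singular value of W by power iteration."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    # u = Wv / ||Wv||, so u . (W v) approximates the top singular value.
    return float(u @ W @ v)

def sn_residual(x, W, c=0.9):
    """Residual block x + f(x) with f made c-Lipschitz via spectral
    normalization. Since ||W_hat||_2 <= c < 1 and tanh is 1-Lipschitz,
    the whole map is bi-Lipschitz with constants (1 - c, 1 + c):
        (1 - c) * ||x1 - x2|| <= ||h(x1) - h(x2)|| <= (1 + c) * ||x1 - x2||
    i.e. input distances are approximately preserved in feature space."""
    sigma = spectral_norm(W)
    W_hat = W * min(1.0, c / sigma)  # rescale only if the norm exceeds c
    return x + np.tanh(W_hat @ x)
```

A quick check of the bi-Lipschitz bounds: feed two random inputs through the block and verify their output distance stays within `(1 - c, 1 + c)` times the input distance.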
Pages: 17