Two-path target-aware contrastive regression for action quality assessment

Cited by: 3
Authors
Ke, Xiao [1 ,2 ]
Xu, Huangbiao [1 ,2 ]
Lin, Xiaofeng [1 ,2 ]
Guo, Wenzhong [1 ,2 ]
Affiliations
[1] Fuzhou Univ, Coll Comp & Data Sci, Fujian Prov Key Lab Networking Comp & Intelligent, Fuzhou 350116, Peoples R China
[2] Minist Educ, Key Lab Spatial Data Min & Informat Sharing, Fuzhou 350003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action quality assessment; Multi-view information; Video understanding; ACTION RECOGNITION; NETWORK;
DOI
10.1016/j.ins.2024.120347
Chinese Library Classification
TP [Automation technology; Computer technology];
Discipline classification code
0812;
Abstract
Action quality assessment (AQA) is a challenging vision task due to the complexity and variance of the scoring rules embedded in the videos. Recent approaches have reduced the prediction difficulty of AQA via learning action differences between videos, but there are still challenges in learning scoring rules and capturing feature differences. To address these challenges, we propose a two-path target-aware contrastive regression (T2CR) framework. We propose to fuse direct and contrastive regression and exploit the consistency of information across multiple visual fields. Specifically, we first directly learn the relational mapping between global video features and scoring rules, which builds occupational domain prior knowledge to better capture local differences between videos. Then, we acquire the auxiliary visual fields of the videos through sparse sampling to learn the commonality of feature representations in multiple visual fields and eliminate the effect of subjective noise from a single visual field. To demonstrate the effectiveness of T2CR, we conduct extensive experiments on four AQA datasets (MTL-AQA, FineDiving, AQA-7, JIGSAWS). Our method is superior to state-of-the-art methods without elaborate structural design and fine-grained information.
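The abstract describes fusing a direct regression path (global video features mapped straight to a score) with a contrastive path (a score offset predicted relative to an exemplar video). The sketch below illustrates that two-path idea in PyTorch; the module names, feature dimensions, and fusion weight are illustrative assumptions and not the authors' actual T2CR implementation.

```python
# Minimal sketch of a two-path (direct + contrastive) regression head,
# assuming pre-extracted clip-level video features (e.g., from an I3D backbone).
# All module names, dimensions, and the fusion weight alpha are assumptions
# for illustration; they are not taken from the paper.
import torch
import torch.nn as nn


class TwoPathRegressor(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 256, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # fusion weight between the two paths (assumed)
        # Direct path: map the query video's global feature to an absolute score.
        self.direct_head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Contrastive path: regress the score difference between the query
        # video and an exemplar video whose ground-truth score is known.
        self.contrastive_head = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, query_feat, exemplar_feat, exemplar_score):
        # Path 1: absolute score from the query features alone.
        direct_score = self.direct_head(query_feat).squeeze(-1)
        # Path 2: exemplar score plus the predicted relative difference.
        pair = torch.cat([query_feat, exemplar_feat], dim=-1)
        delta = self.contrastive_head(pair).squeeze(-1)
        contrastive_score = exemplar_score + delta
        # Fuse the two paths (simple convex combination; assumed).
        return self.alpha * direct_score + (1.0 - self.alpha) * contrastive_score


if __name__ == "__main__":
    model = TwoPathRegressor()
    q = torch.randn(4, 1024)           # query video features (batch of 4)
    e = torch.randn(4, 1024)           # exemplar video features
    e_score = torch.rand(4) * 100.0    # exemplar ground-truth scores
    print(model(q, e, e_score).shape)  # -> torch.Size([4])
```

In this reading, the direct path supplies a global prior on the scoring scale while the contrastive path refines the prediction from pairwise differences; the abstract's multi-visual-field consistency (sparsely sampled auxiliary views sharing a common representation) would sit upstream of this head and is not shown here.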
Pages: 17