Two-path target-aware contrastive regression for action quality assessment

Cited by: 3
Authors
Ke, Xiao [1 ,2 ]
Xu, Huangbiao [1 ,2 ]
Lin, Xiaofeng [1 ,2 ]
Guo, Wenzhong [1 ,2 ]
Affiliations
[1] Fuzhou Univ, Coll Comp & Data Sci, Fujian Prov Key Lab Networking Comp & Intelligent, Fuzhou 350116, Peoples R China
[2] Minist Educ, Key Lab Spatial Data Min & Informat Sharing, Fuzhou 350003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action quality assessment; Multi-view information; Video understanding; ACTION RECOGNITION; NETWORK;
DOI
10.1016/j.ins.2024.120347
Chinese Library Classification
TP [Automation technology; Computer technology];
Discipline classification code
0812;
Abstract
Action quality assessment (AQA) is a challenging vision task due to the complexity and variance of the scoring rules embedded in the videos. Recent approaches have reduced the prediction difficulty of AQA via learning action differences between videos, but there are still challenges in learning scoring rules and capturing feature differences. To address these challenges, we propose a two-path target-aware contrastive regression (T2CR) framework. We propose to fuse direct and contrastive regression and exploit the consistency of information across multiple visual fields. Specifically, we first directly learn the relational mapping between global video features and scoring rules, which builds occupational domain prior knowledge to better capture local differences between videos. Then, we acquire the auxiliary visual fields of the videos through sparse sampling to learn the commonality of feature representations in multiple visual fields and eliminate the effect of subjective noise from a single visual field. To demonstrate the effectiveness of T2CR, we conduct extensive experiments on four AQA datasets (MTL-AQA, FineDiving, AQA-7, JIGSAWS). Our method is superior to state-of-the-art methods without elaborate structural design and fine-grained information.
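The abstract describes fusing a direct regression path (global video features mapped straight to a score) with a contrastive path (a score offset predicted relative to an exemplar video). The sketch below illustrates that two-path idea in PyTorch; the module names, feature dimensions, and fusion weight are illustrative assumptions and not the authors' actual T2CR implementation.

```python
# Minimal sketch of a two-path (direct + contrastive) regression head,
# assuming pre-extracted clip-level video features (e.g., from an I3D backbone).
# All module names, dimensions, and the fusion weight alpha are assumptions
# for illustration; they are not taken from the paper.
import torch
import torch.nn as nn


class TwoPathRegressor(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 256, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # fusion weight between the two paths (assumed)
        # Direct path: map the query video's global feature to an absolute score.
        self.direct_head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Contrastive path: regress the score difference between the query
        # video and an exemplar video whose ground-truth score is known.
        self.contrastive_head = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, query_feat, exemplar_feat, exemplar_score):
        # Path 1: absolute score from the query features alone.
        direct_score = self.direct_head(query_feat).squeeze(-1)
        # Path 2: exemplar score plus the predicted relative difference.
        pair = torch.cat([query_feat, exemplar_feat], dim=-1)
        delta = self.contrastive_head(pair).squeeze(-1)
        contrastive_score = exemplar_score + delta
        # Fuse the two paths (simple convex combination; assumed).
        return self.alpha * direct_score + (1.0 - self.alpha) * contrastive_score


if __name__ == "__main__":
    model = TwoPathRegressor()
    q = torch.randn(4, 1024)           # query video features (batch of 4)
    e = torch.randn(4, 1024)           # exemplar video features
    e_score = torch.rand(4) * 100.0    # exemplar ground-truth scores
    print(model(q, e, e_score).shape)  # -> torch.Size([4])
```

In this reading, the direct path supplies a global prior on the scoring scale while the contrastive path refines the prediction from pairwise differences; the abstract's multi-visual-field consistency (sparsely sampled auxiliary views sharing a common representation) would sit upstream of this head and is not shown here.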
Pages: 17