MULTI-STAGE CONTRASTIVE REGRESSION FOR ACTION QUALITY ASSESSMENT

Cited by: 3

Authors:

An, Qi [1]

Qi, Mengshi [1]

Ma, Huadong [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Keywords:

Action Quality Assessment; Contrastive Regression; Multi-stage Segmentation

DOI:

10.1109/ICASSP48485.2024.10447069

Abstract:

In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods solve the AQA problem by considering the entire video, overlooking the inherent stage-level characteristics of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. By segmenting the input video into multiple stages or procedures, this approach extracts spatial-temporal information efficiently while reducing computational cost. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance. As a result, MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset. Our source code is available at https://github.com/Angel-1999/MCoRe.
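The abstract does not spell out the stage-wise contrastive loss, so the following is only an illustrative sketch of one common form such a loss could take: an InfoNCE-style objective over per-stage features. The function name `stage_contrastive_loss`, the `temperature` parameter, and the choice of treating same-index stages of two videos as positives are all assumptions made for illustration; they are not taken from the paper or from the MCoRe repository.

```python
import numpy as np

def stage_contrastive_loss(query, reference, temperature=0.1):
    """Hypothetical InfoNCE-style loss over per-stage features.

    query, reference: arrays of shape (num_stages, dim), one feature
    vector per stage of each video. Stage i of the query is treated as
    the positive for stage i of the reference; all other reference
    stages act as negatives. This pairing is an assumption.
    """
    # L2-normalise so the dot product is a cosine similarity.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    logits = q @ r.T / temperature                      # (S, S)
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood of the matching (diagonal) stages.
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: three stages with orthogonal features.
stages = np.eye(3)
aligned = stage_contrastive_loss(stages, stages)
shuffled = stage_contrastive_loss(stages, np.roll(stages, 1, axis=0))
```

In this toy setup the loss is lower when the stages of the two videos line up than when they are shuffled, which is the behaviour a stage-level contrastive term is meant to encourage.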

Pages: 4110-4114

Page count: 5
