Automatic Modelling for Interactive Action Assessment

Cited by: 9
Authors
Gao, Jibin [1]
Pan, Jia-Hui [1]
Zhang, Shao-Jie [1]
Zheng, Wei-Shi [1, 2]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Keywords
Action assessment; Interactive action; Video understanding; SURGICAL SKILLS; COEFFICIENT; VALUES; VIDEO
DOI
10.1007/s11263-022-01695-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Action assessment, the task of visually assessing the quality of performing an action, has attracted much attention in recent years, with promising applications in areas such as medical treatment and sporting events. However, most existing action assessment methods target actions performed by a single person; in particular, they neglect the asymmetric relations among agents (e.g., between persons and objects), which limits their performance on many nonindividual actions. In this work, we formulate a framework for modelling asymmetric interactions among agents for action assessment, considering the subordinations among agents in many interactive actions. Specifically, we propose an asymmetric interaction learner consisting of an automatic assigner and an asymmetric interaction network search module. The automatic assigner is designed to automatically group agents within an action into a primary agent (e.g., human) and secondary agents (e.g., objects); the asymmetric interaction network search module adaptively learns the asymmetric interactions between these agents. We conduct experiments on the JIGSAWS dataset containing surgical actions and additionally collect two new datasets, TASD-2 and PaSk, for action assessment on interactive sporting actions. The experimental results on these three datasets demonstrate the effectiveness of our framework in achieving state-of-the-art performance. Extensive experiments on the AQA-7 dataset also indicate the robustness of our model in conventional action assessment settings.
Pages: 659-679
Page count: 21