ASTRA: An Action Spotting TRAnsformer for Soccer Videos

Times cited: 3
Authors
Xarles, Artur [1 ,2 ]
Escalera, Sergio [1 ,2 ,3 ]
Moeslund, Thomas B. [3 ]
Clapes, Albert [1 ,2 ]
Affiliations
[1] Univ Barcelona, Barcelona, Spain
[2] Comp Vis Ctr, Barcelona, Spain
[3] Aalborg Univ, Aalborg, Denmark
Source
PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT ANALYSIS IN SPORTS, MMSPORTS 2023 | 2023
Keywords
computer vision; action spotting; transformer encoder-decoder; uncertainty estimation; balanced mixup
DOI
10.1145/3606038.3616153
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
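The abstract names balanced mixup as ASTRA's remedy for the long-tail class distribution but gives no further detail. As an illustration only, the following sketch shows one common formulation of balanced mixup: a batch drawn by ordinary instance sampling is blended with a batch drawn by class-balanced sampling, using a weight λ sampled from a Beta(α, 1) distribution. This is not ASTRA's implementation. The function names, the α = 0.2 default, and the one-label-per-sample setup are hypothetical assumptions (ASTRA itself operates on soccer clips with frame-level action labels).

```python
import numpy as np


def class_balanced_indices(labels, batch_size, rng):
    """Hypothetical helper: pick a class uniformly, then a random sample of that class."""
    classes = np.unique(labels)
    chosen = rng.choice(classes, size=batch_size)  # every class equally likely
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in chosen])


def balanced_mixup_batch(x, y_onehot, labels, batch_size, alpha=0.2, rng=None):
    """Sketch of balanced mixup (assumed formulation, not ASTRA's code).

    x:        (N, D) feature array
    y_onehot: (N, C) one-hot targets
    labels:   (N,) integer class ids used for class-balanced sampling
    """
    rng = np.random.default_rng() if rng is None else rng
    inst = rng.choice(len(x), size=batch_size)  # instance-based sampling (follows the data distribution)
    bal = class_balanced_indices(labels, batch_size, rng)  # class-balanced sampling
    lam = rng.beta(alpha, 1.0)  # weight given to the instance-sampled batch
    x_mix = lam * x[inst] + (1.0 - lam) * x[bal]
    y_mix = lam * y_onehot[inst] + (1.0 - lam) * y_onehot[bal]
    return x_mix, y_mix, lam
```

Because rare classes appear in the balanced batch far more often than in the raw data, the mixed targets give them extra weight, while the instance-sampled half keeps training close to the natural class frequencies.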
Pages: 93-102
Page count: 10