ASTRA: An Action Spotting TRAnsformer for Soccer Videos

Cited by: 3
Authors
Xarles, Artur [1 ,2 ]
Escalera, Sergio [1 ,2 ,3 ]
Moeslund, Thomas B. [3 ]
Clapes, Albert [1 ,2 ]
Affiliations
[1] Univ Barcelona, Barcelona, Spain
[2] Comp Vis Ctr, Barcelona, Spain
[3] Aalborg Univ, Aalborg, Denmark
Source
PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT ANALYSIS IN SPORTS, MMSPORTS 2023, 2023
Keywords
computer vision; action spotting; transformer encoder-decoder; uncertainty estimation; balanced mixup
DOI
10.1145/3606038.3616153
CLC Classification
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, the non-visibility of certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture label variability, and (d) an input audio signal to enhance the detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, which achieves a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secured 3rd place with an Average-mAP of 70.21 on the challenge set.
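The abstract does not spell out how the balanced mixup strategy is implemented. As a purely illustrative sketch (the function name, Beta parameterization, and batch pairing below are assumptions, not ASTRA's actual code), one common formulation of balanced mixup pairs each instance-sampled example with an example drawn from a class-balanced sampler, so that tail classes appear in every mixed batch:

```python
import numpy as np

def balanced_mixup(x_inst, y_inst, x_bal, y_bal, alpha=0.2, rng=None):
    """Hypothetical balanced-mixup step (not ASTRA's actual implementation).

    x_inst / y_inst : batch drawn i.i.d. from the (long-tailed) data.
    x_bal  / y_bal  : batch drawn with a class-balanced sampler.
    A skewed Beta(alpha, 1) mixing coefficient keeps the instance-sampled
    example dominant while still injecting tail-class signal.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = float(rng.beta(alpha, 1.0))          # mixing weight in [0, 1]
    x_mix = lam * x_inst + (1.0 - lam) * x_bal  # mixed inputs
    y_mix = lam * y_inst + (1.0 - lam) * y_bal  # soft (mixed) labels
    return x_mix, y_mix, lam
```

The soft labels `y_mix` are then used with a standard cross-entropy-style loss, so tail classes contribute a gradient in every batch without fully overriding the empirical distribution.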
Pages: 93-102 (10 pages)