Exploring Text-Driven Approaches for Online Action Detection

Cited by: 1
Authors
Benavent-Lledo, Manuel [1 ]
Mulero-Perez, David [1 ]
Ortiz-Perez, David [1 ]
Garcia-Rodriguez, Jose [1 ]
Orts-Escolano, Sergio [2 ]
Affiliations
[1] Univ Alicante, Dept Comp Technol, Raspeig, Spain
[2] Univ Alicante, Dept Comp Sci & Artificial Intelligence, Raspeig, Spain
Source
BIOINSPIRED SYSTEMS FOR TRANSLATIONAL APPLICATIONS: FROM ROBOTICS TO SOCIAL ENGINEERING, PT II, IWINAC 2024 | 2024, Vol. 14675
Keywords
Online action detection; transformer; VLM; zero-shot; few-shot; recognition
DOI
10.1007/978-3-031-61137-7_6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The use of task-agnostic, pre-trained models for knowledge transfer has become more prevalent due to the availability of extensive open-source vision-language models (VLMs) and increased computational power. However, despite their widespread application across various domains, their potential for online action detection has not been fully explored. Current approaches rely on pre-extracted features from convolutional neural networks. In this paper, we explore the potential of using VLMs for online action detection, emphasizing their effectiveness in zero-shot and few-shot learning scenarios. Our research highlights the promise of VLMs in this field through empirical demonstrations of their robust performance, positioning them as a powerful tool for further advancing the state of the art in online action detection.
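The zero-shot setting described in the abstract can be sketched as follows: each incoming video frame is embedded by a VLM image encoder, compared against text embeddings of action prompts, and classified by a softmax over cosine similarities, with no task-specific training. This is a minimal illustrative sketch, not the paper's implementation; the random-vector encoders below stand in for a real VLM (e.g. CLIP), and the action list and prompt template are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["running", "jumping", "drinking"]  # hypothetical label set

def encode_text(prompts):
    # Placeholder for a VLM text encoder (e.g. CLIP); returns unit-norm embeddings.
    emb = rng.standard_normal((len(prompts), 512))
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def encode_frame(frame):
    # Placeholder for a VLM image encoder applied to a single video frame.
    emb = rng.standard_normal(512)
    return emb / np.linalg.norm(emb)

def zero_shot_scores(frame, text_emb, temperature=0.01):
    # Cosine similarity between the frame and each action prompt,
    # turned into a probability distribution with a temperature softmax.
    sims = text_emb @ encode_frame(frame)
    logits = sims / temperature
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

text_emb = encode_text([f"a video of a person {a}" for a in ACTIONS])
# Online setting: frames are processed one at a time as they arrive,
# without access to future frames.
for t in range(3):                  # stand-in for an arriving frame stream
    probs = zero_shot_scores(t, text_emb)
    print(t, ACTIONS[int(probs.argmax())], round(float(probs.max()), 3))
```

Few-shot adaptation would follow the same pattern, with the prompt embeddings refined from a handful of labeled examples instead of being used as-is.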
Pages: 55-64
Page count: 10