Online Action Detection with Learning Future Representations by Contrastive Learning

Cited by: 2
Authors
Leng, Haitao [1]
Shi, Xiaoming [2]
Zhou, Wei [1]
Zhang, Kuncai [1]
Shi, Qiankun [1]
Zhu, Pengcheng [1]
Affiliations
[1] Alibaba Group, Hangzhou, People's Republic of China
[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China
Source
2023 IEEE International Conference on Multimedia and Expo (ICME) | 2023
Keywords
Action Detection; Video Representation Learning; Contrastive Learning
DOI
10.1109/ICME55011.2023.00378
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online Action Detection (OAD), which predicts the ongoing human action from a streaming video, is an important task in multimedia analysis. Unlike offline action detection, OAD uses only historical and current frames and has no access to future information, which limits its performance. Current OAD methods obtain future information by training an additional future anticipation task, which makes them inefficient. To address this inefficiency, we propose integrating future information into the visual representations used for OAD. Specifically, we learn future representations with a contrastive-learning-based method. We further introduce two novel sampling strategies, multiple-scale sampling and timeline-overlap sampling, to enhance the method. Experimental results on two popular OAD benchmarks show that our method significantly improves inference efficiency while achieving promising effectiveness.
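The abstract does not state the loss function, so the following is only a minimal sketch of how a contrastive objective could tie a representation of the observed frames to a representation of the frames that follow. It assumes a standard InfoNCE loss with in-batch negatives and cosine similarity; the function names, the `temperature` value, and the pairing scheme are illustrative assumptions, not the authors' implementation, and the two sampling strategies named in the abstract are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, futures, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, see lead-in).

    anchors[i]: representation computed from frames up to the current time.
    futures[i]: representation of the frames that follow in the same video,
                used as the positive; futures[j] with j != i act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every future representation in the batch.
        logits = [cosine(anchor, f) / temperature for f in futures]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching future.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

Under these assumptions, the loss is small when each current-time representation is closest to its own future and large when it is closer to another clip's future. At inference time only the encoder for observed frames would be needed, which is consistent with the efficiency motivation in the abstract.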
Pages: 2213-2218
Number of pages: 6