Joint learning of images and videos with a single Vision Transformer

Authors
Shimizu, Shuki [1]
Tamaki, Toru [1]
Affiliation
[1] Nagoya Institute of Technology, Nagoya, Japan
Source
2023 18th International Conference on Machine Vision and Applications (MVA), 2023
DOI
10.23919/MVA57639.2023.10215661
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we propose a method for joint learning of images and videos using a single model. Images and videos are usually handled by separately trained models. The proposed method feeds either a batch of images or a set of video frames to a single Vision Transformer (IV-ViT); for video input, the per-frame outputs are aggregated over time by late fusion. We present experimental results on two image datasets and two action recognition datasets.
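The late-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `frame_model` is a hypothetical stand-in for the shared Vision Transformer, and averaging per-frame logits is only one assumed form of temporal aggregation, since the abstract does not specify the operator. All names and shapes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, H, W, C = 5, 8, 8, 3

# Hypothetical stand-in for the shared ViT: a single linear layer on flattened pixels.
WEIGHTS = rng.standard_normal((H * W * C, NUM_CLASSES))


def frame_model(images):
    """Map a batch of images (N, H, W, C) to logits (N, NUM_CLASSES)."""
    return images.reshape(images.shape[0], -1) @ WEIGHTS


def image_logits(images):
    """Image input: the batch goes straight through the shared model."""
    return frame_model(images)


def video_logits(videos):
    """Video input (B, T, H, W, C): run the shared model on every frame,
    then fuse late by averaging the per-frame logits over T (assumed operator)."""
    b, t = videos.shape[:2]
    per_frame = frame_model(videos.reshape(b * t, *videos.shape[2:]))
    return per_frame.reshape(b, t, -1).mean(axis=1)


images = rng.standard_normal((4, H, W, C))      # batch of 4 images
videos = rng.standard_normal((2, 6, H, W, C))   # batch of 2 clips, 6 frames each
print(image_logits(images).shape)
print(video_logits(videos).shape)
```

Because both input types pass through the same `frame_model`, a clip made of one repeated frame yields the same logits as that frame treated as an image, which is the property that lets one set of weights serve both modalities in this sketch.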
Pages: 6