Integrating Spatial and Temporal Contextual Information for Improved Video Visualization

Cited by: 0
Authors
Singh, Pratibha [1]
Kushwaha, Alok Kumar Singh [1]
Affiliations
[1] Guru Ghasidas Vishwavidyalaya, Dept Comp Sci & Engn, Bilaspur, India
Source
FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 2, CIS 2023 | 2024 / Vol. 869
Keywords
Video visualization; Moment retrieval; Highlights detection; Self-attention network
DOI
10.1007/978-981-99-9040-5_30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video representation learning is crucial for various tasks, and self-attention has emerged as an effective technique for capturing long-range dependencies. However, existing methods compute pairwise correlations along the spatial and temporal dimensions simultaneously, and so often neglect the distinct contextual information that each type of correlation conveys. To address this limitation, we propose a novel module that models spatial and temporal correlations sequentially, which allows spatial context to be integrated efficiently into temporal modeling. By incorporating this module into a 2D CNN, we build a self-attention module network tailored for video visualization. We evaluate the approach on two benchmark datasets, Charades-STA and QVHighlights, which cover the moment retrieval and highlight detection tasks. Extensive experiments show that the self-attention module network outperforms current methods on both datasets. Notably, our models consistently surpass shallower networks and those with fewer modalities. In summary, the proposed self-attention module advances video representation learning by effectively capturing spatial and temporal correlations, and the improvements in moment retrieval and highlight detection support the efficacy and versatility of the approach.
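
The abstract describes attention applied first across spatial positions and then across time, rather than jointly over both. The following is only a minimal NumPy sketch of that general idea, not the authors' module: the function names, the identity query/key/value projections, the residual connections, and the tensor layout `(T, H, W, C)` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., N, C). Queries, keys, and values are all x itself
    (no learned projections), which is a simplification for this sketch.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., N, N)
    return softmax(scores, axis=-1) @ x               # (..., N, C)


def sequential_st_attention(feats):
    """Hypothetical sequential spatial-then-temporal attention.

    feats: array of shape (T, H, W, C), e.g. per-frame 2D CNN feature maps.
    """
    T, H, W, C = feats.shape

    # Step 1: spatial attention inside each frame (tokens = H*W positions).
    x = feats.reshape(T, H * W, C)
    x = x + self_attention(x)          # residual connection (assumed)

    # Step 2: temporal attention at each position (tokens = T frames),
    # operating on features that already carry spatial context.
    x = np.swapaxes(x, 0, 1)           # (H*W, T, C)
    x = x + self_attention(x)
    x = np.swapaxes(x, 0, 1)           # back to (T, H*W, C)

    return x.reshape(T, H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 3, 3, 8))
    out = sequential_st_attention(feats)
    print(out.shape)
```

In this sketch the two stages build score matrices of size (HW)^2 per frame and T^2 per position, instead of one (THW)^2 matrix for joint attention, which is one plausible reading of the efficiency claim in the abstract.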
Pages: 415-424
Page count: 10