Integrating Spatial and Temporal Contextual Information for Improved Video Visualization

Cited by: 0
Authors
Singh, Pratibha [1]
Kushwaha, Alok Kumar Singh [1]
Affiliations
[1] Guru Ghasidas Vishwavidyalaya, Dept Comp Sci & Engn, Bilaspur, India
Source
FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 2, CIS 2023 | 2024, Vol. 869
Keywords
Video visualization; Moment retrieval; Highlights detection; Self-attention network
DOI
10.1007/978-981-99-9040-5_30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video representation learning is crucial for various tasks, and self-attention has emerged as an effective technique for capturing long-range dependencies. However, existing methods often compute pairwise correlations along the spatial and temporal dimensions simultaneously, which neglects the distinct contextual information that each kind of correlation conveys. To address this limitation, we propose a novel module that models spatial and temporal correlations sequentially, which enables spatial context to be integrated efficiently into temporal modeling. By incorporating this module into a 2D CNN, we develop a self-attention module network tailored for video visualization. We evaluate our approach on two benchmark datasets, Charades-STA and QVHighlights, which are relevant for moment retrieval and highlight detection tasks. Extensive experiments show that the self-attention module network outperforms current methods on both datasets. Notably, our models consistently surpass shallower networks and those with fewer modalities. In summary, the proposed self-attention module advances video representation learning by effectively capturing spatial and temporal correlations, and the improvements achieved in moment retrieval and highlight detection support the efficacy and versatility of the approach.
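The contrast the abstract draws, between computing correlations along both dimensions at once and modeling them one after the other, can be illustrated with a minimal NumPy sketch. This is not the authors' module: the abstract gives no layer details, so the parameter-free attention (no learned query/key/value projections), the residual connections, and all function names here (`self_attention`, `spatial_then_temporal`, `joint_spatiotemporal`, `pairwise_scores`) are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    Assumption: no learned projections; queries, keys and values are x itself.
    x has shape (..., n, d) and the result has the same shape.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., n, n)
    return softmax(scores, axis=-1) @ x


def spatial_then_temporal(feats):
    """Sequential scheme. feats: (T, N, D) = frames, positions, channels."""
    # Spatial step: positions attend to each other within each frame.
    spatial = feats + self_attention(feats)            # (T, N, D)
    # Temporal step: each position attends across frames, using features
    # that already carry spatial context from the step above.
    per_position = np.swapaxes(spatial, 0, 1)          # (N, T, D)
    temporal = per_position + self_attention(per_position)
    return np.swapaxes(temporal, 0, 1)                 # (T, N, D)


def joint_spatiotemporal(feats):
    """Joint scheme: every (frame, position) token attends to every other."""
    T, N, D = feats.shape
    flat = feats.reshape(T * N, D)
    return (flat + self_attention(flat)).reshape(T, N, D)


def pairwise_scores(T, N):
    """Number of attention scores computed: (sequential, joint)."""
    return T * N * N + N * T * T, (T * N) ** 2
```

Under these assumptions, both functions map a `(T, N, D)` feature array to an array of the same shape, while `pairwise_scores(T, N)` shows that the sequential scheme computes far fewer pairwise scores than the joint one for typical frame and grid sizes.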
Pages: 415-424
Page count: 10