Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization

Cited by: 4
Authors
Aakur, Sathyanarayanan [1 ]
de Souza, Fillipe D. M. [1 ]
Sarkar, Sudeep [1 ]
Affiliations
[1] Univ S Florida, Tampa, FL 33620 USA
Source
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2019
Keywords
RECOGNITION; HISTOGRAMS;
DOI
10.1109/WACV.2019.00026
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809
Abstract
A deeper understanding of video activities extends beyond recognition of the underlying concepts, such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues for establishing semantic relationships among entities hypothesized directly from the video signal. We express this mathematically in the language of Grenander's canonical pattern generator theory. We show that prior encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in the training data. Using three publicly available datasets, Charades, the Microsoft Visual Description Corpus, and the Breakfast Actions dataset, we show that the proposed model generates video interpretations whose quality is better than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that commonsense knowledge from ConceptNet allows the proposed approach to handle challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.
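A minimal, hypothetical sketch of the idea in the abstract is given below (Python; not the paper's implementation or its pattern-theory formulation). It scores candidate interpretations of a clip by combining detector confidences with pairwise commonsense relatedness and keeps the lowest-energy configuration; the candidate labels, relatedness values, and weights are invented placeholders standing in for detector outputs and knowledge-base scores such as those from ConceptNet.

# Toy illustration (not the paper's implementation): choose a coherent set of concept
# labels for a clip by minimizing an energy that combines detector confidence with
# commonsense relatedness between the chosen labels.

import itertools
import math

# Hypothetical candidate labels per role, with detector confidences in (0, 1].
candidates = {
    "action": {"pour": 0.55, "throw": 0.45},
    "object": {"milk": 0.60, "ball": 0.40},
    "scene":  {"kitchen": 0.70, "field": 0.30},
}

# Hypothetical pairwise relatedness scores in [0, 1], standing in for values one
# could look up in a commonsense knowledge base such as ConceptNet.
relatedness = {
    frozenset({"pour", "milk"}): 0.9,
    frozenset({"milk", "kitchen"}): 0.8,
    frozenset({"pour", "kitchen"}): 0.6,
    frozenset({"throw", "ball"}): 0.9,
    frozenset({"ball", "field"}): 0.7,
    frozenset({"throw", "field"}): 0.5,
}

def energy(config, w_likelihood=1.0, w_context=1.0):
    # Unary terms: penalize labels the detectors are unsure about.
    e = -w_likelihood * sum(math.log(candidates[role][label]) for role, label in config.items())
    # Pairwise terms: reward label pairs that the knowledge base relates strongly.
    for a, b in itertools.combinations(config.values(), 2):
        e -= w_context * relatedness.get(frozenset({a, b}), 0.0)
    return e

# The label space here is tiny, so exhaustive search suffices; larger configuration
# spaces would call for approximate inference (e.g., simulated annealing).
configs = (dict(zip(candidates, labels))
           for labels in itertools.product(*(c.keys() for c in candidates.values())))
best = min(configs, key=energy)
print(best, round(energy(best), 3))

On this toy input the minimizer picks the mutually supportive configuration (pour, milk, kitchen) over mixed alternatives, which is the kind of contextual disambiguation described in the abstract.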
Pages: 190-199
Page count: 10