Recognizing high-level audio-visual concepts using context

Cited by: 0
Authors
Naphade, MR [1]
Huang, TS [1]
Affiliation
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Source
2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS | 2001
Keywords
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202; 0835
Abstract
Recognition of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty mainly lies in the gap that exists between low-level media features and high-level semantic concepts. In an attempt to bridge this gap, we proposed a probabilistic framework for semantic understanding [6, 5]. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. In this paper, we show how the framework supports detection of multiple high-level concepts, which enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve detection of sites, objects and events. Using the concepts Outdoor and flying-helicopter, we demonstrate how the factor graph multinet models context. Using ROC curves and probability of error curves, we support the intuition that context should help.
Pages: 46-49
Page count: 4