Generalized concept overlay for semantic multi-modal analysis of audio-visual content

Cited by: 0
Authors
Mezaris, Vasileios [1]
Gidaros, Spyros [1]
Kompatsiaris, Ioannis [1]
Affiliation
[1] Ctr Res & Technol Hellas, Informat & Telemat Inst, Thermi 57001, Greece
Source
PROCEEDINGS 2009 FOURTH INTERNATIONAL WORKSHOP ON SEMANTIC MEDIA ADAPTATION AND PERSONALIZATION | 2009
Keywords
Video analysis; Semantic multi-modal analysis;
DOI
10.1109/SMAP.2009.13
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This work addresses the problem of performing multi-modal analysis of audio-visual streams by effectively combining the results of multiple uni-modal analysis techniques. To this end, a non-learning-based approach is proposed that takes into account the potential variability of the different uni-modal analysis techniques in terms of the decomposition of the audio-visual stream that they adopt, the concepts of an ontology that they consider, the varying semantic importance of each modality, and other factors. Preliminary results from applying the proposed approach to broadcast news content indicate its effectiveness.
Pages: 27-32
Number of pages: 6