Generalized concept overlay for semantic multi-modal analysis of audio-visual content

Cited by: 0

Authors:

Mezaris, Vasileios [1]

Gidaros, Spyros [1]

Kompatsiaris, Ioannis [1]

Affiliations:

[1] Ctr Res & Technol Hellas, Informat & Telemat Inst, Thermi 57001, Greece

Source:

PROCEEDINGS 2009 FOURTH INTERNATIONAL WORKSHOP ON SEMANTIC MEDIA ADAPTATION AND PERSONALIZATION | 2009

Keywords:

Video analysis; Semantic multi-modal analysis;

DOI:

10.1109/SMAP.2009.13

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

This work addresses the problem of performing multi-modal analysis of audio-visual streams by effectively combining the results of multiple uni-modal analysis techniques. To this end, a non-learning-based approach is proposed that takes into account the potential variability of the different uni-modal analysis techniques in terms of the decomposition of the audio-visual stream that they adopt, the concepts of an ontology that they consider, the varying semantic importance of each modality, and other factors. Preliminary results from applying the proposed approach to broadcast news content reveal its effectiveness.


Pages: 27-32

Number of pages: 6

