USING COMPRESSED AUDIO-VISUAL WORDS FOR MULTI-MODAL SCENE CLASSIFICATION

Cited by: 0

Authors:

Kurcius, Jan J. [1]

Breckon, Toby P. [2]

Affiliations:

[1] Cranfield Univ, Cranfield MK43 0AL, Beds, England

[2] Univ Durham, Durham, England

Source:

2014 INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE FOR MULTIMEDIA UNDERSTANDING (IWCIM) | 2014

Keywords:

multi-resolution; bag of words; MFCC; compressed sensing; audio-visual; multi-modal; RECOGNITION; FEATURES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

We present a novel approach to scene classification using combined audio signal and video image features and compare this methodology to scene classification results using each modality in isolation. Each modality is represented using summary features, namely Mel-frequency Cepstral Coefficients (audio) and Scale Invariant Feature Transform (SIFT) (video) within a multi-resolution bag-of-features model. Uniquely, we extend the classical bag-of-words approach over both audio and video feature spaces, whereby we introduce the concept of compressive sensing as a novel methodology for multi-modal fusion via audiovisual feature dimensionality reduction. We perform evaluation over a range of environments showing performance that is both comparable to the state of the art (86%, over ten scene classes) and invariant to a ten-fold dimensionality reduction within the audio-visual feature space using our compressive representation approach.
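The pipeline outlined in the abstract (per-modality bag-of-words histograms, joined and then reduced in dimension) can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook sizes, the toy random descriptors, the use of a Gaussian random measurement matrix as the compressive step, and all function names are assumptions made for illustration, and the multi-resolution aspect is omitted. It uses only the Python standard library (`math.dist` requires Python 3.8+).

```python
import math
import random


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalised histogram."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def gaussian_projection(n_out, n_in, seed=0):
    """Random Gaussian measurement matrix (assumed choice of compressive projection)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def compress(vector, matrix):
    """Project the vector to a lower dimension: y = matrix @ vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Toy stand-ins: 13-dimensional "MFCC" frames and 128-dimensional "SIFT" descriptors.
rng = random.Random(1)
audio_codebook = [[rng.random() for _ in range(13)] for _ in range(20)]
visual_codebook = [[rng.random() for _ in range(128)] for _ in range(30)]
audio_desc = [[rng.random() for _ in range(13)] for _ in range(50)]
visual_desc = [[rng.random() for _ in range(128)] for _ in range(40)]

# Joint audio-visual representation: concatenated per-modality histograms.
joint = bow_histogram(audio_desc, audio_codebook) + bow_histogram(visual_desc, visual_codebook)

# Ten-fold reduction of the joint feature vector, mirroring the ratio quoted in the abstract.
phi = gaussian_projection(len(joint) // 10, len(joint))
compressed = compress(joint, phi)
```

In a real system the codebooks would be learned (for example by k-means over training descriptors) and the compressed vectors passed to a classifier; both steps are left out here to keep the sketch short.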

Pages: 5
