Fixation Prediction through Multimodal Analysis

Cited by: 91
Authors
Min, Xiongkuo [1]
Zhai, Guangtao [1]
Gu, Ke [1]
Yang, Xiaokang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual attention; multimodal analysis; saliency; eye fixation prediction; attention fusion; SALIENCY DETECTION; INFLUENCE GAZE; MODEL; ATTENTION; LOCALIZATION; FRAMEWORK; FUSION;
DOI
10.1145/2996463
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this article, we propose to predict human eye fixation by incorporating both audio and visual cues. Traditional visual attention models generally make full use of a stimulus's visual features yet ignore all audio information. In the real world, however, we not only direct our gaze according to visual saliency but are also attracted by salient audio cues. Psychological experiments show that audio influences visual attention and that subjects tend to be attracted by sound sources. Therefore, we propose fusing audio and visual information to predict eye fixation. In our framework, we first localize moving sound-generating objects through multimodal analysis and generate an audio attention map. Then, we compute spatial and temporal attention maps from the visual modality. Finally, the audio, spatial, and temporal attention maps are fused into the final audiovisual saliency map. The proposed method is applicable to scenes containing moving sound-generating objects. We gather a set of video sequences and collect eye-tracking data under an audiovisual test condition. Experimental results show that taking both audio and visual cues into consideration yields better eye fixation prediction performance, especially in typical scenes in which object motion and audio are highly correlated.
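The abstract describes a three-stream pipeline: an audio attention map obtained by localizing moving sound-generating objects, plus spatial and temporal attention maps computed from the visual modality, fused into a single audiovisual saliency map. As a rough illustration of the fusion step only, the minimal Python sketch below combines three same-shape per-frame maps; the min-max normalization and the fixed linear weights are assumptions made here for illustration, not the fusion scheme reported in the paper.

    import numpy as np

    def _normalize(m):
        # Min-max normalize a map to [0, 1]; a flat map becomes all zeros.
        m = np.asarray(m, dtype=np.float64)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    def fuse_attention_maps(audio_map, spatial_map, temporal_map,
                            weights=(0.4, 0.3, 0.3)):
        # Linearly combine the audio, spatial, and temporal attention maps
        # of one frame. The weights are hypothetical, not the paper's.
        maps = (audio_map, spatial_map, temporal_map)
        fused = sum(w * _normalize(m) for w, m in zip(weights, maps))
        return _normalize(fused)  # final audiovisual saliency map

    # Usage with stand-in maps for a single 120x160 frame.
    rng = np.random.default_rng(0)
    audio, spatial, temporal = (rng.random((120, 160)) for _ in range(3))
    saliency = fuse_attention_maps(audio, spatial, temporal)
    assert saliency.shape == (120, 160)

A fixed-weight linear combination is only the simplest way to merge the three maps; the paper's actual fusion stage may weight the audio map differently depending on how strongly motion and sound correlate in a scene.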
Pages: 23
Related Papers
50 records
  • [21] A Systematic Evaluation of Feature Encoding Techniques for Gait Analysis Using Multimodal Sensory Data
    Fatima, Rimsha
    Khan, Muhammad Hassan
    Nisar, Muhammad Adeel
    Doniec, Rafal
    Farid, Muhammad Shahid
    Grzegorzek, Marcin
    SENSORS, 2024, 24 (01)
  • [22] Quaternion-Based Spectral Saliency Detection for Eye Fixation Prediction
    Schauerte, Boris
    Stiefelhagen, Rainer
    COMPUTER VISION - ECCV 2012, PT II, 2012, 7573: 116 - 129
  • [23] Emotions through texts and images: A multimodal analysis of reactions to the Brexit vote on Flickr
    Bouko, Catherine
    PRAGMATICS, 2020, 30 (02): 222 - 246
  • [24] Learning a Saliency Map for Fixation Prediction
    Xu, Linfeng
    Zeng, Liaoyuan
    Wang, Zhengning
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (10): 2294 - 2297
  • [25] Fixation Prediction based on Scene Contours
    Zhan, Tengfei
    Ye, Ming
    Jiang, Wenwen
    Li, Yongjie
    Yang, Kaifu
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019: 2548 - 2554
  • [26] Local Structure Prediction with Convolutional Neural Networks for Multimodal Brain Tumor Segmentation
    Dvorak, Pavel
    Menze, Bjoern
    MEDICAL COMPUTER VISION: ALGORITHMS FOR BIG DATA, 2016, 9601: 59 - 71
  • [27] An efficient pitch-by-pitch extraction algorithm through multimodal information
    Hua, Kai-Lung
    Lai, Chao-Ting
    You, Chuang-Wen
    Cheng, Wen-Huang
    INFORMATION SCIENCES, 2015, 294: 64 - 77
  • [28] Multimodal climate change prediction in a monsoon climate
    Mohan, S.
    Sinha, A.
    JOURNAL OF WATER AND CLIMATE CHANGE, 2023, 14 (09): 2919 - 2934
  • [29] Deep Neural Network-Based Impacts Analysis of Multimodal Factors on Heat Demand Prediction
    Ma, Zhanyu
    Xie, Jiyang
    Li, Hailong
    Sun, Qie
    Wallin, Fredrik
    Si, Zhongwei
    Guo, Jun
    IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (03): 594 - 605
  • [30] Remote Sensing and Time Series Data Fused Multimodal Prediction Model Based on Interaction Analysis
    Zhang, Zhiwei
    Wang, Dong
    ICVIP 2019: PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, 2019: 190 - 194