Fixation Prediction through Multimodal Analysis

Cited by: 91
Authors
Min, Xiongkuo J [1 ]
Zhai, Guangtao [1 ]
Gu, Ke [1 ]
Yang, Xiaokang [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual attention; multimodal analysis; saliency; eye fixation prediction; attention fusion; SALIENCY DETECTION; INFLUENCE GAZE; MODEL; ATTENTION; LOCALIZATION; FRAMEWORK; FUSION;
DOI
10.1145/2996463
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this article, we propose to predict human eye fixation by incorporating both audio and visual cues. Traditional visual attention models generally make the most of a stimulus's visual features, yet they bypass all audio information. In the real world, however, we not only direct our gaze according to visual saliency, but are also attracted by salient audio cues. Psychological experiments show that audio influences visual attention, and subjects tend to be attracted by sound sources. Therefore, we propose fusing both audio and visual information to predict eye fixation. In our proposed framework, we first localize the moving, sound-generating objects through multimodal analysis and generate an audio attention map. Then, we calculate the spatial and temporal attention maps using the visual modality. Finally, the audio, spatial, and temporal attention maps are fused to generate the final audiovisual saliency map. The proposed method is applicable to scenes containing moving sound-generating objects. We gather a set of video sequences and collect eye-tracking data under an audiovisual test condition. Experimental results show that we achieve better eye fixation prediction performance when taking both audio and visual cues into consideration, especially in typical scenes in which object motion and audio are highly correlated.
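The fusion step described in the abstract, combining the audio, spatial, and temporal attention maps into a single audiovisual saliency map, can be sketched roughly as below. This is a minimal illustration assuming the three maps are precomputed, same-sized 2-D arrays; the weighted-sum fusion rule, the `fuse_attention_maps` name, and the weight values are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of attention-map fusion, assuming the audio, spatial, and
# temporal attention maps are already computed as same-sized 2-D arrays.
# The linear weighting below is an illustrative assumption, not the paper's rule.
import numpy as np


def fuse_attention_maps(audio_map, spatial_map, temporal_map,
                        weights=(0.4, 0.3, 0.3)):
    """Combine per-modality attention maps into one audiovisual saliency map."""
    maps = [np.asarray(m, dtype=np.float64)
            for m in (audio_map, spatial_map, temporal_map)]

    # Normalize each map to [0, 1] so no single modality dominates by scale.
    normed = []
    for m in maps:
        rng = m.max() - m.min()
        normed.append((m - m.min()) / rng if rng > 0 else np.zeros_like(m))

    # Weighted linear fusion of the three modalities (hypothetical weights).
    fused = sum(w * m for w, m in zip(weights, normed))

    # Renormalize the fused map to [0, 1] for comparison against fixation maps.
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused


if __name__ == "__main__":
    # Toy example: fuse three random 60x80 attention maps.
    gen = np.random.default_rng(0)
    a, s, t = (gen.random((60, 80)) for _ in range(3))
    saliency = fuse_attention_maps(a, s, t)
    print(saliency.shape, float(saliency.min()), float(saliency.max()))
```

In practice the per-modality weights could be fixed, scene-dependent, or learned; the fixed values here are only a placeholder to show the shape of the computation.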
Pages: 23
Related Papers
50 records in total
  • [31] Deep Neural Network-Based Impacts Analysis of Multimodal Factors on Heat Demand Prediction
    Ma, Zhanyu
    Xie, Jiyang
    Li, Hailong
    Sun, Qie
    Wallin, Fredrik
    Si, Zhongwei
    Guo, Jun
    IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (03) : 594 - 605
  • [32] Reconstructing representations using diffusion models for multimodal sentiment analysis through reading comprehension
    Zhang, Hua
    Yan, Yongjian
    Cai, Zijing
    Zhan, Peiqian
    Chen, Bi
    Jiang, Bo
    Xie, Bo
    APPLIED SOFT COMPUTING, 2024, 167
  • [33] Multimodal Pedestrian Trajectory Prediction Based on Relative Interactive Spatial-Temporal Graph
    Zhao, Duan
    Li, Tao
    Zou, Xiangyu
    He, Yaoyi
    Zhao, Lichang
    Chen, Hui
    Zhuo, Minmin
    IEEE ACCESS, 2022, 10 : 88707 - 88718
  • [34] Emotion recognition from unimodal to multimodal analysis: A review
    Ezzameli, K.
    Mahersia, H.
    INFORMATION FUSION, 2023, 99
  • [35] Toward a General Framework for Multimodal Big Data Analysis
    Bellandi, Valerio
    Ceravolo, Paolo
    Maghool, Samira
    Siccardi, Stefano
    BIG DATA, 2022, 10 (05) : 408 - 424
  • [36] Scanning, attention, and reasoning multimodal content for sentiment analysis
    Liu, Yun
    Li, Zhoujun
    Zhou, Ke
    Zhang, Leilei
    Li, Lang
    Tian, Peng
    Shen, Shixun
    KNOWLEDGE-BASED SYSTEMS, 2023, 268
  • [37] Joint multimodal sentiment analysis based on information relevance
    Chen, Danlei
    Su, Wang
    Wu, Peng
    Hua, Bolin
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [38] Multidataset Independent Subspace Analysis With Application to Multimodal Fusion
    Silva, Rogers F.
    Plis, Sergey M.
    Adali, Tulay
    Pattichis, Marios S.
    Calhoun, Vince D.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 588 - 602
  • [39] Prediction of Human Eye Fixation by a Single Filter
    Tang, He
    Chen, Chuanbo
    Bie, Yanan
    Journal of Signal Processing Systems, 2017, 87 : 197 - 202
  • [40] Fixation prediction for advertising images: Dataset and benchmark
    Liang, Song
    Liu, Ruihang
    Qian, Jiansheng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 81