Fixation Prediction through Multimodal Analysis

Cited by: 0
Authors
Min, Xiongkuo [1 ]
Zhai, Guangtao [1 ]
Hu, Chunjia [1 ]
Gu, Ke [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai, Peoples R China
Source
2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP) | 2015
Keywords
Audio-visual attention; multimodal analysis; saliency; fixation prediction; attention fusion; MODEL;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, we propose to predict human fixations by incorporating both audio and visual cues. Traditional visual attention models generally make the most of a stimulus's visual features while discarding all audio information. In the real world, however, humans not only direct their gaze according to visual saliency but may also be attracted by salient audio. Psychological experiments show that audio can influence visual attention and that subjects tend to be attracted to sound sources. Therefore, we propose to fuse both audio and visual information to predict fixations. In our framework, we first localize the moving-sounding objects through multimodal analysis and generate an audio attention map, in which a greater value denotes a higher probability that a position is the sound source. Then we compute the spatial and temporal attention maps using only the visual modality. Finally, the audio, spatial, and temporal attention maps are fused to generate the final audio-visual saliency map. We gather a set of videos and collect eye-tracking data under audio-visual test conditions. Experimental results show that better performance is achieved when both audio and visual cues are considered.
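The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or weights, so the weighted linear combination, the weight values, and all function names below are assumptions for illustration only, not the paper's actual method.

```python
import numpy as np

def normalize(att_map):
    """Scale an attention map to [0, 1]; assumed preprocessing step."""
    att_map = att_map.astype(np.float64)
    rng = att_map.max() - att_map.min()
    return (att_map - att_map.min()) / rng if rng > 0 else np.zeros_like(att_map)

def fuse_audio_visual_saliency(audio_map, spatial_map, temporal_map,
                               w_audio=0.4, w_spatial=0.3, w_temporal=0.3):
    """Fuse audio, spatial, and temporal attention maps into one audio-visual
    saliency map via a weighted linear combination.
    The weights here are illustrative placeholders, not the paper's values."""
    a = normalize(audio_map)
    s = normalize(spatial_map)
    t = normalize(temporal_map)
    fused = w_audio * a + w_spatial * s + w_temporal * t
    return normalize(fused)

# Example usage on random per-frame maps standing in for real model outputs.
if __name__ == "__main__":
    h, w = 120, 160
    rng = np.random.default_rng(0)
    saliency = fuse_audio_visual_saliency(rng.random((h, w)),
                                          rng.random((h, w)),
                                          rng.random((h, w)))
    print(saliency.shape, float(saliency.min()), float(saliency.max()))
```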
Pages: 4
Related Papers (50 in total)
  • [1] Fixation Prediction through Multimodal Analysis
    Min, Xiongkuo
    Zhai, Guangtao
    Gu, Ke
    Yang, Xiaokang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2017, 13 (01)
  • [2] RINet: Relative Importance-Aware Network for Fixation Prediction
    Song, Yingjie
    Liu, Zhi
    Li, Gongyang
    Zeng, Dan
    Zhang, Tianhong
    Xu, Lihua
    Wang, Jijun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9263 - 9277
  • [3] Stochastic bottom-up fixation prediction and saccade generation
    Tavakoli, Hamed Rezazadegan
    Rahtu, Esa
    Heikkila, Janne
    IMAGE AND VISION COMPUTING, 2013, 31 (09) : 686 - 693
  • [4] Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia
    Park, Sunghyun
    Shim, Han Suk
    Chatterjee, Moitreya
    Sagae, Kenji
    Morency, Louis-Philippe
    ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS, 2016, 6 (03)
  • [5] Multimodal cooperative learning for micro-video advertising click prediction
    Chen, Runyu
    INTERNET RESEARCH, 2022, 32 (02) : 477 - 495
  • [6] Towards fixation prediction: a nonparametric estimation-based approach through key-points
    Oliveira, Saulo A. F.
    Rocha Neto, Ajalmar R.
    Gomes, Joao P. P.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 391 - 396
  • [7] Emotions through texts and images: A multimodal analysis of reactions to the Brexit vote on Flickr
    Bouko, Catherine
    PRAGMATICS, 2020, 30 (02): : 222 - 246
  • [8] Fixation Prediction based on Scene Contours
    Zhan, Tengfei
    Ye, Ming
    Jiang, Wenwen
    Li, Yongjie
    Yang, Kaifu
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 2548 - 2554
  • [9] Visual fixation prediction with incomplete attention map based on brain storm optimization
    Yang, Jian
    Shen, Yang
    Shi, Yuhui
    APPLIED SOFT COMPUTING, 2020, 96
  • [10] Multimodal climate change prediction in a monsoon climate
    Mohan, S.
    Sinha, A.
    Journal of Water and Climate Change, 2023, 14 (09) : 2919 - 2934