Fixation Prediction through Multimodal Analysis

Cited by: 0
Authors
Min, Xiongkuo [1]
Zhai, Guangtao [1]
Hu, Chunjia [1]
Gu, Ke [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai, Peoples R China
Source
2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP) | 2015
Keywords
Audio-visual attention; multimodal analysis; saliency; fixation prediction; attention fusion; MODEL;
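DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract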
In this paper, we propose to predict human fixations by incorporating both audio and visual cues. Traditional visual attention models generally make the most of the stimuli's visual features while discarding all audio information. In the real world, however, humans direct their gaze not only according to visual saliency but may also be attracted by salient audio. Psychological experiments show that audio can influence visual attention and that subjects tend to be attracted to sound sources. Therefore, we propose to fuse audio and visual information to predict fixations. In our framework, we first localize the moving-sounding objects through multimodal analysis and generate an audio attention map, in which a greater value denotes a higher probability that a position is the sound source. Then we calculate the spatial and temporal attention maps using only the visual modality. Finally, the audio, spatial, and temporal attention maps are fused to generate the final audio-visual saliency map. We gather a set of videos and collect eye-tracking data under audio-visual test conditions. Experimental results show that considering both audio and visual cues yields better performance.
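The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the min-max normalization, the weighted average, the equal weights, and the names `normalize` and `fuse` are all assumptions made for illustration, not the authors' method. The toy 2x2 maps are made-up values.

```python
from typing import List, Sequence

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Min-max scale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse(maps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Weighted average of normalized maps (hypothetical fusion rule)."""
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normed = [normalize(m) for m in maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    for m in normed:
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError("all maps must share the same shape")
    return [
        [sum(w * m[r][c] for w, m in zip(weights, normed)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Toy attention maps (assumed values): the audio map peaks at the
# bottom-right position, standing in for a localized sound source.
audio = [[0.0, 0.0], [0.0, 4.0]]
spatial = [[1.0, 3.0], [1.0, 1.0]]
temporal = [[0.0, 0.0], [2.0, 2.0]]

# Equal weights are an arbitrary choice for this sketch.
saliency = fuse([audio, spatial, temporal], [1.0, 1.0, 1.0])
```

In this toy case the fused map is highest at the bottom-right position, where the audio and temporal maps agree. Normalizing each map first keeps one modality from dominating only because of its value range.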
Pages: 4