Improved image reconstruction from brain activity through automatic image captioning

Cited: 0
Authors
Kalantari, Fatemeh [1 ]
Faez, Karim [1 ]
Amindavar, Hamidreza [1 ]
Nazari, Soheila [2 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Shahid Beheshti Univ, Fac Elect Engn, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, Issue 01
Keywords
Semantic image reconstruction; Human brain activity; Latent diffusion model; Visual and semantic decoding; Bootstrapping language-image pre-training; Visual cortex; NATURAL IMAGES; REPRESENTATIONS;
DOI
10.1038/s41598-025-89242-3
CLC numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Significant progress has been made in image reconstruction from functional magnetic resonance imaging (fMRI). Some studies reconstructed images using only visual information decoded from brain signals, yielding insufficient accuracy and quality. Incorporating semantic information into the reconstruction has been recommended to improve performance, but this approach still faces numerous difficulties. To address these problems, we propose an approach that combines semantically rich details with visual details for reconstruction. Our method consists of two main modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, visual information is decoded from brain data using a decoder. This module employs a deep generator network (DGN) to produce images and a VGG19 network to extract visual features from the generated images. The image is optimized iteratively to minimize the error between the features decoded from brain data and the features extracted from the generated image. In the semantic reconstruction module, two models are employed: BLIP and a latent diffusion model (LDM). Using BLIP, we generate 10 captions for each training image. The semantic features extracted from these captions, together with the brain data recorded during training sessions, are used to train a decoder, which is then used to decode semantic features from human brain activity. Finally, the image produced by the visual reconstruction module is fed to the LDM, with the decoded semantic features provided as conditional input for semantic reconstruction. Our ablation study confirms that including the decoded semantic features improves reconstruction quality. Our strategy outperforms Shen et al.'s method, which uses a similar dataset, both qualitatively and quantitatively.
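The iterative optimization described for the visual reconstruction module can be sketched in miniature. This is an illustrative stand-in only: a linear map plays the role of VGG19 feature extraction, plain gradient descent plays the role of the paper's DGN-based optimizer, and all names and dimensions here are assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of the feature-matching loop from the abstract: a candidate
# image is refined so that features extracted from it approach the
# features decoded from brain activity. The linear "extractor" W is an
# illustrative stand-in for VGG19.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)) / 16.0   # stand-in feature extractor

def extract_features(image):
    """Stand-in for VGG19 feature extraction (here: a linear map)."""
    return W @ image

def reconstruct(decoded_features, steps=1000, lr=0.05):
    """Iteratively adjust a candidate image to minimize the squared error
    between its extracted features and the decoded features."""
    image = np.zeros(256)
    for _ in range(steps):
        residual = extract_features(image) - decoded_features
        # Gradient of 0.5 * ||W @ image - f||^2 with respect to the image.
        image -= lr * (W.T @ residual)
    return image

# Simulate "decoded" features by extracting them from a hidden target.
true_image = rng.standard_normal(256)
decoded = extract_features(true_image)

recon = reconstruct(decoded)
err = np.linalg.norm(extract_features(recon) - decoded)
print(f"final feature error: {err:.4f}")
```

The loop drives the feature error toward zero; note that, as in the paper's setting, matching features does not force the reconstruction to equal the original image pixel-for-pixel, since the extractor discards information.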
Our methodology achieved accuracies of 0.812 and 0.815 on the inception and contrastive language-image pre-training (CLIP) metrics, respectively, indicating strong quantitative performance on semantic content. We achieved 0.328 on the structural similarity index measure (SSIM), indicating superior performance on this low-level metric. Moreover, our approach achieved acceptable success in the semantic reconstruction of artificial shapes and imagined images, attaining accuracies of 0.566 and 0.627 on the CLIP metric, and 0.671 and 0.565 on the SSIM metric, respectively.
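Accuracy scores of this kind (e.g., the inception and CLIP figures above) are commonly computed as pairwise two-way identification: for each reconstruction, check whether it is more similar to its own ground-truth image than to a distractor in some feature space. The sketch below assumes that protocol, with random vectors standing in for real Inception/CLIP embeddings; the paper's exact evaluation procedure may differ.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_identification(recon_feats, true_feats):
    """Fraction of ordered pairs (i, j), i != j, where reconstruction i
    is closer (by cosine similarity) to its own target i than to the
    distractor target j. Chance level is 0.5."""
    n = len(recon_feats)
    wins, total = 0, 0
    for i in range(n):
        own = cosine(recon_feats[i], true_feats[i])
        for j in range(n):
            if i == j:
                continue
            total += 1
            if own > cosine(recon_feats[i], true_feats[j]):
                wins += 1
    return wins / total

rng = np.random.default_rng(1)
true_feats = rng.standard_normal((20, 128))
# Simulated reconstructions: noisy copies of the target embeddings.
recon_feats = true_feats + 0.8 * rng.standard_normal((20, 128))

acc = pairwise_identification(recon_feats, true_feats)
print(f"two-way identification accuracy: {acc:.3f}")
```

With embeddings this strongly correlated with their targets, the score sits near 1.0; uninformative reconstructions would score near the 0.5 chance level.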
Pages: 17
Related papers
(50 records)
  • [1] Automatic image captioning
    Pan, JY
    Yang, HJ
    Duygulu, P
    Faloutsos, C
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1987 - 1990
  • [2] Text to Image Synthesis for Improved Image Captioning
    Hossain, Md. Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    IEEE ACCESS, 2021, 9 : 64918 - 64928
  • [3] Deep image reconstruction from human brain activity
    Shen, Guohua
    Horikawa, Tomoyasu
    Majima, Kei
    Kamitani, Yukiyasu
    PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (01)
  • [4] Chittron: An Automatic Bangla Image Captioning System
    Rahman, Matiur
    Mohammed, Nabeel
    Mansoor, Nafees
    Momen, Sifat
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY [ICICT-2019], 2019, 154 : 636 - 642
  • [5] Image Captioning Based on Automatic Constraint Loss
    Xu, Chaoqian
    Zhu, Gengming
    Wang, Lixin
    ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019, : 461 - 465
  • [6] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [7] Improved Image Captioning Using GAN and ViT
    Rao, Vrushank D.
    Shashank, B. N.
    Bhattu, S. Nagesh
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011 : 375 - 385
  • [8] Improved Transformer with Parallel Encoders for Image Captioning
    Lou, Liangshan
    Lu, Ke
    Xue, Jian
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4072 - 4078
  • [10] Gender Biases in Automatic Evaluation Metrics for Image Captioning
    Qiu, Haoyi
    Dou, Zi-Yi
    Wang, Tianlu
    Celikyilmaz, Asli
    Peng, Nanyun
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 8358 - 8375