Improved image reconstruction from brain activity through automatic image captioning

Cited by: 0

Authors:

Kalantari, Fatemeh [1]

Faez, Karim [1]

Amindavar, Hamidreza [1]

Nazari, Soheila [2]

Affiliations:

[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran

[2] Shahid Beheshti Univ, Fac Elect Engn, Tehran, Iran

Source:

SCIENTIFIC REPORTS | 2025 / Volume 15 / Issue 01

Keywords:

Semantic image reconstruction; Brain human activity; Latent diffusion model; Visual and semantic decoding; Bootstrapping language-image pre-training; Visual cortex; NATURAL IMAGES; REPRESENTATIONS;

DOI:

10.1038/s41598-025-89242-3

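Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Subject classification codes: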

07 ; 0710 ; 09 ;

Abstract:

Image reconstruction from functional magnetic resonance imaging (fMRI) data has advanced considerably. Earlier studies reconstructed images from visual information decoded from brain signals alone, and the results lacked accuracy and quality. Incorporating semantic information into the reconstruction has been recommended to improve performance, but doing so remains difficult. To address this, we propose an approach that combines semantically complex details with visual details during reconstruction. The method consists of two main modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, a decoder decodes visual information from brain data. The module uses a deep generator network (DGN) to produce images and a VGG19 network to extract visual features from the generated images. The image is optimized iteratively to minimize the error between the features decoded from brain data and the features extracted from the generated image. The semantic reconstruction module employs two models: bootstrapping language-image pre-training (BLIP) and a latent diffusion model (LDM). Using BLIP, we generate 10 captions for each training image. The semantic features extracted from these captions, together with brain data recorded during training sessions, are used to train a decoder, which is then used to decode semantic features from human brain activity. Finally, the image produced by the visual reconstruction module is given to the LDM as input, and the semantic features decoded from brain activity are supplied as the conditioning input for semantic reconstruction. Our ablation study confirms that including the decoded semantic features improves reconstruction quality. The method outperforms that of Shen et al., which uses a similar dataset, both qualitatively and quantitatively.
Our method achieved accuracies of 0.812 and 0.815 on the Inception and contrastive language-image pre-training (CLIP) metrics, respectively, both of which are well suited to the quantitative evaluation of semantic content. We obtained 0.328 on the structural similarity index measure (SSIM), a low-level metric, which indicates superior performance. Moreover, for the semantic reconstruction of artificial shapes and imagined images, the approach achieved acceptable results: 0.566 and 0.627 on the CLIP metric, and 0.671 and 0.565 on the SSIM metric, respectively.
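
The iterative optimization in the visual reconstruction module can be illustrated with a toy sketch. This is not the authors' implementation: a small fixed matrix stands in for the DGN, another stands in for the VGG19 feature extractor, and plain gradient descent is assumed as the optimizer, since the abstract does not name one. All names (`reconstruct`, `GENERATOR`, `EXTRACTOR`, `decoded_features`) are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    right_t = transpose(right)
    return [[sum(a * b for a, b in zip(row, col)) for col in right_t]
            for row in left]


def reconstruct(generator, extractor, decoded_features, steps=200):
    """Gradient descent on a latent vector so that the features of the
    generated image match the features decoded from brain data.

    Assumption: generator and extractor are linear maps (toy stand-ins
    for a DGN and VGG19), so features(image) = extractor @ generator @ latent.
    """
    combined = matmul(extractor, generator)
    combined_t = transpose(combined)
    # The squared Frobenius norm bounds the largest eigenvalue of
    # combined^T combined, so this step size cannot diverge.
    frobenius_sq = sum(v * v for row in combined for v in row)
    step_size = 1.0 / (2.0 * frobenius_sq)

    latent = [0.0] * len(generator[0])
    losses = []
    for _ in range(steps):
        predicted = matvec(combined, latent)
        residual = [p - t for p, t in zip(predicted, decoded_features)]
        losses.append(sum(r * r for r in residual))  # squared feature error
        gradient = [2.0 * g for g in matvec(combined_t, residual)]
        latent = [z - step_size * g for z, g in zip(latent, gradient)]
    return matvec(generator, latent), losses


# Toy stand-ins: 2-D latent -> 4-pixel "image" -> 3 "features".
GENERATOR = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
EXTRACTOR = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

true_image = matvec(GENERATOR, [0.5, -1.0])
decoded_features = matvec(EXTRACTOR, true_image)  # pretend decoder output

image, losses = reconstruct(GENERATOR, EXTRACTOR, decoded_features)
print("first loss:", losses[0], "last loss:", losses[-1])
print("reconstructed image:", [round(p, 6) for p in image])
```

In this sketch the target features are noise-free, so the loop recovers the image that produced them. With real decoded features the error would only shrink toward a nonzero minimum, and the resulting image would then serve as the input to the LDM stage described above.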

Pages: 17

Total: 50 records

[1] Automatic image captioning
Pan, JY
Yang, HJ
Duygulu, P
Faloutsos, C
2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1987 - 1990
[2] Text to Image Synthesis for Improved Image Captioning
Hossain, Md. Zakir
Sohel, Ferdous
Shiratuddin, Mohd Fairuz
Laga, Hamid
Bennamoun, Mohammed
IEEE ACCESS, 2021, 9 : 64918 - 64928
[3] Deep image reconstruction from human brain activity
Shen, Guohua
Horikawa, Tomoyasu
Majima, Kei
Kamitani, Yukiyasu
PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (01)
[4] Chittron: An Automatic Bangla Image Captioning System
Rahman, Matiur
Mohammed, Nabeel
Mansoor, Nafees
Momen, Sifat
PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY [ICICT-2019], 2019, 154 : 636 - 642
[5] Image Captioning Based on Automatic Constraint Loss
Xu, Chaoqian
Zhu, Gengming
Wang, Lixin
ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019, : 461 - 465
[6] Image captioning improved visual question answering
Sharma, Himanshu
Jalal, Anand Singh
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 : 34775 - 34796
[7] Improved Image Captioning Using GAN and ViT
Rao, Vrushank D.
Shashank, B. N.
Bhattu, S. Nagesh
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011 : 375 - 385
[8] Improved Transformer with Parallel Encoders for Image Captioning
Lou, Liangshan
Lu, Ke
Xue, Jian
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4072 - 4078
[9] Image captioning improved visual question answering
Sharma, Himanshu
Jalal, Anand Singh
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
[10] Gender Biases in Automatic Evaluation Metrics for Image Captioning
Qiu, Haoyi
Dou, Zi-Yi
Wang, Tianlu
Celikyilmaz, Asli
Peng, Nanyun
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 8358 - 8375
