Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages

Cited by: 3
Authors
Arystanbekov, Batyr [1]
Kuzdeuov, Askat [1]
Nurgaliyev, Shakhizat [1]
Varol, Huseyin Atakan [1]
Affiliations
[1] Nazarbayev University, Institute of Smart Systems and Artificial Intelligence (ISSAI), Astana, Kazakhstan
Source
2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) | 2023
DOI
10.1109/EMBC40787.2023.10340575
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visually impaired and blind people often face a range of socioeconomic problems that can make it difficult for them to live independently and participate fully in society. Advances in machine learning open new avenues for implementing assistive devices for the visually impaired and blind. In this work, we combined image captioning and text-to-speech technologies to create an assistive device for the visually impaired and blind. Our system can provide the user with descriptive auditory feedback in the Kazakh language on a scene acquired in real time by a head-mounted camera. The image captioning model for the Kazakh language provided satisfactory results in both quantitative metrics and subjective evaluation. Finally, experiments with a visually unimpaired blindfolded participant demonstrated the feasibility of our approach.
Pages: 4