Human-AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era

Times cited: 0
Authors
Yu, Rui [1]
Lee, Sooyeon [2]
Xie, Jingyi [3]
Billah, Syed Masum [3]
Carroll, John M. [3]
Affiliations
[1] Univ Louisville, Dept Comp Sci & Engn, Louisville, KY 40208 USA
[2] New Jersey Inst Technol, Ying Wu Coll Comp, Dept Informat, Newark, NJ 07102 USA
[3] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Keywords
people with visual impairments; remote sighted assistance; conversational assistance; computer vision; artificial intelligence; human-AI collaboration; large language models; system; recognition; navigation
DOI
10.3390/fi16070254
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Remote sighted assistance (RSA) has emerged as a conversational technology aiding people with visual impairments (VI) through real-time video chat communication with sighted agents. We conducted a literature review and interviewed 12 RSA users to understand the technical and navigational challenges faced by both agents and users. The technical challenges were categorized into four groups: agents' difficulties in orienting and localizing users, acquiring and interpreting users' surroundings and obstacles, delivering information specific to user situations, and coping with poor network connections. We also presented 15 real-world navigational challenges, including 8 outdoor and 7 indoor scenarios. Given the spatial and visual nature of these challenges, we identified relevant computer vision problems that could potentially provide solutions. We then formulated 10 emerging problems that neither human agents nor computer vision can fully address alone. For each emerging problem, we discussed solutions grounded in human-AI collaboration. Additionally, with the advent of large language models (LLMs), we outlined how RSA can integrate with LLMs within a human-AI collaborative framework, envisioning the future of visual prosthetics.
Pages: 32