The Potential of a Visual Dialogue Agent In a Tandem Automated Audio Description System for Videos

Cited by: 4
Authors
Stangl, Abigale [1]
Ihorn, Shasta [2]
Siu, Yue-Ting [3]
Bodi, Aditya [4]
Castanon, Mar [4]
Narins, Lothar D. [4]
Yoon, Ilmi [4]
Affiliations
[1] Univ Washington, Human Centered Design & Engn, Seattle, WA 98195 USA
[2] San Francisco State Univ, Dept Psychol, San Francisco, CA 94132 USA
[3] Northwest Ctr Assist Technol Training CATT NW, Washington, DC USA
[4] San Francisco State Univ, Dept Comp Sci, San Francisco, CA 94132 USA
Source
PROCEEDINGS OF THE 25TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, ASSETS 2023 | 2023
Funding
U.S. National Science Foundation;
Keywords
Audio Description; AI; Visual Dialogue; Virtual Agents; Virtual Volunteer; Visual Assistance; Visual Question Answering; Minimum Viable Description; Blind and Low Vision; EMPLOYMENT; YOUTHS;
DOI
10.1145/3597638.3608402
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The relentless pace of video production exacerbates the digital accessibility gap that individuals who are blind or low vision (BLV) face on a daily basis, resulting in disproportionate exclusion from community opportunities and risk management. Whereas previous automated audio description (AD) systems provide single-tool approaches for delivering minimum viable description (MVD) or delivering on-demand visual question answering (VQA), we present a tandem AI-based AD tool that combines MVD and on-demand VQA. A user study with 26 BLV individuals explored how the tandem system may be used under the conditions of delivering MVD and/or on-demand VQA with AI-only or human-in-the-loop support. When each tool was used in isolation, AI-only conditions scored significantly lower in both user enjoyment and comprehension. When used in tandem, AI-only conditions matched outcomes delivered with human-in-the-loop, which suggests that AI-only AD tools may be most effective when both types of tools are used in tandem. A multimodal analysis of interactions with the tandem system revealed areas for system improvement in terms of the timing of AD delivery and accurate content delivery. We discuss how the use of both types of tools in a tandem system can mitigate some of the digital frictions that have plagued efforts in machine learning and automated tools for accessibility.
Pages: 17