The Potential of a Visual Dialogue Agent In a Tandem Automated Audio Description System for Videos

Cited by: 4

Authors:

Stangl, Abigale [1]

Ihorn, Shasta [2]

Siu, Yue-Ting [3]

Bodi, Aditya [4]

Castanon, Mar [4]

Narins, Lothar D. [4]

Yoon, Ilmi [4]

Affiliations:

[1] Univ Washington, Human Centered Design & Engn, Seattle, WA 98195 USA

[2] San Francisco State Univ, Dept Psychol, San Francisco, CA 94132 USA

[3] Northwest Ctr Assist Technol Training CATT NW, Washington, DC USA

[4] San Francisco State Univ, Dept Comp Sci, San Francisco, CA 94132 USA

Source:

PROCEEDINGS OF THE 25TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, ASSETS 2023 | 2023

Funding:

National Science Foundation (USA);

Keywords:

Audio Description; AI; Visual Dialogue; Virtual Agents; Virtual Volunteer; Visual Assistance; Visual Question Answering; Minimum Viable Description; Blind and Low Vision; EMPLOYMENT; YOUTHS;

DOI:

10.1145/3597638.3608402

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

The relentless pace of video production exacerbates the digital accessibility gap that individuals who are blind or low vision (BLV) face on a daily basis, resulting in disproportionate exclusion from community opportunities and risk management. Whereas previous automated audio description (AD) systems provide single-tool approaches for delivering minimum viable description (MVD) or delivering on-demand visual question answering (VQA), we present a tandem AI-based AD tool that combines MVD and on-demand VQA. A user study with 26 BLV individuals explored how the tandem system may be used under the conditions of delivering MVD and/or on-demand VQA with AI-only or human-in-the-loop support. When each tool was used in isolation, AI-only conditions scored significantly lower in both user enjoyment and comprehension. When used in tandem, AI-only conditions matched outcomes delivered with human-in-the-loop, which suggests that AI-only AD tools may be most effective when both types of tools are used in tandem. A multimodal analysis of interactions with the tandem system revealed areas for system improvement in terms of the timing of AD delivery and accurate content delivery. We discuss how the use of both types of tools in a tandem system can mitigate some of the digital frictions that have plagued efforts in machine learning and automated tools for accessibility.


Pages: 17

