SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Cited by: 3
Authors
Ning, Zheng [1 ]
Wimer, Brianna L. [1 ]
Jiang, Kaiwen [2 ]
Chen, Keyi [2 ]
Ban, Jerrick [1 ]
Tian, Yapeng [3 ]
Zhao, Yuhang [4 ]
Li, Toby Jia-Jun [1 ]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
[2] Univ Calif San Diego, La Jolla, CA USA
[3] Univ Texas Dallas, Richardson, TX USA
[4] Univ Wisconsin Madison, Madison, WI USA
Source
PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024) | 2024
Keywords
audio description; video consumption; accessibility; COGNITIVE APPROACH;
DOI
10.1145/3613904.3642632
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce Spica, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, Spica offers interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, Spica augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of Spica and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
Pages: 18
Related Papers
79 records in total
  • [1] Video Description: A Survey of Methods, Datasets, and Evaluation Metrics
    Aafaq, Nayyer
    Mian, Ajmal
    Liu, Wei
    Gilani, Syed Zulqarnain
    Shah, Mubarak
    [J]. ACM COMPUTING SURVEYS, 2020, 52 (06)
  • [2] Agostinelli A., 2023, ARXIV
  • [3] [Anonymous], 2021, P 23 INT ACM SIGACCE, DOI 10.1109/WCNC49053.2021.9417292
  • [4] [Anonymous], 2010, P 12 INT ACM SIGACCE
  • [5] A Literature Review of Video-Sharing Platform Research in HCI
    Bartolome, Ava
    Niu, Shuo
    [J]. PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023, 2023,
  • [6] LiveDescribe: Can Amateur Describers Create High-Quality Audio Description?
    Branje, Carmen J.
    Fels, Deborah I.
    [J]. JOURNAL OF VISUAL IMPAIRMENT & BLINDNESS, 2012, 106 (03) : 154 - 165
  • [7] Accessible Voice Interfaces
    Brewer, Robin N.
    Lasecki, Walter
    Findlater, Leah
    Munteanu, Cosmin
    Kaye, Joseph 'Jofish'
    Weber, Astrid
    [J]. COMPANION OF THE 2018 ACM CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CSCW'18), 2018, : 436 - 441
  • [8] Qualitative research and content validity: developing best practices based on science and experience
    Brod, Meryl
    Tesler, Laura E.
    Christensen, Torsten L.
    [J]. QUALITY OF LIFE RESEARCH, 2009, 18 (09) : 1263 - 1278
  • [9] Caldwell B., 2008, WWW CONSORTIUM W3C, V290, P1
  • [10] CineAD: a system for automated audio description script generation for the visually impaired
    Campos, Virginia P.
    de Araujo, Tiago M. U.
    de Souza Filho, Guido L.
    Goncalves, Luiz M. G.
    [J]. UNIVERSAL ACCESS IN THE INFORMATION SOCIETY, 2020, 19 (01) : 99 - 111