SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Cited by: 3
Authors
Ning, Zheng [1 ]
Wimer, Brianna L. [1 ]
Jiang, Kaiwen [2 ]
Chen, Keyi [2 ]
Ban, Jerrick [1 ]
Tian, Yapeng [3 ]
Zhao, Yuhang [4 ]
Li, Toby Jia-Jun [1 ]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
[2] Univ Calif San Diego, La Jolla, CA USA
[3] Univ Texas Dallas, Richardson, TX USA
[4] Univ Wisconsin Madison, Madison, WI USA
Source
PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024) | 2024
Keywords
audio description; video consumption; accessibility; COGNITIVE APPROACH;
DOI
10.1145/3613904.3642632
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce Spica, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, Spica offers interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, Spica augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of Spica and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
Pages: 18
Related Papers
79 records in total
  • [1] Video Description: A Survey of Methods, Datasets, and Evaluation Metrics
    Aafaq, Nayyer
    Mian, Ajmal
    Liu, Wei
    Gilani, Syed Zulqarnain
    Shah, Mubarak
    [J]. ACM COMPUTING SURVEYS, 2020, 52 (06)
  • [2] Agostinelli A., 2023, ARXIV
  • [3] [Anonymous], 2021, P 23 INT ACM SIGACCE, DOI 10.1109/WCNC49053.2021.9417292
  • [4] [Anonymous], 2010, P 12 INT ACM SIGACCE
  • [5] A Literature Review of Video-Sharing Platform Research in HCI
    Bartolome, Ava
    Niu, Shuo
    [J]. PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023, 2023,
  • [6] LiveDescribe: Can Amateur Describers Create High-Quality Audio Description?
    Branje, Carmen J.
    Fels, Deborah I.
    [J]. JOURNAL OF VISUAL IMPAIRMENT & BLINDNESS, 2012, 106 (03) : 154 - 165
  • [7] Accessible Voice Interfaces
    Brewer, Robin N.
    Lasecki, Walter
    Findlater, Leah
    Munteanu, Cosmin
    Kaye, Joseph 'Jofish'
    Weber, Astrid
    [J]. COMPANION OF THE 2018 ACM CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CSCW'18), 2018, : 436 - 441
  • [8] Qualitative research and content validity: developing best practices based on science and experience
    Brod, Meryl
    Tesler, Laura E.
    Christensen, Torsten L.
    [J]. QUALITY OF LIFE RESEARCH, 2009, 18 (09) : 1263 - 1278
  • [9] Caldwell B., 2008, WWW CONSORTIUM W3C, V290, P1
  • [10] CineAD: a system for automated audio description script generation for the visually impaired
    Campos, Virginia P.
    de Araujo, Tiago M. U.
    de Souza Filho, Guido L.
    Goncalves, Luiz M. G.
    [J]. UNIVERSAL ACCESS IN THE INFORMATION SOCIETY, 2020, 19 (01) : 99 - 111