The Efficacy of Collaborative Authoring of Video Scene Descriptions

Cited by: 6
Authors
Natalie, Rosiana [1]
Loh, Jolene [1]
Tan, Huei Suen [1]
Tseng, Joshua [1]
Chan, Ian Luke Yi-Ren [1]
Jarjue, Ebrima H. [2]
Kacorri, Hernisa [2]
Hara, Kotaro [1]
Affiliations
[1] Singapore Management Univ, Singapore, Singapore
[2] Univ Maryland, College Pk, MD USA
Source
23RD INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, ASSETS 2021 | 2021
Funding
National Research Foundation, Singapore
Keywords
Scene description; visual impairment; video accessibility
DOI
10.1145/3441852.3471201
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The majority of online video content remains inaccessible to people with visual impairments because it lacks audio descriptions that depict the video scenes. Content creators have traditionally relied on professionals to author audio descriptions, but their service is costly and not readily available. We investigate the feasibility of creating more cost-effective audio descriptions that are also of high quality by involving novices. Specifically, we designed, developed, and evaluated ViScene, a web-based collaborative audio description authoring tool that enables a sighted novice author and a reviewer, either sighted or blind, to interact and contribute to scene descriptions (SDs), that is, text that can be transformed into audio through text-to-speech. Through a mixed-design study with N = 60 participants, we assessed the quality of SDs created by sighted novices with feedback from both sighted and blind reviewers. Our results showed that with ViScene novices could produce content that is Descriptive, Objective, Referable, and Clear at a cost of US$2.81 to US$5.48 per video minute (pvm), which is 54% to 96% lower than the professional service. However, the descriptions fell short in other quality dimensions (e.g., learning, a measure of how well an SD conveys the video's intended message). While professional audio describers remain the gold standard, for content creators who cannot afford them, ViScene offers a cost-effective alternative, ultimately leading to a more accessible medium.
Pages: 15