Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss

Cited by: 0
Authors
Langis Gagnon
Samuel Foucher
Maguelonne Heritier
Marc Lalonde
David Byrns
Claude Chapdelaine
James Turner
Suzanne Mathieu
Denis Laurendeau
Nath Tan Nguyen
Denis Ouellet
Affiliations
[1] Computer Research Institute of Montreal (CRIM), R&D Department
[2] Université de Montréal, École de bibliothéconomie et des sciences de l’information
[3] Laval University, Department of Electrical and Computer Engineering
Source
Universal Access in the Information Society | 2009 / Vol. 8
Keywords
e-Accessibility; Video description; Video indexing; Computer vision
DOI
Not available
Abstract
This paper presents the status of an R&D project that develops computer-vision tools to assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) the needs of people with vision loss, and (3) current system design, core technologies and implementation. The paper reports the main conclusions of consultations with producers of video description regarding their practices and with end users regarding their needs, as well as an analysis of described productions that led to a proposed video description typology. It also presents the current status of a prototype software tool, the audio-vision manager, which uses many computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions with it, and add them to the audio track with a synthetic voice. Finally, a proof of concept is briefly described for a first adaptive video description player, which allows end users to select among various levels of video description.
Pages: 199-218
Number of pages: 19