Segmentation and annotation of audiovisual recordings based on automated speech recognition

Cited by: 0
Authors
Repp, Stephan [1 ]
Waitelonis, Joerg [2 ]
Sack, Harald [2 ]
Meinel, Christoph [1 ,2 ]
Affiliations
[1] Hasso Plattner Inst Softwaresyst Tech GmbH, POB 900460, D-14440 Potsdam, Germany
[2] Univ Jena, D-07743 Jena, Germany
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2007 | 2007 / Vol. 4881
Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Searching multimedia data, in particular audiovisual data, is still a challenging task. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has become capable enough to provide download and streaming solutions. However, the accessibility and traceability of their content for further use are still rather limited. In this paper we describe and evaluate a new approach to synchronizing auxiliary text-based material, e.g. presentation slides, with lecture video recordings. Our goal is to show that the tentative transliteration produced by automated speech recognition is sufficient for synchronization. Different approaches to synchronizing textual material with deficient transliterations of lecture recordings are discussed and evaluated. Our evaluation data set comprises recordings of various speakers in different languages.
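To illustrate the underlying idea, the following is a minimal Python sketch, not the algorithm evaluated in the paper: each slide's text is matched against a time-stamped, possibly error-prone ASR transcript, and the slide is anchored at the transcript window with the highest word overlap. The names Word, tokenize and align_slide, the window size, and the overlap score are assumptions introduced for illustration only.

# Hypothetical sketch (not the authors' exact method): align a slide with a
# time-stamped ASR transcript by sliding a fixed-size window over the
# transcript and picking the window with the largest keyword overlap.
import re
from dataclasses import dataclass

@dataclass
class Word:
    text: str      # recognized word (possibly erroneous)
    start: float   # start time in seconds within the recording

def tokenize(text: str) -> set[str]:
    """Lower-case, keep alphabetic tokens, drop very short ones."""
    return {t for t in re.findall(r"[a-zäöüß]+", text.lower()) if len(t) > 3}

def align_slide(slide_text: str, transcript: list[Word], window: int = 80) -> float:
    """Return the estimated start time (seconds) of the slide in the recording."""
    if not transcript:
        return 0.0
    slide_tokens = tokenize(slide_text)
    best_time, best_score = 0.0, -1
    for i in range(max(1, len(transcript) - window)):
        segment = transcript[i:i + window]
        overlap = len(slide_tokens & {w.text.lower() for w in segment})
        if overlap > best_score:
            best_score, best_time = overlap, segment[0].start
    return best_time

# Example usage with a toy transcript (times in seconds):
transcript = [Word("semantic", 12.0), Word("search", 12.4), Word("video", 13.1)]
print(align_slide("Semantic search in video archives", transcript))  # -> 12.0

In practice, consecutive slides would additionally be constrained to appear in order along the timeline; the paper discusses and evaluates several such synchronization strategies on recordings of various speakers in different languages.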
Pages: 620+
Page count: 3
Related papers
50 records in total
  • [1] Dynamic browsing of audiovisual lecture recordings based on automated speech recognition
    Repp, Stephan
    Gross, Andreas
    Meinel, Christoph
    INTELLIGENT TUTORING SYSTEM, PROCEEDINGS, 2008, 5091 : 662 - 664
  • [2] Automatic Acoustic Segmentation for Speech Recognition on Broadcast Recordings
    Peng, Gang
    Hwang, Mei-Yuh
    Ostendorf, Mari
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2580 - 2583
  • [3] ALIGNING AUDIOVISUAL FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION
    Tao, Fei
    Busso, Carlos
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [4] The segmentation of multi-channel meeting recordings for automatic speech recognition
    Dines, John
    Vepa, Jithendra
    Hain, Thomas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
  • [5] Audiovisual Annotation Procedure for Multi-view Field Recordings
    Guyot, Patrice
    Malon, Thierry
    Roman-Jimenez, Geoffrey
    Chambon, Sylvie
    Charvillat, Vincent
    Crouzil, Alain
    Peninou, Andre
    Pinquier, Julien
    Sedes, Florence
    Senac, Christine
    MULTIMEDIA MODELING (MMM 2019), PT I, 2019, 11295 : 399 - 410
  • [6] Audiovisual speech recognition based on a deep convolutional neural network
    Rudregowda, S.
    Patilkulkarni, S.
    Ravi, V.
    Gururaj, H. L.
    Krichen, M.
    Data Science and Management, 2024, 7 (01) : 25 - 34
  • [7] Fusion Architectures for Word-based Audiovisual Speech Recognition
    Wand, Michael
    Schmidhuber, Jurgen
    INTERSPEECH 2020, 2020, : 3491 - 3495
  • [8] AVATAR: Unconstrained Audiovisual Speech Recognition
    Gabeur, Valentin
    Seo, Paul Hongsuck
    Nagrani, Arsha
    Sun, Chen
    Alahari, Karteek
    Schmid, Cordelia
    INTERSPEECH 2022, 2022, : 2818 - 2822
  • [9] Audiovisual speech recognition: A review and forecast
    Xia, Linlin
    Chen, Gang
    Xu, Xun
    Cui, Jiashuo
    Gao, Yiping
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (06)
  • [10] Stream-based classification and segmentation of speech events in meeting recordings
    Ogata, Jun
    Asano, Futoshi
    MULTIMEDIA CONTENT REPRESENTATION, CLASSIFICATION AND SECURITY, 2006, 4105 : 793 - 800