Segmentation and annotation of audiovisual recordings based on automated speech recognition

Cited by: 0
Authors
Repp, Stephan [1 ]
Waitelonis, Joerg [2 ]
Sack, Harald [2 ]
Meinel, Christoph [1 ,2 ]
Affiliations
[1] Hasso Plattner Inst Softwaresyst Tech GmbH, POB 900460, D-14440 Potsdam, Germany
[2] Univ Jena, D-07743 Jena, Germany
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2007 | 2007 / Vol. 4881
Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Searching multimedia data, in particular audiovisual data, is still a challenging task. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has become capable enough to provide download and streaming solutions. However, the accessibility and traceability of their content for further use are still rather limited. In this paper we describe and evaluate a new approach to synchronizing auxiliary text-based material, e.g. presentation slides, with lecture video recordings. Our goal is to show that the tentative transliteration produced by automated speech recognition is sufficient for synchronization. Different approaches to synchronizing textual material with deficient transliterations of lecture recordings are discussed and evaluated. Our evaluation data set comprises recordings of various speakers in different languages.
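To illustrate the underlying idea, the following is a minimal Python sketch, not the algorithm evaluated in the paper: each slide's text is matched against a time-stamped, possibly error-prone ASR transcript, and the slide is anchored at the transcript window with the highest word overlap. The names Word, tokenize and align_slide, the window size, and the overlap score are assumptions introduced for illustration only.

# Hypothetical sketch (not the authors' exact method): align a slide with a
# time-stamped ASR transcript by sliding a fixed-size window over the
# transcript and picking the window with the largest keyword overlap.
import re
from dataclasses import dataclass

@dataclass
class Word:
    text: str      # recognized word (possibly erroneous)
    start: float   # start time in seconds within the recording

def tokenize(text: str) -> set[str]:
    """Lower-case, keep alphabetic tokens, drop very short ones."""
    return {t for t in re.findall(r"[a-zäöüß]+", text.lower()) if len(t) > 3}

def align_slide(slide_text: str, transcript: list[Word], window: int = 80) -> float:
    """Return the estimated start time (seconds) of the slide in the recording."""
    if not transcript:
        return 0.0
    slide_tokens = tokenize(slide_text)
    best_time, best_score = 0.0, -1
    for i in range(max(1, len(transcript) - window)):
        segment = transcript[i:i + window]
        overlap = len(slide_tokens & {w.text.lower() for w in segment})
        if overlap > best_score:
            best_score, best_time = overlap, segment[0].start
    return best_time

# Example usage with a toy transcript (times in seconds):
transcript = [Word("semantic", 12.0), Word("search", 12.4), Word("video", 13.1)]
print(align_slide("Semantic search in video archives", transcript))  # -> 12.0

In practice, consecutive slides would additionally be constrained to appear in order along the timeline; the paper discusses and evaluates several such synchronization strategies on recordings of various speakers in different languages.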
Pages: 620+
Page count: 3
Related papers
50 records in total
  • [1] Dynamic browsing of audiovisual lecture recordings based on automated speech recognition
    Repp, Stephan
    Gross, Andreas
    Meinel, Christoph
    INTELLIGENT TUTORING SYSTEM, PROCEEDINGS, 2008, 5091 : 662 - 664
  • [2] Automatic Acoustic Segmentation for Speech Recognition on Broadcast Recordings
    Peng, Gang
    Hwang, Mei-Yuh
    Ostendorf, Mari
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2580 - 2583
  • [3] ALIGNING AUDIOVISUAL FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION
    Tao, Fei
    Busso, Carlos
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [4] The segmentation of multi-channel meeting recordings for automatic speech recognition
    Dines, John
    Vepa, Jithendra
    Hain, Thomas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
  • [5] Audiovisual Annotation Procedure for Multi-view Field Recordings
    Guyot, Patrice
    Malon, Thierry
    Roman-Jimenez, Geoffrey
    Chambon, Sylvie
    Charvillat, Vincent
    Crouzil, Alain
    Peninou, Andre
    Pinquier, Julien
    Sedes, Florence
    Senac, Christine
    MULTIMEDIA MODELING (MMM 2019), PT I, 2019, 11295 : 399 - 410
  • [6] Audiovisual speech recognition based on a deep convolutional neural network
    Rudregowda, S.
    Patilkulkarni, S.
    Ravi, V.
    Gururaj, H. L.
    Krichen, M.
    Data Science and Management, 2024, 7 (01) : 25 - 34
  • [7] Fusion Architectures for Word-based Audiovisual Speech Recognition
    Wand, Michael
    Schmidhuber, Jurgen
    INTERSPEECH 2020, 2020, : 3491 - 3495
  • [8] AVATAR: Unconstrained Audiovisual Speech Recognition
    Gabeur, Valentin
    Seo, Paul Hongsuck
    Nagrani, Arsha
    Sun, Chen
    Alahari, Karteek
    Schmid, Cordelia
    INTERSPEECH 2022, 2022, : 2818 - 2822
  • [9] Audiovisual speech recognition: A review and forecast
    Xia, Linlin
    Chen, Gang
    Xu, Xun
    Cui, Jiashuo
    Gao, Yiping
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (06)
  • [10] Stream-based classification and segmentation of speech events in meeting recordings
    Ogata, Jun
    Asano, Futoshi
    MULTIMEDIA CONTENT REPRESENTATION, CLASSIFICATION AND SECURITY, 2006, 4105 : 793 - 800