Synthetic Data Generation for Semantic Segmentation of Lecture Videos

Times cited: 0

Authors:

Davila, Kenny [1]

Xu, Fei [2]

Molina, James [1]

Setlur, Srirangaraj [2]

Govindaraju, Venu [2]

Affiliations:

[1] Univ Tecnol Ctr Amer, Tegucigalpa, Honduras

[2] Univ Buffalo, Buffalo, NY USA

Source:

FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022 | 2022 / Vol. 13639

Funding:

U.S. National Science Foundation;

Keywords:

Semantic Segmentation; Lecture videos; Synthetic data; TEXT; HANDWRITTEN; MATH;

DOI:

10.1007/978-3-031-21648-0_32

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Lecture videos have become a great resource for students and teachers. These videos are a vast information source, but most search engines only index them by their audio. To make these videos searchable by handwritten content, it is important to develop accurate methods for analyzing such content at scale. However, training deep neural networks to their full potential requires large-scale lecture video datasets. In this paper, we use synthetic data generation to improve binarization of lecture videos. We also use it to semantically segment pixels into background, speaker, text, mathematical expressions, and graphics. Our method for synthetic data generation renders content from multiple handwritten and typeset datasets, and blends it into real images using random tight layouts and the location of the people. In addition, we also propose a mixed data approach that trains networks on two detection tasks at once: person and text. Both binarization and semantic segmentation are carried out using fully convolutional neural networks with a typical encoder-decoder architecture and residual connections. Our experiments show that pretraining on both synthetic and mixed data leads to better performance than training with real data alone. While final results are promising, more work will be needed to reduce the domain shift between synthetic and real data. Our code and data are publicly available.


Pages: 468-483

Page count: 16
