Synthetic Data Generation for Semantic Segmentation of Lecture Videos

Authors
Davila, Kenny [1]
Xu, Fei [2]
Molina, James [1]
Setlur, Srirangaraj [2]
Govindaraju, Venu [2]
Affiliations
[1] Univ Tecnol Ctr Amer, Tegucigalpa, Honduras
[2] Univ Buffalo, Buffalo, NY USA
Source
FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022 | 2022 / Vol. 13639
Funding
National Science Foundation (NSF);
Keywords
Semantic Segmentation; Lecture videos; Synthetic data; TEXT; HANDWRITTEN; MATH;
DOI
10.1007/978-3-031-21648-0_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lecture videos have become a valuable resource for students and teachers. These videos are a vast source of information, but most search engines index them only by their audio. To make these videos searchable by their handwritten content, accurate methods for analyzing such content at scale are needed. However, training deep neural networks to their full potential requires large-scale lecture video datasets. In this paper, we use synthetic data generation to improve the binarization of lecture videos. We also use it to semantically segment pixels into five classes: background, speaker, text, mathematical expressions, and graphics. Our synthetic data generation method renders content from multiple handwritten and typeset datasets and blends it into real images using random tight layouts and the locations of the people in the frame. We also propose a mixed data approach that trains networks on two detection tasks at once: person and text. Both binarization and semantic segmentation are carried out with fully convolutional neural networks that use a typical encoder-decoder architecture with residual connections. Our experiments show that pretraining on both synthetic and mixed data leads to better performance than training on real data alone. While the final results are promising, more work is needed to reduce the domain shift between synthetic and real data. Our code and data are publicly available.
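The abstract describes placing rendered handwritten and typeset content into real frames while taking the people in the frame into account, so that every synthetic pixel comes with a segmentation label for free. The sketch below illustrates only that label-generation idea, under simplifying assumptions that are not taken from the paper: content is given as small binary ink masks, placement uses plain rejection sampling (random positions, rejecting overlaps with earlier elements or with the speaker's region) rather than the authors' tight-layout algorithm, and no pixel colors are blended. The names `compose_labels`, `binarization_target`, and the integer class ids are hypothetical.

```python
import random

# Hypothetical class ids for the five segmentation classes named in the abstract.
BACKGROUND, SPEAKER, TEXT, MATH, GRAPHIC = range(5)


def boxes_overlap(a, b):
    """True if two (row, col, height, width) boxes intersect."""
    ar, ac, ah, aw = a
    br, bc, bh, bw = b
    return ar < br + bh and br < ar + ah and ac < bc + bw and bc < ac + aw


def compose_labels(height, width, person_mask, elements, rng, max_tries=50):
    """Build a per-pixel label map by placing ink masks around the speaker.

    person_mask: height x width list of 0/1, 1 where a person is.
    elements: list of (ink_mask, class_id); ink_mask is a small 2D list of 0/1.
    Elements that do not fit, or find no free spot in max_tries, are skipped.
    """
    # Start from the real frame: speaker pixels vs. background.
    labels = [[SPEAKER if person_mask[r][c] else BACKGROUND for c in range(width)]
              for r in range(height)]
    placed = []  # boxes of elements already placed
    for ink, cls in elements:
        h, w = len(ink), len(ink[0])
        if h > height or w > width:
            continue
        for _ in range(max_tries):
            r = rng.randint(0, height - h)
            c = rng.randint(0, width - w)
            box = (r, c, h, w)
            # Reject positions that overlap earlier content.
            if any(boxes_overlap(box, p) for p in placed):
                continue
            # Simplification: keep the whole box off the speaker's region.
            if any(person_mask[r + i][c + j] for i in range(h) for j in range(w)):
                continue
            # Only ink pixels receive the element's class label.
            for i in range(h):
                for j in range(w):
                    if ink[i][j]:
                        labels[r + i][c + j] = cls
            placed.append(box)
            break
    return labels


def binarization_target(labels):
    """Binary ink map: 1 for text, math, or graphics pixels, else 0."""
    return [[1 if v in (TEXT, MATH, GRAPHIC) else 0 for v in row] for row in labels]
```

For example, `compose_labels(8, 10, mask, [(text_ink, TEXT), (math_ink, MATH)], random.Random(0))` places a text element and a math element in the part of an 8x10 frame not covered by `mask`, and `binarization_target` derives the matching binarization label from the same map. Deriving both targets from one label map mirrors the abstract's point that synthetic data can serve binarization and semantic segmentation at once.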
Pages: 468-483
Number of pages: 16