Synthetic Data Generation for Semantic Segmentation of Lecture Videos

Cited by: 0
Authors
Davila, Kenny [1]
Xu, Fei [2]
Molina, James [1]
Setlur, Srirangaraj [2]
Govindaraju, Venu [2]
Affiliations
[1] Univ Tecnol Ctr Amer, Tegucigalpa, Honduras
[2] Univ Buffalo, Buffalo, NY USA
Source
FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022 | 2022 / Vol. 13639
Funding
National Science Foundation (United States)
Keywords
Semantic Segmentation; Lecture videos; Synthetic data; TEXT; HANDWRITTEN; MATH
DOI
10.1007/978-3-031-21648-0_32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Lecture videos have become a great resource for students and teachers. These videos are a vast information source, but most search engines only index them by their audio. To make these videos searchable by handwritten content, it is important to develop accurate methods for analyzing such content at scale. However, training deep neural networks to their full potential requires large-scale lecture video datasets. In this paper, we use synthetic data generation to improve binarization of lecture videos. We also use it to semantically segment pixels into background, speaker, text, mathematical expressions, and graphics. Our method for synthetic data generation renders content from multiple handwritten and typeset datasets, and blends it into real images using random tight layouts and the location of the people. In addition, we also propose a mixed data approach that trains networks on two detection tasks at once: person and text. Both binarization and semantic segmentation are carried out using fully convolutional neural networks with a typical encoder-decoder architecture and residual connections. Our experiments show that pretraining on both synthetic and mixed data leads to better performance than training with real data alone. While final results are promising, more work will be needed to reduce the domain shift between synthetic and real data. Our code and data are publicly available.
Pages: 468-483
Page count: 16