FCN-LectureNet: Extractive Summarization of Whiteboard and Chalkboard Lecture Videos

Cited by: 16
Authors
Davila, Kenny [1 ]
Xu, Fei [2 ]
Setlur, Srirangaraj [2 ]
Govindaraju, Venu [2 ]
Affiliations
[1] Univ Tecnol Centroamer, Fac Ingn, Tegucigalpa 11101, Honduras
[2] Univ Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Funding
U.S. National Science Foundation
Keywords
Videos; Feature extraction; Tutorials; Training; Protocols; Indexing; Handwriting recognition; Fully convolutional networks; handwritten text detection; image binarization; lecture videos; video summarization; NETWORK; TEXT;
DOI
10.1109/ACCESS.2021.3099427
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
The recording and sharing of educational and lecture videos have increased in recent years. Within these recordings we find many math-oriented lectures and tutorials that attract students of all levels. Many of the topics they cover are better explained with handwritten content on whiteboards or chalkboards; hence, a large number of lecture videos feature an instructor writing on such a surface. In this work, we propose a novel method for extracting and summarizing the handwritten content found in these videos. Our method is based on a fully convolutional network, FCN-LectureNet, which extracts the handwritten content from the video as binary images. These are further analyzed to identify the unique and stable units of content and to produce a spatial-temporal index of the handwritten content. A signal that approximates content-deletion events is then built from the spatial-temporal index. The peaks of this signal are used to create temporal segments of the lecture, based on the notion that sub-topics change when large portions of content are deleted. Finally, we use these segments to create a key-frame-based extractive summary of the handwritten content, facilitating content-based search and retrieval of these lecture videos. We also extend the AccessMath dataset to create LectureMath, a novel dataset for benchmarking lecture video summarization. Experiments on both datasets show that our method outperforms existing methods, especially on the larger and more challenging dataset. Our code and data are publicly available.
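A minimal sketch of the segmentation step described above: peaks in a per-frame content-deletion signal mark sub-topic boundaries, which split the lecture into temporal segments. This is an illustration only, not the authors' code; the signal values, the 3-point peak test, and the threshold are all hypothetical.

```python
def find_peaks(signal, threshold):
    """Return indices of local maxima at or above threshold (simple 3-point test)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= threshold and signal[i - 1] < signal[i] > signal[i + 1]:
            peaks.append(i)
    return peaks

def segments_from_peaks(n_frames, peaks):
    """Split the frame range [0, n_frames) into segments at each deletion peak."""
    bounds = [0] + list(peaks) + [n_frames]
    return list(zip(bounds[:-1], bounds[1:]))

# Hypothetical deletion signal: large values approximate big erasure events.
deletion = [0, 1, 0, 9, 1, 0, 0, 8, 1, 0]
peaks = find_peaks(deletion, threshold=5)
print(segments_from_peaks(len(deletion), peaks))  # [(0, 3), (3, 7), (7, 10)]
```

In the paper's pipeline, one key frame per segment would then be selected to form the extractive summary.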
Pages: 104469-104484 (16 pages)