Accurately Predicting the Location of Code Fragments in Programming Video Tutorials Using Deep Learning

Cited by: 17

Authors:

Alahmadi, Mohammad [1]

Hassel, Jonathan [1]

Parajuli, Biswas [1]

Haiduc, Sonia [1]

Kumar, Piyush [1]

Affiliations:

[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA

Source:

PROMISE'18: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON PREDICTIVE MODELS AND DATA ANALYTICS IN SOFTWARE ENGINEERING | 2018

Funding:

U.S. National Science Foundation;

Keywords:

Programming video tutorials; Software documentation; Source code; Deep learning; Video mining;

DOI:

10.1145/3273934.3273935

Chinese Library Classification number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Background: Video programming tutorials are becoming a popular resource for developers looking for quick answers to a specific programming problem or trying to learn a programming topic in more depth. Since the most important source of information for developers in many such videos is source code, it is important to be able to accurately extract this code from the screen, such that developers can easily integrate it into their programs. Aims: Our main goal is to facilitate the accurate and noise-free extraction of code appearing in programming video tutorials. In particular, in this paper we aim to accurately predict the location of source code in video frames. This will allow for the dramatic reduction of noise when using extraction techniques such as Optical Character Recognition, which could otherwise extract a large amount of irrelevant text (e.g., text found in menu items, package hierarchy, etc.). Method: We propose an approach using a deep Convolutional Neural Network (CNN) to predict the bounding box of fully-visible code sections in video frames. To evaluate our approach, we collected a set of 150 Java programming tutorials, having more than 82K frames in total. A sample of 4,000 frames from these videos was then manually annotated with the code bounding box location and used as the ground truth in an experiment evaluating our approach. Results: The results of the evaluation show that our approach is able to successfully predict the code bounding box in a given frame with 92% accuracy. Conclusions: Our CNN-based approach is able to accurately predict the location of source code within the frames of programming video tutorials.


Pages: 2-11

Page count: 10
