Are Deep Neural Networks the Best Choice for Modeling Source Code?

Cited by: 206

Authors:

Hellendoorn, Vincent J. [1]

Devanbu, Premkumar [1]

Affiliations:

[1] Univ Calif Davis, Comp Sci Dept, Davis, CA 95616 USA

Source:

ESEC/FSE 2017: PROCEEDINGS OF THE 2017 11TH JOINT MEETING ON FOUNDATIONS OF SOFTWARE ENGINEERING | 2017

Funding:

U.S. National Science Foundation

Keywords:

naturalness; language models; software tools

DOI:

10.1145/3106237.3106290

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as: frequent changes, larger, changing vocabularies, deeply nested scopes, etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models.
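The abstract's idea of mixing a global model with a more local one, and of adding and removing text as code changes, can be illustrated with a minimal sketch. This is not the authors' toolkit: the class `BigramModel`, the function `mixed_prob`, the fixed `vocab_size`, the add-one smoothing, and the equal mixing weights are all hypothetical simplifications chosen for brevity.

```python
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram counter that supports adding and removing text.

    Hypothetical illustration only; a fixed vocab_size is assumed for simplicity.
    """

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.context_counts = Counter()  # how often each token appears as a context
        self.pair_counts = Counter()     # how often each (previous, current) pair appears

    def add(self, tokens):
        """Count every adjacent token pair in `tokens`."""
        for prev, cur in zip(tokens, tokens[1:]):
            self.context_counts[prev] += 1
            self.pair_counts[(prev, cur)] += 1

    def remove(self, tokens):
        """Undo a previous `add` of the same token sequence."""
        for prev, cur in zip(tokens, tokens[1:]):
            self.context_counts[prev] -= 1
            self.pair_counts[(prev, cur)] -= 1

    def prob(self, prev, cur):
        """Add-one smoothed estimate of P(cur | prev)."""
        return (self.pair_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )


def mixed_prob(models_with_weights, prev, cur):
    """Linearly interpolate several models; weights are normalized to sum to 1."""
    total = sum(weight for _, weight in models_with_weights)
    return sum(
        weight * model.prob(prev, cur) for model, weight in models_with_weights
    ) / total
```

In this sketch, a "global" instance would be trained on the whole corpus and a "local" instance on the file or scope being edited; calling `remove` followed by `add` on the local instance stands in for updating the model when that file changes.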


Pages: 763-773

Page count: 11
