Building Statistical Language Models of Code

Cited by: 0
Authors
Schulam, Peter [1]
Rosenfeld, Roni [1]
Devanbu, Premkumar [2]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices that have been established over the years in research for natural languages, statistical language models can become a tool that SE researchers are able to use to explore new research directions.
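To make the idea of an n-gram language model over source files concrete, here is a minimal sketch in Python. It is not the authors' implementation: the regex tokenizer, the choice of bigrams (n = 2), add-one smoothing, and the function names `tokenize`, `train_bigram`, and `log_prob` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Crude lexer (an assumption, not a real language lexer):
    # identifiers, integer literals, or any single non-space symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def train_bigram(corpus):
    # Count unigrams and bigrams over a list of source strings,
    # padding each with start and end markers.
    unigrams, bigrams = Counter(), Counter()
    for text in corpus:
        tokens = ["<s>"] + tokenize(text) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(text, unigrams, bigrams):
    # Natural-log probability of a token sequence under the bigram model,
    # with add-one (Laplace) smoothing so unseen bigrams get nonzero mass.
    vocab_size = len(unigrams) + 1  # one extra slot for unseen tokens
    tokens = ["<s>"] + tokenize(text) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


# Hypothetical two-line training corpus, for illustration only.
corpus = [
    "for i in range(10): print(i)",
    "for j in range(5): print(j)",
]
unigrams, bigrams = train_bigram(corpus)
seen = log_prob("for i in range(10): print(i)", unigrams, bigrams)
scrambled = log_prob("i for range in ) 10 ( : ) print i (", unigrams, bigrams)
```

Under these assumptions, the line taken from the training corpus scores a higher log probability than the same tokens in scrambled order. Add-one smoothing is used here only because it is short; it is a simplification rather than a recommended practice.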
Pages: 1-3
Page count: 3