AcMC2: Accelerated Markov Chain Monte Carlo for Probabilistic Models

Cited by: 14
Authors
Banerjee, Subho S. [1]
Kalbarczyk, Zbigniew T. [1]
Iyer, Ravishankar K. [1]
Affiliation
[1] Univ Illinois, Champaign, IL 61820 USA
Source
TWENTY-FOURTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXIV) | 2019
Funding
National Science Foundation (NSF), USA
Keywords
Accelerator; Markov Chain Monte Carlo; Probabilistic Graphical Models; Probabilistic Programming; ALGORITHMS; INFERENCE; ARCHITECTURES
DOI
10.1145/3297858.3304019
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Probabilistic models (PMs) are used ubiquitously across a variety of machine learning applications. They have been shown to successfully integrate structural prior information about data and to quantify uncertainty effectively, enabling the development of more powerful, interpretable, and efficient learning algorithms. This paper presents AcMC2, a compiler that transforms PMs into optimized hardware accelerators (for use in FPGAs or ASICs) that use Markov chain Monte Carlo methods to infer and query a distribution of posterior samples from the model. The compiler analyzes statistical dependencies in the PM to drive several optimizations that maximally exploit the parallelism and data locality available in the problem. We demonstrate the use of AcMC2 to implement several learning and inference tasks on a Xilinx Virtex-7 FPGA. AcMC2-generated accelerators provide a 47-100x improvement in runtime performance over a 6-core IBM Power8 CPU and an 8-18x improvement over an NVIDIA K80 GPU. In performance-per-watt terms, this corresponds to a 753-1600x improvement over the CPU and a 248-463x improvement over the GPU.
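The abstract says the compiler analyzes statistical dependencies in the PM to expose parallelism, but it does not detail how. As a hedged illustration of that general idea, and not AcMC2's actual algorithm, the Python sketch below uses chromatic Gibbs sampling, a standard MCMC technique. It colors the model's dependency graph so that no two neighbours share a color. Every node of one color can then be resampled from the same snapshot, because those nodes are conditionally independent given the rest. The Ising-style grid model and all names (`grid_graph`, `greedy_coloring`, `sigmoid`, `chromatic_gibbs`, `coupling`) are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Dict, List

# Hypothetical representation: a model's dependency graph as an adjacency list.
Graph = Dict[int, List[int]]


def grid_graph(rows: int, cols: int) -> Graph:
    """Build a 4-neighbour lattice; node id = r * cols + c."""
    graph: Graph = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            graph[r * cols + c] = nbrs
    return graph


def greedy_coloring(graph: Graph) -> Dict[int, int]:
    """Give each node the smallest color not used by an already-colored neighbour."""
    colors: Dict[int, int] = {}
    for node in sorted(graph):
        used = {colors[n] for n in graph[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def chromatic_gibbs(graph: Graph, coupling: float, n_sweeps: int, seed: int = 0) -> Dict[int, int]:
    """Gibbs-sample +/-1 spins of an Ising-style model, one color class at a time."""
    rng = random.Random(seed)
    colors = greedy_coloring(graph)
    n_colors = max(colors.values()) + 1
    classes = [[v for v in graph if colors[v] == k] for k in range(n_colors)]
    state = {v: rng.choice((-1, 1)) for v in graph}
    for _ in range(n_sweeps):
        for members in classes:
            # Nodes of one color share no edges, so their conditionals read only
            # other classes. Computing them all from one snapshot mirrors how
            # parallel hardware could update the whole class at once.
            snapshot = dict(state)
            for v in members:
                field = coupling * sum(snapshot[u] for u in graph[v])
                # For this model, P(s_v = +1 | neighbours) = sigmoid(2 * field).
                state[v] = 1 if rng.random() < sigmoid(2.0 * field) else -1
    return state


if __name__ == "__main__":
    g = grid_graph(4, 4)
    print("colors needed:", max(greedy_coloring(g).values()) + 1)
    print("sample:", chromatic_gibbs(g, coupling=0.5, n_sweeps=10))
```

On a 4x4 grid the greedy coloring yields a two-color checkerboard, so each sweep takes only two sequential phases regardless of grid size. That is the kind of dependency-derived parallelism a hardware sampler could exploit, for example with one update unit per node in a color class.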
Pages: 515-528
Number of pages: 14