Fast Simulation of Conway's Game of Life Using Bitwise Parallel Bulk Computation on a GPU

被引:2
作者
Fujita, Toru [1 ]
Nakano, Koji [1 ]
Ito, Yasuaki [1 ]
机构
[1] Hiroshima Univ, Dept Informat Engn, Kagamiyama 1-4-1, Higashihiroshima 7398527, Japan
关键词
Cellular automaton; parallel algorithms; CUDA; GPGPU; MEMORY MACHINE; OFFLINE PERMUTATION; ALGORITHMS;
D O I
10.1142/S0129054116500404
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Conway's Game of Life is the most well-known cellular automaton. The universe of the Came of Life is a 2-dimensional array of cells, each of which takes two possible states, alive or dead. The state of every cell is repeatedly updated according to those of eight neighbors. A. cell will be alive if exactly three neighbors are alive, or if it is alive and two neighbors are alive. The main contribution of this paper is to develop several acceleration techniques for simulating the Game of Life using a CPU as follows: (1) the states of 32/64 cells in 32/64-bit words (integers) and the next states are computed by the Bitwise Parallel Bulk Computation (BPBC) technique, (2) the states of cells stored in 2 words are updated at the same time by a thread, (3) warp shuffle instruction is used to directly transfer the current states stored in registers, and (4) multi-step simulation is performed to reduce the overhead of data transfer and invoking CUDA kernel. The experimental results show that, the performance of our GPU implementation using GeForce GTX TITAN X is 1350 x 10(9) updates per second for 16K-step simulation of 512K x 512K cells stored in the SSD. Since Intel Core i7 CPU using the same technique performs 13.4 x 10(9) updates per second, our CPU implementation for the Game of Life achieves a speedup factor of 100. Thus, these techniques work very efficiently on a CPU.
引用
收藏
页码:981 / 1003
页数:23
相关论文
共 30 条
  • [21] Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
    Su, C. -C.
    Hsieh, C. -W.
    Smith, M. R.
    Jermy, M. C.
    Wu, J. -S.
    27TH INTERNATIONAL SYMPOSIUM ON RAREFIED GAS DYNAMICS, 2010, PTS ONE AND TWO, 2011, 1333 : 343 - +
  • [22] Simulation of disc-bulge-halo galaxies using parallel GPU based codes
    Veles, O.
    Berczik, P.
    Just, A.
    STAR CLUSTERS AND BLACK HOLES IN GALAXIES ACROSS COSMIC TIME, 2016, 10 (312): : 79 - 81
  • [23] The future of parallel computing: GPU vs CELL - General purpose planning against fast graphical computation architectures, which is the best solution for general purposes computation?
    Bianchi, Luca
    Gatti, Riccardo
    Lombardi, Luca
    GRAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS THEORY AND APPLICATIONS, 2008, : 419 - 425
  • [24] Parallel SAX/GA for financial pattern matching using NVIDIA's GPU
    Bauto, Joao
    Canelas, Antonio
    Neves, Rui
    Horta, Nuno
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 105 : 77 - 88
  • [25] Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format
    Bustamam, Alhadi
    Burrage, Kevin
    Hamilton, Nicholas A.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (03) : 679 - 692
  • [26] Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
    Shen, Wenfeng
    Wei, Daming
    Xu, Weimin
    Zhu, Xin
    Yuan, Shizhong
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2010, 100 (01) : 87 - 96
  • [27] Statically Optimal Binary Search Tree Computation Using Non-Serial Polyadic Dynamic Programming on GPU's
    Wani, Mohsin Altaf
    Ahmad, Manzoor
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2019, 11 (01) : 49 - 70
  • [28] Using GPU's to Accelerate Stencil-based Computation Kernels for the Development of Large Scale Scientific Applications on Heterogeneous Systems
    Tao, Jian
    Blazewicz, Marek
    Brandt, Steven R.
    ACM SIGPLAN NOTICES, 2012, 47 (08) : 287 - 288
  • [29] Pattern classification using updated fuzzy hyper-line segment neural network and it's GPU parallel implementation for large datasets using CUDA
    Dhabe, Priyadarshan
    Vyas, Prashant
    Ganeriwal, Devrat
    Pathak, Aditya
    2016 INTERNATIONAL CONFERENCE ON COMPUTING, ANALYTICS AND SECURITY TRENDS (CAST), 2016, : 24 - 29
  • [30] Simulation and experimental control of a 3-RPR parallel robot using optimal fuzzy controller and fast on/off solenoid valves based on the PWM wave
    Moezi, Seyed Alireza
    Rafeeyan, Mansour
    Zakeri, Ehsan
    Zare, Amin
    ISA TRANSACTIONS, 2016, 61 : 265 - 286