Performance Comparision of TPU, GPU, CPU on Google Colaboratory over Distributed Deep Learning

Cited by: 14

Authors:

Kimm, Haklin [1]

Paik, Incheon [2]

Kimm, Hanke [3]

Affiliations:

[1] East Stroudsburg Univ, Dept Comp Sci, East Stroudsburg, PA 18301 USA

[2] Univ Aizu, Sch Comp Sci & Engn, Fukushima, Japan

[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA

Source:

2021 IEEE 14TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2021) | 2021

Keywords:

Distributed Deep Learning; Google Colab; TPU and GPU; Human Activity Recognition; Bidirectional LSTM;

DOI:

10.1109/MCSoC51149.2021.00053

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Deep learning models need massive amounts of compute power and tend to perform better when run on special-purpose accelerators designed to speed up compute-intensive applications. Accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used as deep learning hardware platforms and, with their massively parallel execution resources and high memory bandwidth, can often achieve better performance than CPUs. Google Colaboratory, known as Colab, is a cloud service based on Jupyter Notebook that lets users write and execute code, mostly Python, in a browser and provides free access to TPUs and GPUs, which are widely available cloud hardware platforms, without extra configuration. In this paper, we present a thorough comparison of the hardware platforms on Google Colab, benchmarked with Distributed Bidirectional Long Short-Term Memory (dBLSTM) models that vary in the number of layers, the number of units in each layer, and the numbers of input and output units of the datasets. Human Activity Recognition (HAR) data from the UCI machine learning repository are applied to the proposed distributed bidirectional LSTM model to assess the performance, strengths, and bottlenecks of the TPU, GPU, and CPU platforms with respect to hyperparameters, execution time, and evaluation metrics: accuracy, precision, recall, and F1 score.
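The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how precision, recall, and F1 are averaged across activity classes, so macro averaging is an assumption here. The function name `macro_metrics` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the paper's abstract does not
    state which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical activity labels, for illustration only.
y_true = ["walking", "walking", "sitting", "sitting", "standing", "standing"]
y_pred = ["walking", "sitting", "sitting", "sitting", "standing", "walking"]
metrics = macro_metrics(y_true, y_pred)
```

Computing the same dictionary for the predictions obtained on each platform would give a like-for-like comparison of model quality alongside execution time.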

References

Pages: 312-319

Page count: 8

16 entries in total

[1] Abadi M., 2016, Large-scale machine learning on heterogeneous systems, P265

[2] Ben-Nun T., 2018, CoRR, abs/1802.09941

[3] Carneiro, Tiago; Medeiros Da Nobrega, Raul Victor; Nepomuceno, Thiago; Bian, Gui-Bin; De Albuquerque, Victor Hugo C.; Reboucas Filho, Pedro Pedrosa. Performance Analysis of Google Colaboratory as a Tool for Accelerating Deep Learning Applications [J]. IEEE ACCESS, 2018, 6: 61677-61685

[4] Cheng X., 2020, ARXIV200603259

[5] Chollet F, 2019, DEEP LEARNING PYTHON

[6] Dean J., 2017, Hot Chips

[7] Debnath B., 2020, ARXIV200914326

[8] Fazli M., 2020, ARXIV201016052

[9] Gao W., 2020, ARXIV200614435

[10] Goodfellow Ian, 2019, DEEP LEARNING
