Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Source
2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2019
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
CLC number
TP3 [computing technology; computer technology];
Subject classification code
0812;
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM) based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-oriented machine to optimally use fast on-die memories for best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work which attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic, which groups sentences of similar length together, to multiple nodes in order to achieve load balance across ranks. In summary, we demonstrate that with these changes we are able to outperform Google's stock CPU-based GNMT implementation by ~2x on a single node and potentially enable more than 25x speedup using a 16-node CPU cluster.
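The multi-node bucketing in step (iv) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the fixed bucket width, and the round-robin rank assignment are all hypothetical simplifications of the idea of grouping similar-length sentences and spreading them evenly across ranks.

```python
from collections import defaultdict

def bucket_by_length(lengths, bucket_width=10):
    """Group sentence indices into buckets keyed by length // bucket_width,
    so every bucket holds sentences of roughly similar length."""
    buckets = defaultdict(list)
    for idx, n in enumerate(lengths):
        buckets[n // bucket_width].append(idx)
    return dict(buckets)

def distribute_to_ranks(buckets, num_ranks):
    """Deal sentence indices round-robin across ranks, bucket by bucket,
    so each rank receives similar-length work in similar amounts."""
    per_rank = [[] for _ in range(num_ranks)]
    i = 0
    for key in sorted(buckets):  # shortest buckets first
        for idx in buckets[key]:
            per_rank[i % num_ranks].append(idx)
            i += 1
    return per_rank
```

Because sentences within a bucket have similar lengths, each rank's batches need little padding, and the round-robin deal keeps the per-rank sentence counts within one of each other, which is the load-balance property step (iv) is after.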
Pages: 193-202
Page count: 10
Related papers
50 items total
  • [31] MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation
    Mahata, Sainik Kumar
    Das, Dipankar
    Bandyopadhyay, Sivaji
    JOURNAL OF INTELLIGENT SYSTEMS, 2019, 28 (03) : 447 - 453
  • [32] Neural Machine Translation Based on Back-Translation for Multilingual Translation Evaluation Task
    Lai, Siyu
    Yang, Yueting
    Xu, Jin'an
    Chen, Yufeng
    Huang, Hui
    MACHINE TRANSLATION, CCMT 2020, 2020, 1328 : 132 - 141
  • [33] Urdu to Punjabi Machine Translation: An Incremental Training Approach
    Singh, Umrinderpal
    Goyal, Vishal
    Lehal, Gurpreet Singh
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (04) : 227 - 238
  • [34] Low-Resource Neural Machine Translation with Neural Episodic Control
    Wu, Nier
    Hou, Hongxu
    Sun, Shuo
    Zheng, Wei
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [35] Machine Translation of a Training Set for Semantic Extraction of Relations
    Pena-Torres, Jefferson A.
    Bucheli, Victor
    Gutierrez De Pinerez Reyes, Raul E.
    CUADERNOS DE LINGUISTICA HISPANICA, 2022, 39
  • [36] Analysis of Rule-Based Machine Translation and Neural Machine Translation Approaches for Translating Portuguese to LIBRAS
    Moraes de Oliveira, Caio Cesar
    do Rego, Thais Gaudencio
    Cavalcanti Brandao Lima, Manuella Aschoff
    Ugulino de Araujo, Tiago Maritan
    WEBMEDIA 2019: PROCEEDINGS OF THE 25TH BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB, 2019, : 117 - 124
  • [37] Machine translation training data for English-Tshivenda
    Gaustad, Tanja
    McKellar, Cindy A.
    Puttkammer, Martin J.
    DATA IN BRIEF, 2024, 57
  • [38] Recurrent Neural Network Techniques: Emphasis on Use in Neural Machine Translation
    Suleiman, Dima
    Etaiwi, Wael
    Awajan, Arafat
    INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2021, 45 (07): : 107 - 114
  • [39] Opportunities and Implementation of Neural Machine Translation for Network Configuration
    Li, Fuliang
    Zhang, Jiajie
    Li, Minglong
    Wang, Xingwei
    IEEE NETWORK, 2023, 37 (04): : 82 - 89
  • [40] Syntax-Informed Interactive Neural Machine Translation
    Gupta, Kamal Kumar
    Haque, Rejwanul
    Ekbal, Asif
    Bhattacharyya, Pushpak
    Way, Andy
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,