Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1]
Banerjee, Kunal [1]
Srinivasan, Sudarshan [1]
Sridharan, Srinivas [1]
Georganas, Evangelos [2]
Smorkalov, Mikhail E. [3]
Xu, Cong [3]
Heinecke, Alexander [2]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Source
2019 IEEE International Conference on Cluster Computing (CLUSTER) | 2019
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture
DOI
10.1109/cluster.2019.8891019
CLC classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art language translation application based on recurrent neural networks (RNN/LSTM). It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency machine to make optimal use of fast on-die memories for the best single-processor performance. Additionally, because of the massive compute demand, it is essential to distribute the workload among several processors and even several compute nodes. To the best of our knowledge, this is the first work that attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic that groups sentences of similar length to multiple nodes, achieving load balance across ranks. In summary, we demonstrate that with these changes we outperform Google's stock CPU-based GNMT implementation by approximately 2x on a single node and potentially enable more than 25x speedup using a 16-node CPU cluster.
Pages: 193 - 202
Page count: 10
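
Step (iv) of the abstract, grouping sentences of similar length and spreading them evenly over ranks, can be illustrated with a minimal sketch. The abstract does not give the paper's actual bucketing logic, so everything below is an assumption for illustration: the function names `bucket_by_length` and `shard_buckets`, the fixed `bucket_width`, and the round-robin assignment are hypothetical and are not taken from the GNMT, Horovod, or Intel MLSL code.

```python
from collections import defaultdict


def bucket_by_length(lengths, bucket_width):
    """Group sentence indices into buckets of similar length.

    Hypothetical helper: a sentence of length n goes to bucket n // bucket_width.
    """
    buckets = defaultdict(list)
    for idx, n in enumerate(lengths):
        buckets[n // bucket_width].append(idx)
    return dict(buckets)


def shard_buckets(buckets, num_ranks):
    """Deal the sentences of each bucket round-robin to the ranks.

    Every rank then holds a near-equal share of each bucket, so ranks that
    process the same bucket in a step handle sentences of similar length.
    """
    shards = [defaultdict(list) for _ in range(num_ranks)]
    for bucket_id, indices in sorted(buckets.items()):
        for pos, idx in enumerate(indices):
            shards[pos % num_ranks][bucket_id].append(idx)
    return [dict(s) for s in shards]


# Example: eight sentences (lengths in tokens), buckets of width 10, two ranks.
lengths = [5, 7, 21, 6, 23, 8, 22, 20]
buckets = bucket_by_length(lengths, bucket_width=10)
shards = shard_buckets(buckets, num_ranks=2)
for rank, shard in enumerate(shards):
    print(rank, shard)
```

The point of the round-robin split is that no rank ends up with only long sentences while another holds only short ones, which is one plausible way to obtain the load balance the abstract mentions.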