Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhny Novgorod, Russia
Source
2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2019
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM) based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-oriented machine to optimally use fast on-die memories for best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work which attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this by the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic, which groups sentences of similar length together, to multiple nodes in order to achieve load balance across ranks. In summary, we demonstrate that due to these changes we are able to outperform Google's stock CPU-based GNMT implementation by ~2x on a single node and potentially enable more than 25x speedup using a 16-node CPU cluster.
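The multi-node bucketing in step (iv) can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; the function names, bucket width, and round-robin sharding policy are assumptions. Sentences are first grouped into buckets by length, then each bucket's batches are dealt out across ranks so that every rank processes batches of comparable length per step:

```python
# Hypothetical sketch of length-based bucketing extended to multiple ranks.
# Batches within a bucket contain sentences of similar length, so dealing a
# bucket's batches round-robin across ranks keeps per-step work balanced.
from collections import defaultdict


def bucket_by_length(sentences, bucket_width=10):
    """Group sentences into buckets keyed by length rounded up to bucket_width."""
    buckets = defaultdict(list)
    for s in sentences:
        key = ((len(s) + bucket_width - 1) // bucket_width) * bucket_width
        buckets[key].append(s)
    return buckets


def shard_for_rank(buckets, rank, world_size, batch_size=2):
    """Split every bucket into batches and return this rank's round-robin share."""
    shard = []
    for key in sorted(buckets):
        batch_list = [buckets[key][i:i + batch_size]
                      for i in range(0, len(buckets[key]), batch_size)]
        # Deal batches of this bucket across ranks; each rank sees a mix of
        # short and long buckets, but within one batch lengths are similar.
        shard.extend(b for i, b in enumerate(batch_list)
                     if i % world_size == rank)
    return shard
```

Because ranks draw from the same buckets in the same order, padding waste inside a batch stays small and no rank is stuck with only long sentences, which is the load-balance property the paper's extension targets.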
Pages: 193-202
Page count: 10