Hyperscale Hardware Optimized Neural Architecture Search

Cited by: 5
Authors
Li, Sheng [1 ]
Andersen, Garrett [1 ]
Chen, Tao [1 ]
Cheng, Liqun [1 ]
Grady, Julian [1 ]
Huang, Da [1]
Le, Quoc V. [1 ]
Li, Andrew [1 ]
Li, Xin [1 ]
Li, Yang [1 ]
Liang, Chen [1 ]
Lu, Yifeng [1 ]
Ni, Yun [1 ]
Pang, Ruoming [1 ]
Tan, Mingxing [1 ]
Wicke, Martin [1 ]
Wu, Gang [1 ]
Zhu, Shengqi [1 ]
Ranganathan, Parthasarathy [1 ]
Jouppi, Norman P. [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 3, ASPLOS 2023 | 2023
Keywords
Hyperscale Hardware; Accelerator; TPU; GPU; Machine Learning; Deep Learning; Neural Architecture Search; Pareto Optimization
DOI
10.1145/3582016.3582049
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recent advances in machine learning have leveraged dramatic increases in computational power, a trend expected to continue. This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three key components: a new massively parallel "one-shot" search algorithm with intelligent weight sharing, which can scale to search spaces of O(10^280) and handle large volumes of production traffic; hardware-optimized search spaces for diverse ML models on heterogeneous hardware; and a novel two-phase hybrid performance model and a multi-objective reward function optimized for large-scale deployments. H2O-NAS has been implemented around state-of-the-art machine learning models (e.g., convolutional models, vision transformers, and deep learning recommendation models) and deployed at zettaflop scale in production. Our results demonstrate significant improvements in performance (22% to 56%) and energy efficiency (17% to 25%) at the same or better quality. Our solution is designed for large-scale deployment, streamlining privacy and security processes and reducing manual overhead, which facilitates a smooth and automated transition from research to production.
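The abstract pairs a performance model with a multi-objective reward that trades model quality against a hardware cost target. The record does not give H2O-NAS's exact reward, so the sketch below uses the soft-exponential form common in earlier hardware-aware NAS work (e.g., MnasNet, TuNAS); the function name, metric names, and the value of beta are illustrative assumptions.

```python
# A minimal sketch of a multi-objective NAS reward that couples model
# quality with a hardware latency budget. This is NOT H2O-NAS's actual
# reward (the record does not specify it); it follows the soft-constraint
# style of prior hardware-aware NAS work. Names and beta are assumptions.

def nas_reward(quality: float, latency_ms: float,
               target_ms: float, beta: float = -0.07) -> float:
    """Score a candidate architecture.

    quality    -- task metric in [0, 1], e.g. top-1 accuracy
    latency_ms -- measured or predicted latency on the target hardware
    target_ms  -- latency budget for the deployment
    beta       -- negative exponent controlling the latency penalty
    """
    # Multiplicative penalty: the reward decays smoothly as latency
    # exceeds the budget, steering the search toward the Pareto front
    # of quality versus hardware cost.
    return quality * (latency_ms / target_ms) ** beta

# Example: equal accuracy, different latency -- the faster model wins.
fast = nas_reward(quality=0.80, latency_ms=8.0, target_ms=10.0)
slow = nas_reward(quality=0.80, latency_ms=14.0, target_ms=10.0)
assert fast > slow
```

A smooth penalty of this kind, rather than a hard latency cutoff, keeps the reward differentiable in latency and lets the search rank over-budget candidates instead of discarding them outright.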
Pages: 343-358
Page count: 16