Hyperscale Hardware Optimized Neural Architecture Search

Cited by: 5
Authors
Li, Sheng [1 ]
Andersen, Garrett [1 ]
Chen, Tao [1 ]
Cheng, Liqun [1 ]
Grady, Julian [1 ]
Huang, Da [1]
Le, Quoc V. [1 ]
Li, Andrew [1 ]
Li, Xin [1 ]
Li, Yang [1 ]
Liang, Chen [1 ]
Lu, Yifeng [1 ]
Ni, Yun [1 ]
Pang, Ruoming [1 ]
Tan, Mingxing [1 ]
Wicke, Martin [1 ]
Wu, Gang [1 ]
Zhu, Shengqi [1 ]
Ranganathan, Parthasarathy [1 ]
Jouppi, Norman P. [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 3, ASPLOS 2023 | 2023
Keywords
Hyperscale Hardware; Accelerator; TPU; GPU; Machine Learning; Deep Learning; Neural Architecture Search; Pareto Optimization
DOI
10.1145/3582016.3582049
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recent advances in machine learning have leveraged dramatic increases in computational power, a trend expected to continue. This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three key components: a new massively parallel "one-shot" search algorithm with intelligent weight sharing, which can scale to search spaces of O(10^280) and handle large volumes of production traffic; hardware-optimized search spaces for diverse ML models on heterogeneous hardware; and a novel two-phase hybrid performance model and a multi-objective reward function optimized for large-scale deployments. H2O-NAS has been implemented around state-of-the-art machine learning models (e.g., convolutional models, vision transformers, and deep learning recommendation models) and deployed at zettaflop scale in production. Our results demonstrate significant improvements in performance (22% to 56%) and energy efficiency (17% to 25%) at the same or better quality. Our solution is designed for large-scale deployment, streamlining privacy and security processes and reducing manual overhead, which facilitates a smooth and automated transition from research to production.
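The abstract pairs a performance model with a multi-objective reward that trades model quality against a hardware cost target. The record does not give H2O-NAS's exact reward, so the sketch below uses the soft-exponential form common in earlier hardware-aware NAS work (e.g., MnasNet, TuNAS); the function name, metric names, and the value of beta are illustrative assumptions.

```python
# A minimal sketch of a multi-objective NAS reward that couples model
# quality with a hardware latency budget. This is NOT H2O-NAS's actual
# reward (the record does not specify it); it follows the soft-constraint
# style of prior hardware-aware NAS work. Names and beta are assumptions.

def nas_reward(quality: float, latency_ms: float,
               target_ms: float, beta: float = -0.07) -> float:
    """Score a candidate architecture.

    quality    -- task metric in [0, 1], e.g. top-1 accuracy
    latency_ms -- measured or predicted latency on the target hardware
    target_ms  -- latency budget for the deployment
    beta       -- negative exponent controlling the latency penalty
    """
    # Multiplicative penalty: the reward decays smoothly as latency
    # exceeds the budget, steering the search toward the Pareto front
    # of quality versus hardware cost.
    return quality * (latency_ms / target_ms) ** beta

# Example: equal accuracy, different latency -- the faster model wins.
fast = nas_reward(quality=0.80, latency_ms=8.0, target_ms=10.0)
slow = nas_reward(quality=0.80, latency_ms=14.0, target_ms=10.0)
assert fast > slow
```

A smooth penalty of this kind, rather than a hard latency cutoff, keeps the reward differentiable in latency and lets the search rank over-budget candidates instead of discarding them outright.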
Pages: 343-358
Page count: 16