One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Cited by: 13
Authors
Lu, Bingqian [1 ]
Yang, Jianyi [1 ]
Jiang, Weiwen [2 ]
Shi, Yiyu [3 ]
Ren, Shaolei [1 ]
Affiliations
[1] Univ Calif Riverside, 900 Univ Ave, Riverside, CA 92521 USA
[2] George Mason Univ, 4400 Univ Dr, Fairfax, VA 22030 USA
[3] Univ Notre Dame, 257 Fitzpatrick Hall, Notre Dame, IN 46556 USA
Keywords
Neural Architecture Search; Hardware-Aware; Scalability; AutoML;
DOI
10.1145/3491046
CLC classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device is common practice in the state of the art, this is a very time-consuming process that lacks scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity: the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique that significantly boosts latency monotonicity. Finally, we validate our approach with experiments on devices from different platforms and on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS, and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
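The degree of latency monotonicity between two devices can be quantified with a rank correlation coefficient such as Spearman's rho: if the proxy device orders candidate architectures by latency the same way the target device does, rho is close to 1, and architectures searched on the proxy transfer to the target. The following is a minimal illustrative sketch (the latency numbers are made up, and ties between latencies are not handled):

```python
# Hypothetical per-architecture inference latencies (ms) measured on a
# proxy device and on a new target device. Values are illustrative only.
proxy_latency  = [12.1, 15.4, 9.8, 20.3, 11.0]
target_latency = [30.5, 41.2, 25.1, 55.7, 28.9]

def rank(values):
    """Return the rank of each value (0 = smallest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation between two latency lists (no ties)."""
    n = len(x)
    rx, ry = rank(x), rank(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman_rho(proxy_latency, target_latency)
print(f"Spearman rho = {rho:.2f}")  # prints 1.00: the rankings agree exactly
```

A rho near 1 indicates strong latency monotonicity, so the Pareto-optimal architectures found on the proxy remain (near-)optimal on the target; a low rho signals that proxy adaptation is needed before re-using the search results.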
Pages: 34