Machine Learning Inference on Serverless Platforms Using Model Decomposition

Cited by: 0

Authors:

Gallego, Adrien [1]

Odyurt, Uraz [2,3]

Cheng, Yi [1]

Wang, Yuandou [1]

Zhao, Zhiming [1]

Affiliations:

[1] Univ Amsterdam, Inst Informat, Amsterdam, Netherlands

[2] Radboud Univ Nijmegen, High Energy Phys, Nijmegen, Netherlands

[3] Nikhef, Amsterdam, Netherlands

Source:

16TH IEEE/ACM INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING, UCC 2023 | 2023

Keywords:

Serverless computing; Machine learning; Model decomposition; Inference

DOI:

10.1145/3603166.3632535

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Serverless offers a scalable and cost-effective service model for users to run applications without focusing on underlying infrastructure or physical servers. While the Serverless architecture is not designed to address the unique challenges posed by resource-intensive workloads, e.g., Machine Learning (ML) tasks, it is highly scalable. Due to the limitations of Serverless function deployment and resource provisioning, the combination of ML and Serverless is a complex undertaking. We tackle this problem through decomposition of large ML models into smaller sub-models, referred to as slices. We set up ML inference tasks using these slices as a Serverless workflow, i.e., sequence of functions. Our experimental evaluations are performed on the Serverless offering by AWS for demonstration purposes, considering an open-source format for ML model representation, Open Neural Network Exchange. Achieved results portray that our decomposition method enables the execution of ML inference tasks on Serverless, regardless of the model size, benefiting from the high scalability of this architecture while lowering the strain on computing resources, such as required run-time memory.
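The idea in the abstract, cutting one large model into slices and running them as a sequence of functions, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a purely sequential model whose layers are plain Python callables, and every name in it (`decompose`, `make_handler`, `run_workflow`, the `"activations"` event key) is hypothetical. A real deployment would cut an ONNX graph into sub-graphs and host each one in its own Serverless function, which is not shown here.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]
Event = Dict[str, Vector]


def scale(factor: float) -> Layer:
    """Toy layer: multiply every element by a constant."""
    return lambda xs: [x * factor for x in xs]


def shift(offset: float) -> Layer:
    """Toy layer: add a constant to every element."""
    return lambda xs: [x + offset for x in xs]


def relu(xs: Vector) -> Vector:
    """Toy layer: element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]


def decompose(layers: Sequence[Layer], num_slices: int) -> List[List[Layer]]:
    """Split a sequential model into contiguous slices of near-equal length."""
    if not 1 <= num_slices <= len(layers):
        raise ValueError("num_slices must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        end = start + base + (1 if i < extra else 0)
        slices.append(list(layers[start:end]))
        start = end
    return slices


def make_handler(slice_layers: Sequence[Layer]) -> Callable[[Event], Event]:
    """Wrap one slice as a function-style handler (hypothetical event format)."""
    def handler(event: Event) -> Event:
        activations = event["activations"]
        for layer in slice_layers:
            activations = layer(activations)
        # Only the intermediate result is passed on to the next function.
        return {"activations": activations}
    return handler


def run_workflow(handlers: Sequence[Callable[[Event], Event]], inputs: Vector) -> Vector:
    """Run the handlers one after another, like a sequential workflow."""
    event: Event = {"activations": list(inputs)}
    for handler in handlers:
        event = handler(event)
    return event["activations"]


def run_monolithic(layers: Sequence[Layer], inputs: Vector) -> Vector:
    """Reference: run the whole model in a single process."""
    activations = list(inputs)
    for layer in layers:
        activations = layer(activations)
    return activations


model = [scale(2.0), shift(-3.0), relu, scale(0.5), shift(1.0)]
slices = decompose(model, 3)
handlers = [make_handler(s) for s in slices]
sliced_output = run_workflow(handlers, [1.0, 2.0, -4.0])
full_output = run_monolithic(model, [1.0, 2.0, -4.0])
```

In this sketch each handler holds only its own slice, so no single function ever needs the whole model loaded; the chained result matches the single-process run because the slices preserve layer order.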


Pages: 6
