Machine Learning Inference on Serverless Platforms Using Model Decomposition

Cited: 0
Authors
Gallego, Adrien [1 ]
Odyurt, Uraz [2 ,3 ]
Cheng, Yi [1 ]
Wang, Yuandou [1 ]
Zhao, Zhiming [1 ]
Affiliations
[1] Univ Amsterdam, Inst Informat, Amsterdam, Netherlands
[2] Radboud Univ Nijmegen, High Energy Phys, Nijmegen, Netherlands
[3] Nikhef, Amsterdam, Netherlands
Source
16TH IEEE/ACM INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING, UCC 2023 | 2023
Keywords
Serverless computing; Machine learning; Model decomposition; Inference;
DOI
10.1145/3603166.3632535
CLC Classification
TP301 [Theory, Methods];
Discipline Code
081202 ;
Abstract
Serverless computing offers a scalable and cost-effective service model that lets users run applications without managing the underlying infrastructure or physical servers. Although the Serverless architecture is highly scalable, it was not designed for the unique challenges posed by resource-intensive workloads such as Machine Learning (ML) tasks. Due to the limitations of Serverless function deployment and resource provisioning, combining ML and Serverless is a complex undertaking. We tackle this problem by decomposing large ML models into smaller sub-models, referred to as slices, and setting up ML inference tasks over these slices as a Serverless workflow, i.e., a sequence of functions. For demonstration purposes, our experimental evaluations are performed on the Serverless offering of AWS, using an open-source format for ML model representation, the Open Neural Network Exchange (ONNX). Our results show that this decomposition method enables the execution of ML inference tasks on Serverless regardless of model size, benefiting from the high scalability of this architecture while lowering the strain on computing resources, such as required run-time memory.
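The slicing idea described in the abstract can be sketched in plain Python. This is an illustrative sketch only: the layer functions, slice boundaries, and hand-off mechanism below are assumptions for demonstration, not the authors' implementation (which operates on ONNX graphs and AWS functions).

```python
# Sketch: a "model" as an ordered list of layer functions. Each slice is a
# contiguous sub-sequence of layers, standing in for one serverless function;
# the workflow chains slices, passing intermediate activations between calls.

def layer_scale(x):   # hypothetical layer 1
    return [2 * v for v in x]

def layer_shift(x):   # hypothetical layer 2
    return [v + 1 for v in x]

def layer_sum(x):     # hypothetical layer 3
    return sum(x)

MODEL = [layer_scale, layer_shift, layer_sum]

def make_slices(layers, per_slice):
    """Split the layer list into contiguous slices of at most per_slice layers."""
    return [layers[i:i + per_slice] for i in range(0, len(layers), per_slice)]

def run_slice(slice_layers, activations):
    """Stand-in for a single serverless function invocation."""
    for layer in slice_layers:
        activations = layer(activations)
    return activations

def run_workflow(slices, x):
    """Chain slice invocations, as a serverless workflow would."""
    for s in slices:
        x = run_slice(s, x)
    return x

slices = make_slices(MODEL, per_slice=2)  # two small functions instead of one
# Decomposed execution matches running the whole model in one function:
assert run_workflow(slices, [1, 2, 3]) == run_slice(MODEL, [1, 2, 3])
```

Each slice here only ever holds its own layers in memory, which mirrors the paper's motivation: per-function memory demand shrinks with slice size, at the cost of passing activations between invocations.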
Pages: 6