Risk-Aware Continuous Control with Neural Contextual Bandits

Cited by: 0

Authors:

Ayala-Romero, Jose A. [1]

Garcia-Saavedra, Andres [1]

Costa-Perez, Xavier [1, 2, 3]

Affiliations:

[1] NEC Labs Europe, Heidelberg, Germany

[2] i2CAT Fdn, Barcelona, Spain

[3] ICREA, Barcelona, Spain

Source:

Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 19 | 2024


DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent advances in learning techniques have drawn attention for their applicability to a wide range of real-world sequential decision-making problems. Yet many practical applications impose critical constraints on operation in real environments. Most learning solutions neglect the risk of failing to meet these constraints, which hinders their deployment in real-world settings. In this paper, we propose a risk-aware decision-making framework for contextual bandit problems that accommodates constraints and continuous action spaces. Our approach employs an actor multi-critic architecture in which each critic characterizes the distribution of a performance or constraint metric. The framework supports different risk levels, balancing constraint satisfaction against performance. To demonstrate its effectiveness, we first compare it against state-of-the-art baseline methods in a synthetic environment, highlighting the impact of intrinsic environmental noise across different risk configurations. Finally, we evaluate it in a real-world use case involving a 5G mobile network, where only our approach consistently satisfies the system constraint (a signal-processing reliability target), at a small performance cost (an 8.5% increase in power consumption).
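The abstract describes critics that model the distributions of performance and constraint metrics, plus a risk level that trades constraint satisfaction against performance. Below is a minimal Python sketch of how such a risk level could act at decision time. The assumptions are ours, not the paper's: each critic is assumed to output quantile estimates (in the style of distributional RL), the constraint is treated as met when its lower-tail quantile at level `risk` reaches a target, and a grid search over candidate actions stands in for the trained actor. The names `toy_performance_critic`, `toy_constraint_critic`, `lower_quantile` and `select_action` are hypothetical, and the toy critics are hand-written functions rather than learned networks.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Assumption: each critic returns N quantile estimates of a metric's
# distribution at the midpoint levels tau_i = (i + 0.5) / N.
N_QUANTILES = 20
TAUS = [(i + 0.5) / N_QUANTILES for i in range(N_QUANTILES)]

Critic = Callable[[float], List[float]]


def toy_performance_critic(action: float) -> List[float]:
    """Hypothetical toy critic: performance = -power; a larger action costs more power."""
    return [-action for _ in TAUS]


def toy_constraint_critic(action: float, spread: float = 0.4) -> List[float]:
    """Hypothetical toy critic: reliability grows with the action; `spread` stands in for environment noise."""
    return [action + spread * (tau - 0.5) for tau in TAUS]


def lower_quantile(quantiles: Sequence[float], risk: float) -> float:
    """Estimate of the metric at quantile level `risk` (lower tail).

    Uses the largest quantile whose level does not exceed `risk`,
    or the lowest quantile if `risk` is below every level.
    """
    ordered = sorted(quantiles)
    below = [q for tau, q in zip(TAUS, ordered) if tau <= risk]
    return below[-1] if below else ordered[0]


def select_action(actions: Sequence[float], perf_critic: Critic,
                  constraint_critic: Critic, target: float, risk: float) -> float:
    """Pick the best-performing action whose lower-tail constraint value meets `target`."""
    feasible = [a for a in actions
                if lower_quantile(constraint_critic(a), risk) >= target]
    if feasible:
        # Among actions deemed safe at this risk level, maximize expected performance.
        return max(feasible, key=lambda a: mean(perf_critic(a)))
    # No action is safe enough: fall back to the one with the best lower-tail constraint value.
    return max(actions, key=lambda a: lower_quantile(constraint_critic(a), risk))


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]  # candidate actions in [0, 1]
    for risk in (0.05, 0.5):
        a = select_action(grid, toy_performance_critic, toy_constraint_critic,
                          target=0.705, risk=risk)
        print(f"risk={risk}: action={a:.2f}")
```

In this toy setting, lowering `risk` from 0.5 to 0.05 moves the chosen action from 0.72 to 0.90: the stricter risk level secures the reliability target in the lower tail by spending more power, and a larger `spread` (more environmental noise) would widen that gap.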

Pages: 20930-20938

Number of pages: 9
