Adversarial poisoning attacks on reinforcement learning-driven energy pricing

Cited by: 4
Authors
Gunn, Sam [1 ]
Jang, Doseok [1 ]
Paradise, Orr [1 ]
Spangher, Lucas [1 ]
Spanos, Costas J. [1 ]
Affiliation
[1] University of California, Berkeley, Berkeley, CA 94720, USA
Source
PROCEEDINGS OF THE 9TH ACM INTERNATIONAL CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILDINGS, CITIES, AND TRANSPORTATION (BUILDSYS 2022) | 2022
Funding
National Research Foundation of Singapore;
Keywords
smart grids; deep reinforcement learning; data poisoning;
DOI
10.1145/3563357.3564075
CLC number
TU [Building Science];
Discipline code
0813;
Abstract
Complex controls are increasingly common in power systems. Reinforcement learning (RL) has emerged as a strong candidate for implementing various controllers. One common use of RL in this context is for prosumer pricing aggregations, where prosumers consist of buildings with both solar generation and energy storage. Specifically, supply and demand data serve as the observation space for many microgrid controllers acting based on a policy passed from a central RL agent. Each controller outputs an action space consisting of hourly "buy" and "sell" prices for energy throughout the day; in turn, each prosumer can choose whether to transact with the RL agent or the utility. The RL agent, which learns online, is rewarded according to the profit it generates. We ask: what happens when some of the microgrid controllers are compromised by a malicious entity? We demonstrate a novel attack on RL and a simple defense against it. Our attack perturbs each trajectory to reverse the direction of the estimated gradient. We demonstrate that if data from a small fraction of microgrid controllers is adversarially perturbed, the learning of the RL agent can be significantly slowed. With larger perturbations, the RL aggregator can be manipulated to learn a catastrophic pricing policy that causes the RL agent to operate at a loss. Other environmental characteristics are worsened too: prosumers face higher energy costs, use their batteries less, and suffer from higher peak demand when the pricing aggregator is adversarially poisoned. We address this vulnerability with a "defense" module, i.e., a "robustification" of RL algorithms against this attack. Our defense identifies the trajectories with the largest influence on the gradient and removes them from the training data. The defense is computationally light and can reasonably be included in any RL algorithm.
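The abstract's attack (perturb trajectories so the aggregate gradient estimate reverses) and defense (drop the trajectories with the largest influence on the gradient before updating) can be sketched on a toy one-parameter pricing problem. Everything below — the Gaussian pricing policy, `OPTIMAL_PRICE`, the reward-flipping perturbation, and the median-deviation trimming rule — is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

OPTIMAL_PRICE = 2.0  # hypothetical profit-maximising price (illustrative)

def collect(theta, n=32):
    """Sample n one-step 'trajectories' from a Gaussian pricing policy."""
    prices = rng.normal(theta, 1.0, size=n)     # action: posted price
    profits = -(prices - OPTIMAL_PRICE) ** 2    # reward peaks at OPTIMAL_PRICE
    return prices, profits

def per_traj_grads(theta, prices, profits):
    """Per-trajectory REINFORCE gradient for a unit-variance Gaussian policy:
    r_i * d/dtheta log N(a_i; theta, 1) = r_i * (a_i - theta)."""
    return profits * (prices - theta)

def poison(profits, frac=0.1, scale=25.0):
    """Attack: flip and amplify rewards on a small fraction of trajectories,
    reversing the sign of the aggregate gradient estimate."""
    k = max(1, int(frac * len(profits)))
    poisoned = profits.copy()
    poisoned[:k] = -scale * poisoned[:k]
    return poisoned

def trim_influential(grads, drop=4):
    """Defense: remove the trajectories whose gradient estimates deviate most
    from the median, i.e. those with the largest influence on the update."""
    keep = np.argsort(np.abs(grads - np.median(grads)))[: len(grads) - drop]
    return grads[keep]

def train(steps=300, lr=0.05, attack=False, defend=False):
    theta = 0.0
    for _ in range(steps):
        prices, profits = collect(theta)
        if attack:
            profits = poison(profits)
        g = per_traj_grads(theta, prices, profits)
        if defend:
            g = trim_influential(g)
        # bounded price range keeps a poisoned run from diverging numerically
        theta = float(np.clip(theta + lr * g.mean(), 0.0, 10.0))
    return theta
```

In this toy setting, a clean run climbs toward the profit-maximising price, a poisoned run is pushed away from it, and trimming the most influential trajectories before averaging tends to restore clean-run behaviour even under the attack.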
Pages: 262-265
Page count: 4
Related papers
18 in total
  • [11] Rakhsha, A., 2021. arXiv preprint, arXiv:2102.08492.
  • [12] Spangher, L., 2020. In: e-Energy '20: Proceedings of the Eleventh ACM International Conference on Future Energy Systems, p. 438. DOI: 10.1145/3396851.3402365.
  • [13] Spangher, L., 2021. Transactive Multi-Agent Reinforcement Learning for Distributed Energy Price Localization. In: BuildSys '21: Proceedings of the 2021 ACM International Conference on Systems for Energy-Efficient Built Environments, pp. 244-245.
  • [14] Vazquez-Canteli, J.R., Kampf, J., Henze, G., Nagy, Z., 2019. Demo Abstract: CityLearn v1.0 - An OpenAI Gym Environment for Demand Response with Deep Reinforcement Learning. In: BuildSys '19: Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 356-357.
  • [15] Vazquez-Canteli, J.R., Nagy, Z., 2019. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Applied Energy, 235: 1072-1089.
  • [16] Wan, Z., Li, H., Shuai, H., Sun, Y., He, H., 2021. Adversarial Attack for Deep Reinforcement Learning Based Demand Response. In: 2021 IEEE Power & Energy Society General Meeting (PESGM).
  • [17] Zhang, X., 2020. In: RLEM '20: Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities, p. 43. DOI: 10.1145/3427773.3427865.
  • [18] Zhou, Y., Zheng, S., 2020. Machine-learning based hybrid demand-side controller for high-rise office buildings with high energy flexibilities. Applied Energy, 262.