Adversarial poisoning attacks on reinforcement learning-driven energy pricing

Cited by: 4

Authors:

Gunn, Sam [1]

Jang, Doseok [1]

Paradise, Orr [1]

Spangher, Lucas [1]

Spanos, Costas J. [1]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

PROCEEDINGS OF THE 2022 THE 9TH ACM INTERNATIONAL CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILDINGS, CITIES, AND TRANSPORTATION, BUILDSYS 2022 | 2022

Funding:

National Research Foundation, Singapore

Keywords:

smart grids; deep reinforcement learning; data poisoning;

DOI:

10.1145/3563357.3564075

Chinese Library Classification (CLC) number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

Complex controls are increasingly common in power systems. Reinforcement learning (RL) has emerged as a strong candidate for implementing various controllers. One common use of RL in this context is for prosumer pricing aggregations, where prosumers consist of buildings with both solar generation and energy storage. Specifically, supply and demand data serve as the observation space for many microgrid controllers acting based on a policy passed from a central RL agent. Each controller outputs an action space consisting of hourly "buy" and "sell" prices for energy throughout the day; in turn, each prosumer can choose whether to transact with the RL agent or the utility. The RL agent, who is learning online, is rewarded through its ability to generate a profit. We ask: what happens when some of the microgrid controllers are compromised by a malicious entity? We demonstrate a novel attack in RL and a simple defense against the attack. Our attack perturbs each trajectory to reverse the direction of the estimated gradient. We demonstrate that if data from a small fraction of microgrid controllers is adversarially perturbed, the learning of the RL agent can be significantly slowed. With larger perturbations, the RL aggregator can be manipulated to learn a catastrophic pricing policy that causes the RL agent to operate at a loss. Other environmental characteristics are worsened too: prosumers face higher energy costs, use their batteries less, and suffer from higher peak demand when the pricing aggregator is adversarially poisoned. We address this vulnerability with a "defense" module; i.e., a "robustification" of RL algorithms against this attack. Our defense identifies the trajectories with the largest influence on the gradient and removes them from the training data. It is computationally light and reasonable to include in any RL algorithm.
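The attack and defense described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the aggregator averages one gradient estimate per trajectory, models the attack directly on those per-trajectory gradients rather than on the underlying supply and demand data, and uses the gradient norm as a stand-in for "influence". The function names `poison_gradients` and `trimmed_mean_gradient` and all numeric values are hypothetical.

```python
import math

import numpy as np


def poison_gradients(grads, frac, scale):
    """Overwrite a fraction of per-trajectory gradients so they oppose the clean mean.

    Assumption: the attacker controls the first ceil(frac * n) trajectories and
    can set their gradient contribution directly.
    """
    grads = np.asarray(grads, dtype=float)
    k = math.ceil(frac * len(grads))
    clean_mean = grads.mean(axis=0)
    poisoned = grads.copy()
    poisoned[:k] = -scale * clean_mean  # point against the honest update
    return poisoned


def trimmed_mean_gradient(grads, n_remove):
    """Drop the n_remove trajectories with the largest gradient norm, average the rest.

    Assumption: gradient norm is used here as a simple proxy for influence.
    """
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1)
    keep = np.argsort(norms)[: len(grads) - n_remove]
    return grads[keep].mean(axis=0)


# Toy data: 20 trajectories whose gradients cluster around a common direction.
rng = np.random.default_rng(0)
clean = np.ones(5) + 0.1 * rng.standard_normal((20, 5))
clean_mean = clean.mean(axis=0)

# Two of the twenty trajectories (10%) are compromised.
poisoned = poison_gradients(clean, frac=0.1, scale=20.0)
attacked_mean = poisoned.mean(axis=0)
defended_mean = trimmed_mean_gradient(poisoned, n_remove=2)

# A negative dot product means the update direction has been reversed.
attack_alignment = float(np.dot(attacked_mean, clean_mean))
defense_alignment = float(np.dot(defended_mean, clean_mean))
```

In this toy setting the poisoned average points against the clean gradient, while discarding the two largest-norm trajectories restores an update aligned with it. A smaller `scale` would shrink the honest update instead of reversing it, which loosely corresponds to the slowed learning the abstract reports for small perturbations.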


Pages: 262-265

Page count: 4
