Deep Reinforcement Learning Microgrid Optimization Strategy Considering Priority Flexible Demand Side

As an efficient way to integrate multiple distributed energy resources (DERs) with the user side, a microgrid must contend with the small scale, volatility, intermittency and uncertainty of DERs, as well as with uncertainty on the demand side. The traditional microgrid takes a single form and cannot provide flexible energy dispatch between a complex demand side and the microgrid. To address this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid is proposed. Centralized control of microgrid operation is convenient for regulating the reactive power and voltage of the distributed power supply and for adjusting the grid frequency; however, flexible loads then tend to aggregate and create new peaks during electricity-price valleys. Existing research accounts for the power constraints of the microgrid but fails to guarantee a sufficient supply of electric energy to each individual flexible load. Building on the overall operating environment of the microgrid, this paper considers the response priority of each unit component of the TCLs and ESSs, so as to secure the power supply of the microgrid's flexible loads while minimizing the cost of power input. The optimization of this environment is formulated as a Markov decision process (MDP), and training combines an offline and an online stage. Because multithreading alone, without learning from historical data, yields low learning efficiency, an asynchronous advantage actor–critic augmented with an experience replay memory (Memory A3C, M-A3C) is introduced to resolve the data-correlation and non-stationary-distribution problems that arise during training. The multithreaded operation of M-A3C efficiently learns the priority allocation of demand-side resources and improves the flexible scheduling of the microgrid's demand side, greatly reducing the input cost. A comparison of the resulting cost optimization with that of the proximal policy optimization (PPO) algorithm shows that the proposed algorithm achieves better convergence and better economic performance.
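The abstract's core pipeline — casting microgrid dispatch as an MDP and training an actor–critic agent whose replay memory breaks data correlation — can be sketched in a few dozen lines. The following Python is a minimal illustration only, not the authors' implementation: the environment dynamics, reward, wind/price profiles and hyperparameters are invented placeholders, linear models stand in for the actor and critic networks, and a single worker replaces the multiple asynchronous threads of A3C.

```python
# Minimal sketch of an M-A3C-style training loop (illustrative assumptions
# throughout; not the paper's code). A toy microgrid MDP is solved by an
# advantage actor-critic whose updates are sampled from a replay memory.
import random
import numpy as np

class ToyMicrogridEnv:
    """Hypothetical one-bus microgrid: serve a fixed flexible load from wind
    plus a battery, buying any shortfall from the main grid at a varying price."""
    LOAD, CAPACITY, HORIZON = 1.0, 5.0, 48

    def reset(self):
        self.t, self.soc = 0, 2.5              # time step, battery state of charge
        return self._obs()

    def _obs(self):
        wind = 0.5 + 0.5 * np.sin(self.t / 4.0)           # toy wind profile
        price = 1.0 + 0.5 * np.sin(self.t / 12.0 + 1.0)   # toy price profile
        return np.array([wind, price, self.soc / self.CAPACITY])

    def step(self, action):
        wind, price, _ = self._obs()
        flow = {0: -1.0, 1: 0.0, 2: 1.0}[action]          # charge / idle / discharge
        flow = float(np.clip(flow, self.soc - self.CAPACITY, self.soc))
        self.soc -= flow
        shortfall = max(self.LOAD - wind - flow, 0.0)
        reward = -price * shortfall                        # cost of grid purchases
        self.t += 1
        return self._obs(), reward, self.t >= self.HORIZON

class ACAgent:
    """Linear softmax actor and linear critic, standing in for the networks."""
    def __init__(self, n_obs=3, n_act=3, lr=0.01, gamma=0.95):
        self.Wp = np.zeros((n_act, n_obs))     # policy (actor) weights
        self.Wv = np.zeros(n_obs)              # value (critic) weights
        self.lr, self.gamma = lr, gamma

    def probs(self, s):
        logits = self.Wp @ s
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def act(self, s):
        p = self.probs(s)
        return int(np.random.choice(len(p), p=p))

    def update(self, s, a, r, s2, done):
        target = r + (0.0 if done else self.gamma * self.Wv @ s2)
        adv = target - self.Wv @ s             # TD error as advantage estimate
        self.Wv += self.lr * adv * s           # critic step
        p = self.probs(s)                      # current-policy probabilities
        grad = -np.outer(p, s)                 # d log pi(a|s) / dWp ...
        grad[a] += s                           # ... for a linear softmax policy
        self.Wp += self.lr * adv * grad        # actor step

def train(episodes=300, batch=32, buffer_size=2000):
    env, agent, replay = ToyMicrogridEnv(), ACAgent(), []
    for ep in range(episodes):
        s, done, total = env.reset(), False, 0.0
        while not done:
            a = agent.act(s)
            s2, r, done = env.step(a)
            replay.append((s, a, r, s2, done))
            if len(replay) > buffer_size:      # bounded memory, FIFO eviction
                replay.pop(0)
            # Replayed minibatch updates break the temporal correlation of
            # consecutive samples -- the role the M-A3C memory plays
            # (importance weighting for off-policy data is omitted here).
            for exp in random.sample(replay, min(batch, len(replay))):
                agent.update(*exp)
            s, total = s2, total + r
        if (ep + 1) % 50 == 0:
            print(f"episode {ep + 1}: return {total:.2f}")

if __name__ == "__main__":
    train()
```

In a full A3C setup, several such workers would run in parallel threads against a shared parameter set, each pushing gradients asynchronously; the sketch collapses this to one worker to stay compact.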

Bibliographic Details
Main Authors: Jinsong Sang (Changchun Institute of Technology, School of Electrical Engineering, Changchun 130012, China), Hongbin Sun (Changchun Institute of Technology, School of Electrical Engineering, Changchun 130012, China), Lei Kou (Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao 266075, China)
Format: Article
Language: English
Published: MDPI AG 2022-03-01
Series: Sensors
Subjects: microgrid; energy storage; flexible load; reinforcement learning; deep learning; energy optimization
Online Access: https://www.mdpi.com/1424-8220/22/6/2256
DOI: 10.3390/s22062256
ISSN: 1424-8220