CN111200285B - Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory


Info

Publication number
CN111200285B
CN111200285B (application CN202010089205.7A)
Authority
CN
China
Prior art keywords: state, designing, energy storage, strategy, action
Prior art date
Legal status
Active
Application number
CN202010089205.7A
Other languages
Chinese (zh)
Other versions
CN111200285A (en
Inventor
窦春霞 (Dou Chunxia)
张立国 (Zhang Liguo)
Current Assignee
Yanshan University
Original Assignee
Yanshan University
Priority date
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202010089205.7A
Publication of CN111200285A
Application granted
Publication of CN111200285B

Classifications

    • H02J3/00 — Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 — Arrangements for balancing of the load in a network by storage of energy
    • H02J3/38 — Arrangements for parallelly feeding a single network by two or more generators, converters or transformers; H02J3/381 — Dispersed generators
    • Y02E70/30 — Systems combining energy storage with energy generation of non-fossil origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory, comprising the following steps: designing a transition-voltage-layer control strategy based on voltage layering, and designing a dual-energy-storage role-division control strategy, in which the two energy storage units work separately when operating in voltage-stabilizing mode and switch to cooperative charging/discharging when sustained power absorption or power supplementation is required; constructing an action space and a state space based on Q-learning; designing a multi-agent reinforcement learning control framework, comprising the basic update rule for state-action pairs and the selection of a corresponding value function; designing a basic action selection mechanism and a return-value strategy, comprising the selection strategy adopted by the system in its initial state and the return values in each state; and designing a reinforcement learning algorithm flow that implements the above control strategies.

Description

Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
Technical Field
The invention relates to the field of smart grid control, and in particular to a micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory.
Background
With the rapid development of the economy, China's energy consumption has increased year by year, and the total consumption of non-renewable energy sources such as fossil fuels has grown rapidly. China's power supply comes mainly from thermal power generation, but with the excessive exploitation of non-renewable resources and the growing negative environmental effects of conventional power generation, research on renewable energy sources such as wind, solar and hydro power has gradually come onto the agenda in China and worldwide. The development and utilization of green, clean energy not only contributes to environmental protection but also provides a new form of energy supply for economic development. Developing and utilizing clean energy has therefore become an important task for China's energy sector, and wind and photovoltaic power generation in China are developing at a relatively rapid pace.
In recent years, compared with the traditional large-scale centralized generation and distribution mode, micro-grids based on distributed generation technology have gained wide attention and application at home and abroad owing to outstanding advantages such as short construction periods, low investment, flexible siting, reliable power supply, easy maintenance, high energy utilization and low environmental pollution. A micro-grid combines distributed power sources, loads, energy storage devices and control devices into a single controllable unit that supplies both electric and thermal energy to users. Advanced information, control and power technologies are integrated in the micro-grid, which can provide a stable power supply, meet diversified load demands, and maximize energy, economic and environmental benefits. The micro-grid can also provide power support to the main grid when necessary, and will be an integral part of future grid construction. In China, vigorously promoting distributed generation technology is a concrete embodiment of the sustainable development path and a strong support for adjusting the energy structure, solving electricity supply problems in remote areas and protecting the environment.
With the rapid development of distributed generation technology, problems to be solved have gradually emerged. A micro-grid connected to the main grid can easily meet load demand, but when it operates in island mode, an effective control mechanism is needed to ensure safe and stable operation of the system under various connection conditions. First, distributed generation relies mainly on renewable sources such as wind and solar, whose inherent intermittency and uncontrollability, driven by natural factors, introduce a degree of instability into the grid's energy supply. To ensure stable operation of loads, distributed energy sources must therefore be controlled reasonably and effectively, so that they can operate in different modes according to real-time natural conditions and load demands. Moreover, frequent switching between operation modes increases control difficulty and reduces operational stability, so reducing the number of mode switches while maintaining stable system operation is also a significant problem. Besides these mode-switching problems, the control of important power indices such as voltage and power must not be neglected.
For the multi-mode switching control problem of micro-grids based on distributed generation, multi-agent system (MAS) technology is certainly one of the most effective and widely used means, and many control methods for micro-grid multi-mode switching have been studied. However, most work focuses on MAS-based logic switching control or continuous dynamic regulation, without adequately considering the switching conditions and switching behavior of the micro-grid system. How to plan switching conditions and behaviors reasonably and reduce the number of mode switches while ensuring stable operation thus remains a significant problem. In bus-voltage detection schemes, the common approach is to divide the voltage directly into three or five levels; when the bus voltage rises or falls to a certain level due to some factor, the system adopts a control measure to maintain the bus voltage and system stability. However, the case where the bus voltage merely fluctuates between two adjacent levels is not considered, and the system then keeps switching control measures. For energy storage units there are generally two uses: first, when renewable generation is insufficient, the energy storage unit supplies electric energy to the load; second, when the bus voltage fluctuates to a certain extent, the energy storage unit is called on for "peak clipping and valley filling" to stabilize the bus voltage.
However, most studies adopt a single-energy-storage scheme, which must switch continuously between charging and discharging when stabilizing voltage fluctuations; its performance is mediocre and it greatly shortens the storage lifetime. Others adopt improved dual-energy-storage control on this basis, but the dual-energy-storage control strategies remain simplistic.
Disclosure of Invention
In view of the problems existing in the prior art, the invention discloses a micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory, in which a transition-voltage-layer control strategy based on a voltage layering control mode is designed, with a transition layer added between adjacent voltage layers;
designing a dual-energy-storage role-division control strategy, in which the two energy storage units work separately when the energy storage operates in voltage-stabilizing mode, and switch to cooperative charging/discharging when sustained power absorption or power supplementation by the auxiliary energy storage is required;
Constructing an action space and a state space based on Q-Learning: analyzing the respective necessary states and actions of the busbar voltage detection unit, the energy storage unit, the photovoltaic power generation unit, the wind power generation unit, the diesel power generation unit and the load control unit, and acquiring respective action spaces and state spaces;
designing a reinforcement learning control framework based on multiple agents: basic updating rules comprising design state-action pairs and selecting corresponding cost functions;
designing a basic action selection mechanism and a return-value strategy: comprising the selection strategy adopted by the system in its initial state and the return values in each state;
designing a reinforcement learning algorithm flow: and designing a proper algorithm flow based on the strategy to realize the control strategy.
The multi-agent reinforcement learning control framework is designed as follows: the state information currently acquired by the system is compared with the state information acquired at the previous moment; if the states are the same, no action instruction is generated and acquisition continues at the next moment.
After all states and actions are determined, the system generates a multidimensional Q matrix, and the Q value of each system state in the reinforcement learning model approaches the optimal action-value function through learning iterations.
the basic update rule for the state-action pair is as follows:
where s is the current state of the agent, a is the action taken based on the current state, Q (s t ,a t ) Representing that the agent is in state s t Next action instruction a selected by the set learning strategy t The method comprises the steps of carrying out a first treatment on the surface of the Beta is a decay factor whose size determines whether the selected strategy is prone to current rewards or future rewards, R is a state and behavior based rewards;
determining the conditions of executing actions, changing the state of the intelligent agent after instructions and the previous state through the reward function, determining the rewarding or punishment degree which is applied to the actions based on the previous state, and maximizing the sum of expected values of the reward function by trying to combine all actions allowed based on the current state, wherein the reward function is as follows:
wherein:representing the reward of the system in the next j steps in the time t, wherein pi is the basic action selection mechanism and the return value strategy of the strategy selected by the system are designed in the following way:
setting the enabling priority of the unit: the load control unit cuts off loads, wherein wind power/photovoltaic power generation, energy storage power supply, diesel engine set power supply and load control unit;
and designing return-value strategies for three conditions: renewable generation exactly meeting demand, renewable generation exceeding demand, and renewable generation falling short of demand.
By adopting the above technical scheme, the micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory introduces a reinforcement learning algorithm that takes maximizing the total expected return obtained by the multiple micro-grid agent units as its objective, and autonomously learns and iterates the action-value function from historical data and the current state. The finally obtained converged state-action table constitutes the optimal action strategy for multi-agent hybrid coordination control, and the action-value function further optimizes the hybrid coordination control rules in the controller. A transition-voltage-layer control strategy based on voltage layering and a dual-energy-storage coordination control strategy are designed to reduce instability introduced by the control strategy itself and to stabilize the bus voltage.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a voltage detection class diagram incorporating a transition layer;
FIG. 2 is a model diagram of a microgrid system;
FIG. 3 is a diagram of the environment, agent and control system relationship;
fig. 4 is a multi-agent coordinated control framework based on reinforcement learning.
Detailed Description
In order to make the technical scheme and advantages of the present invention clearer, the technical scheme in the embodiments of the present invention is described clearly and completely below with reference to the accompanying drawings:
the micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory as shown in fig. 4 comprises the following steps:
step 1: in order to optimally control the bus voltage, the voltage layering control strategy divides the bus voltage into 6 detections and the likeStage: (-, 0.95U) ref ],(0.95U ref ,0.96U ref ],(0.96U ref ,0.98U ref ],(0.98U ref ,1.02U ref ],(1.02U ref ,1.05U ref ],(1.05U ref ,—]Wherein U is ref Is the reference voltage.
Owing to the randomness of renewable generation and load demand, the bus voltage fluctuates to a certain extent. When the bus voltage jumps from one range to another, the detection unit transmits a state-change signal to the hybrid control unit. To prevent the detection unit from transmitting state-change signals frequently when uncertainty in generation and demand makes the bus voltage jump back and forth between two detection levels, a transition voltage layer is innovatively added between the voltage detection levels. As shown in fig. 1, while the bus voltage fluctuates within a voltage layer without triggering a state-change signal, the lower boundary of the layer above and the upper boundary of the layer below remain inactive; when the bus voltage continues to rise (fall) beyond the upper (lower) boundary of the current layer, that layer is deactivated, the layer above (below) is activated, the detection unit registers the bus-voltage state change, and the specific state-change information is transmitted to the hybrid control unit. The event-triggered function (ETF) of the voltage detection unit is denoted ETF(U_s), with the expression:

ETF(U_s) = ½(Sgn[U − U_max] − Sgn[U_min − U]) × [1(t) − 1(t − t_s)]

where Sgn(·) is the sign function, 1(t) is the step function, U is the current bus voltage, and U_max and U_min are respectively the upper and lower limits of the voltage layer in which the current voltage lies (the range of each layer is defined above). When ETF(U_s) = 1, the bus voltage moves up one layer; when ETF(U_s) = −1, the bus voltage moves down one layer; when ETF(U_s) = 0, the voltage layer is unchanged; t_s is the trigger time.
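The transition-layer trigger described in step 1 can be sketched in code. This is an illustrative sketch, not part of the patent: the transition-band width HYST, the per-unit normalisation and the function name are assumptions.

```python
# Illustrative sketch of the transition-voltage-layer trigger of step 1.
# HYST (the transition-band width) is an assumed value, not from the patent.
HYST = 0.005  # assumed width of the transition layer, in per-unit of U_ref

def etf_voltage(u, layer_lo, layer_hi):
    """Event-triggered function ETF(U_s): +1 to move up a layer, -1 to
    move down, 0 to stay. A trigger fires only once the voltage clears
    the current layer boundary plus the transition band, so chatter right
    at the boundary does not cause repeated state-change signals."""
    if u > layer_hi + HYST:
        return 1
    if u < layer_lo - HYST:
        return -1
    return 0
```

For example, with the normal layer (0.98, 1.02], a voltage of 1.021 per-unit lies inside the transition band and produces no trigger, while 1.03 triggers a move up one layer.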
Step 2: in order to exploit the capacity advantage of the two energy storage groups to the greatest extent and preserve their service lifetimes, the energy storage control unit applies role-division control to the two groups. When the bus voltage fluctuates because of unstable renewable generation or other factors, the energy storage works in voltage-stabilizing mode (comprising the two sub-modes of power absorption and power compensation): the two storages work separately, one group dedicated to power absorption, absorbing surplus energy from the bus for peak clipping, and the other group dedicated to power compensation, supplying energy to the bus to suppress voltage drops for valley filling. When renewable generation rises (falls) continuously and the energy storage must assist by continuously absorbing (supplementing) power, the two storages switch to cooperative charging/discharging. The event-triggered function of this control strategy is denoted ETF(E_s), with the expression:

ETF(E_s) = Sgn[I_s] × [1(t) − 1(t − t_s)]    (5)

where I_s is the current between the bus and the energy storage: I_s > 0 denotes storage charging, I_s < 0 denotes storage discharging, and t_s is the trigger time. When ETF(E_s) = 1, the two storages are in cooperative charging mode; when ETF(E_s) = −1, they are in cooperative discharging mode; when ETF(E_s) = 0, they are in the role-division (voltage-stabilizing) mode. In addition, to preserve the storage lifetime and keep the storages serviceable, the difference between the two groups' SOCs is not allowed to grow too large: when the energy storage control unit detects SOC_1 − SOC_2 > 0.3, i.e. the two capacities differ by 0.3, the roles of the two storages are swapped. In summary, this control mode eliminates the time lost to frequent charge/discharge switching in single-storage schemes, improves the response speed of the energy storage unit, stabilizes bus-voltage fluctuations more quickly, and effectively prolongs storage lifetime.
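A minimal sketch of the dual-storage mode selection of step 2; the mode names and return convention are assumptions for illustration, not terms from the patent.

```python
def storage_mode(etf_e, soc1, soc2):
    """Map the energy-storage event trigger ETF(E_s) of equation (5) to a
    working mode: +1 -> cooperative charging, -1 -> cooperative
    discharging, 0 -> role-division voltage stabilisation. Also reports
    whether the two storages should swap roles, which step 2 prescribes
    when their SOCs differ by more than 0.3."""
    swap_roles = abs(soc1 - soc2) > 0.3
    if etf_e == 1:
        mode = "cooperative_charge"
    elif etf_e == -1:
        mode = "cooperative_discharge"
    else:
        mode = "role_division"  # one group absorbs power, the other compensates
    return mode, swap_roles
```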
Step 3: the control scheme of each unit is designed as follows, based on the micro-grid system model diagram shown in fig. 2.
1) Bus voltage detection unit: to facilitate optimal control, the bus voltage has been divided in step 1 into the six detection levels (−∞, 0.95U_ref], (0.95U_ref, 0.96U_ref], (0.96U_ref, 0.98U_ref], (0.98U_ref, 1.02U_ref], (1.02U_ref, 1.05U_ref], (1.05U_ref, +∞). Correspondingly there are six states, and the state space comprises: ultra-low, too low, slightly low, normal, slightly high and too high; but since its state changes depend on other units or natural factors, it has no action space.
2) Energy storage unit: to preserve the storage lifetime, its working capacity is limited to 0.1 SOC-0.9 SOC; that is, if the storage is used as a discharging or charging unit, it is disconnected when its capacity falls below 0.1 SOC or exceeds 0.9 SOC. It therefore has three capacity states: depleted (chargeable), full (dischargeable), and both chargeable and dischargeable. Since the storage also has the four working states of voltage stabilizing, cooperative charging, cooperative discharging and waiting, the state space comprises: depleted (chargeable), full (dischargeable), chargeable and dischargeable, voltage stabilizing, cooperative charging, cooperative discharging, and waiting. Each working state can transition to each of the others, giving twelve working-state transitions, i.e. twelve actions; the action space comprises: voltage stabilizing→cooperative charging, voltage stabilizing→cooperative discharging, voltage stabilizing→waiting, cooperative charging→voltage stabilizing, cooperative charging→cooperative discharging, cooperative charging→waiting, cooperative discharging→voltage stabilizing, cooperative discharging→cooperative charging, cooperative discharging→waiting, waiting→voltage stabilizing, waiting→cooperative charging, and waiting→cooperative discharging.
3) Photovoltaic power generation unit: since the maximum photovoltaic output depends entirely on natural conditions, when the generated power is less than the load power the unit operates in maximum power point tracking (MPPT) mode, so as to supply as much power as possible and reduce generation cost; when the generated power exceeds the load power, the unit operates in constant-power mode to ensure normal load operation; when the generated power is too low, the unit stops operating. There are thus three working states, and the state space comprises: constant-power mode, MPPT mode, and stopped. Unlike the energy storage unit, the photovoltaic unit has four working-state transitions in total, and the action space comprises: constant power→MPPT, MPPT→constant power, MPPT→stop, and stop→MPPT.
4) Wind power generation unit: like the photovoltaic unit, the wind power unit has three working states and four actions. The state space comprises: constant-power mode, MPPT mode, and stopped; the action space comprises: constant power→MPPT, MPPT→constant power, MPPT→stop, and stop→MPPT.
5) Diesel power generation unit: when renewable generation is insufficient and the stored energy is exhausted, the diesel generator set is started to supply power and maintain normal load operation. The unit thus has two states and two actions. State space: running, stopped; action space: start, stop.
6) Load control unit: the load control unit is responsible for controlling the supply of non-critical loads. When the total generated power is less than the load power, the load control unit sheds non-critical loads one by one to keep the bus voltage stable and ensure normal operation of critical loads. The unit has the three states of all online, partially shed and all shed, corresponding to the two actions of shedding load and reconnecting load. State space: all online, partially shed, all shed; action space: shed load, reconnect load.
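The state and action spaces enumerated in step 3 can be collected in one structure; the English labels below are paraphrases of the patent's terms, not identifiers from it.

```python
# Paraphrased enumeration of the per-unit state spaces of step 3.
STATE_SPACE = {
    "bus_voltage": ["ultra_low", "too_low", "slightly_low",
                    "normal", "slightly_high", "too_high"],
    "storage": ["depleted", "full", "chargeable_dischargeable",
                "stabilising", "coop_charge", "coop_discharge", "waiting"],
    "pv": ["constant_power", "mppt", "stopped"],
    "wind": ["constant_power", "mppt", "stopped"],
    "diesel": ["running", "stopped"],
    "load": ["all_online", "partly_shed", "all_shed"],
}

# Number of actions (working-state transitions) per unit; the bus-voltage
# detection unit has no action space of its own.
ACTION_COUNT = {"storage": 12, "pv": 4, "wind": 4, "diesel": 2, "load": 2}
```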
Step 4: reinforcement learning is an unsupervised learning method, in which an agent repeatedly interacts with the environment to learn continuously, and selects an optimal or near optimal action to achieve a system objective or maintain a system optimal state, and the basic model generally includes two parts of the environment and the system as shown in fig. 3.
The Q value of each system state in the control module's reinforcement learning model approaches the optimal action-value function through learning iterations, with little dependence on the strategy being followed. After all states and actions are determined, the system generates a multidimensional Q matrix. It should be noted that the system's collection and processing of state information and the issuing of the next action instruction form an uninterrupted process, and action instructions override one another, i.e. the agent must be ready at any time to accept an action-change instruction transmitted by the system. To reduce the computational load of data processing, the state information currently acquired by the system is compared with that acquired at the previous moment; if the states are the same, no action instruction is generated and acquisition continues at the next moment. The basic update rule for a state-action pair is:

Q(s_t, a_t) ← Q(s_t, a_t) + α[R + β max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where s_t is the current state of the agent, a_t is the action instruction selected by the set learning strategy in state s_t, α is the learning rate, β is a decay factor whose magnitude determines whether the selected strategy favours current or future rewards, and R is the state- and behavior-based reward given by equation (8). During the learning iterations, the reward earned by each agent depends both on its own actions and on the actions of the other agents induced by its own actions. The multi-agent coordinated control framework based on reinforcement learning is shown in fig. 4.
By comparing, through the reward function, the state of the agent after executing an action-change instruction with its previous state, the system decides the degree of reward or punishment to apply to that action; it therefore maximizes the sum of the expected values of the reward function by trying all combinations of actions allowed in the current state. The reward function is:

V^π(s_t) = E_π[ Σ_{j≥0} β^j R_{t+j} ]    (8)

where V^π(s_t) represents the reward of the system over the next j steps from time t, and π is the strategy selected by the system.
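A tabular sketch of the state-action update described in step 4. The learning rate alpha and the dict-based Q table are implementation assumptions, not prescriptions from the patent.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, beta=0.9):
    """One tabular Q-learning update: move Q(s, a) toward the reward r
    plus the decayed best value attainable from the next state. beta is
    the decay factor of step 4; Q is a dict keyed by (state, action)."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + beta * best_next - old)
    return Q[(s, a)]
```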
Step 5: on the premise of meeting the voltage safety evaluation index, the control system regulates each generation unit and the load control unit based on the bus-voltage state information. Taking into account the cost increase caused by the energy storage and the diesel set, and the environmental pollution caused by the diesel set, the enabling priority of the units is ordered as follows: wind/photovoltaic generation (with energy storage voltage regulation) > energy storage supply > diesel set supply > load shedding by the load control unit.
When the renewable generation supply exactly meets demand:
the bus voltage is (0.98U) ref ,1.02U ref ]The range fluctuates, the power generation source mainly depends on renewable energy sources such as wind and light to supply power, the working mode is a maximum power point tracking mode, and the double energy storage units provide power compensation and real-time voltage stabilization;
when the renewable energy power generation supply amount is larger than the demand:
when wind power or photovoltaic power generation is sufficient, the bus voltage rises to (1.02U) ref ,1.05U ref ]Range (both default across the voltage buffer layer below). Energy storage action: the energy storage no longer provides power compensation, the working mode is switched to the charging mode, the self-power is supplemented, and the bus voltage is stabilized within the range (namely, the return value is r=r when the system selects the action + The return value when selecting other actions is r=r - The action return values expressed below are positive, and other action return values not expressed are negative, and are not described in detail;
if the bus voltage continues to rise into the (1.05U_ref, +∞) range, renewable energy action: the working mode of the wind or photovoltaic unit switches from maximum power point tracking to constant-power operation, stabilizing the bus voltage within this range;
when the renewable energy power generation supply quantity is smaller than the demand:
when the renewable energy source power generation power is reduced, the bus voltage is reduced to (0.98U) ref ,1.02U ref ]During the range, the energy storage action: the energy storage unit enables power compensation and stabilizes the bus voltage in the range;
if renewable energy generation power continues to drop, bus voltage drops to (0.96U) ref ,0.98U ref ]And when the energy storage is in the range, the energy storage action is as follows: the energy storage starts to supply energy to maintain the voltage stability of the bus;
when the energy storage electric quantity is insufficient, the energy storage action is as follows: and the stored energy exits the operation. The bus voltage continues to drop to (0.95U) ref ,0.96U ref ]The diesel generating set acts: starting a diesel generating set;
if the load is large, and the power generated by the diesel generator set is difficult to meet, the bus voltage enters (-) 0.95U ref ]Range, load controller action: the load controller cuts off the loads one by one to maintain the voltage at 0.95U ref In the vicinity (note that the load controller adopts partition management for the cleavable load, each partition is composed of a plurality of loads, the load capacity is approximately the same, and the minimum unit of load shedding by the load controller is "zone"); when the renewable energy source is recovered, the bus voltage rises to (0.96U) ref ,0.98U ref ]When the load controller acts: the load controller attempts to ablate the load line by line (the process is reversed from ablation).
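The band-by-band priority logic of Step 5 can be condensed into a small dispatch table. The following is a minimal sketch, assuming per-unit voltages with U_ref = 1.0; the function name and action strings are illustrative, not taken from the patent.

```python
# Sketch of the Step-5 voltage-band dispatch priority described above.
# Band thresholds follow the text (per-unit, U_ref = 1.0); unit and
# action names are illustrative assumptions.

def dispatch(v_pu: float, storage_charged: bool = True) -> str:
    """Map a per-unit bus voltage to the highest-priority corrective action."""
    if v_pu > 1.05:
        return "renewables: switch MPPT -> constant-power mode"
    if v_pu > 1.02:
        return "storage: switch to charging mode (absorb surplus)"
    if v_pu > 0.98:
        return "storage: power compensation / real-time stabilization"
    if v_pu > 0.96:
        # storage supplies energy while charged, otherwise exits operation
        return ("storage: discharge to support bus" if storage_charged
                else "storage: exit operation")
    if v_pu > 0.95:
        return "diesel: start generator set"
    return "load controller: shed load zone by zone"
```

In this sketch the half-open intervals of the text, e.g. (1.02U_ref, 1.05U_ref], become cascaded strict comparisons, so each voltage falls into exactly one band.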
Step 6: based on the above strategies, a suitable algorithm flow is designed to implement the control strategy:
1) Initialization: read the system structure, load the state-action matrix, set the objective function and the reward function, and initialize the Q table;
2) Parameter setting: set the action policy and the decay factor β;
3) Detect the state s_t in the current environment;
4) Judge whether the current state s_t equals the previous state s_{t-1}; if they differ, proceed to step 5); otherwise, return to step 3);
5) Select the action a_t corresponding to state s_t according to the action policy;
6) Execute a_t, and obtain the reward value R and the next state s_{t+1};
7) Update the Q value according to formula (5), store the data in the knowledge base, and return to step 3).
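Steps 1)-7) above amount to a tabular Q-Learning loop. The sketch below is an illustrative rendering under stated assumptions: the environment interface (reset/observe/actions/step), the ε-greedy action policy, and the learning rate alpha are stand-ins the patent does not specify; only the decay factor β and the unchanged-state check of step 4) come directly from the text.

```python
import random
from collections import defaultdict

# Sketch of the Step-6 Q-Learning loop (steps 1-7 above). The environment
# object, epsilon-greedy policy, and learning rate are illustrative
# assumptions; the patent names only the action policy and decay factor beta.

def q_learning(env, episodes=200, alpha=0.1, beta=0.9, epsilon=0.1):
    q = defaultdict(float)                      # 1) initialize the Q table
    for _ in range(episodes):
        s_prev, s = None, env.reset()           # 3) detect the current state
        done = False
        while not done:
            if s == s_prev:                     # 4) state unchanged: keep observing
                s = env.observe()
                continue
            actions = env.actions(s)
            if random.random() < epsilon:       # 5) action policy (epsilon-greedy)
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(s, x)])
            s_next, r, done = env.step(a)       # 6) execute a_t, get R and s_{t+1}
            best_next = max((q[(s_next, x)] for x in env.actions(s_next)),
                            default=0.0)
            # 7) Q update: Q <- Q + alpha * (R + beta * max Q' - Q)
            q[(s, a)] += alpha * (r + beta * best_next - q[(s, a)])
            s_prev, s = s, s_next
    return q
```

Storing each update in a knowledge base, as step 7) requires, would simply mean persisting the `(s, a, r, s_next)` tuples alongside the Q table.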
The foregoing is merely a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical scheme and inventive concept of the present invention, shall be covered by the scope of the present invention.

Claims (2)

1. A micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory, characterized by comprising the following steps:
designing a transition-voltage-layer control strategy based on a hierarchical voltage control mode, adding a transition layer between adjacent voltage layers;
designing a dual-energy-storage role-division control strategy: when the energy storage units operate in voltage-stabilizing mode, the two energy storages work separately; when one energy storage needs assistance to continue absorbing or supplying power, the two energy storages switch to cooperative charging/discharging; in separate operation, one energy storage group is dedicated to power absorption, absorbing surplus electric energy from the bus for peak shaving, while the other group is dedicated to power compensation, supplying energy to the bus to restrain bus-voltage drops for valley filling;
constructing an action space and a state space based on Q-Learning: analyzing the necessary states and actions of the bus voltage detection unit, the energy storage unit, the photovoltaic generation unit, the wind power generation unit, the diesel generation unit and the load control unit, and obtaining their respective action spaces and state spaces;
designing a multi-agent-based reinforcement learning control framework, comprising the basic update rule for state-action pairs and the selection of a corresponding cost function;
designing a basic action selection mechanism and a return-value strategy, comprising the selection strategy adopted by the system in the initial state and the return values in each state;
designing a reinforcement learning algorithm flow: designing a suitable algorithm flow based on the above strategies to implement the control strategy.
2. The reinforcement learning and multi-agent theory-based micro-grid hybrid coordination control method of claim 1, further characterized in that the multi-agent-based reinforcement learning control framework is designed as follows: the state information currently acquired by the system is compared with the state information acquired at the previous moment; if the states are the same, no action instruction is generated, and state information continues to be acquired at the next moment;
after all states and actions are determined, the system generates a multidimensional Q matrix, and the Q values of the system states in the reinforcement learning model approach the optimal action-value function through learning iterations;
the basic update rule for a state-action pair is as follows:
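The update formula itself (formula (5)) is not reproduced in this text. A standard tabular Q-Learning update consistent with the symbols defined below would be the following, where the learning rate α is an assumption; the text names only the decay factor β and the reward R:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R + \beta \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```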
wherein s is t A is the current state of the intelligent agent t For actions taken in accordance with the current state, Q (s t ,a t ) Representing that the agent is in state s t Next action instruction a selected by the set learning strategy t The method comprises the steps of carrying out a first treatment on the surface of the Beta is a decay factor whose size determines whether the selected strategy is prone to current rewards or future rewards, R is a state and behavior based rewards;
the reward function determines, from the action executed, the agent state after the instruction, and the previous state, the degree of reward or punishment to apply to the action; the system maximizes the sum of expected values of the reward function by trying all combinations of actions allowed in the current state; the reward function is as follows:
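The reward-function formula is likewise missing from this text. A standard discounted-return form consistent with the surrounding description (reward of the system over the next j steps at time t, under strategy π, with decay factor β) would be, as an assumption:

```latex
R_t^{\pi} = \mathbb{E}_{\pi}\!\left[ \sum_{k=1}^{j} \beta^{\,k-1}\, r_{t+k} \right]
```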
wherein the return term denotes the reward of the system over the next j steps at time t, and π is the strategy selected by the system.
CN202010089205.7A 2020-02-12 2020-02-12 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory Active CN111200285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010089205.7A CN111200285B (en) 2020-02-12 2020-02-12 Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory

Publications (2)

Publication Number Publication Date
CN111200285A CN111200285A (en) 2020-05-26
CN111200285B true CN111200285B (en) 2023-12-19

Family

ID=70747290

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112117760A (en) * 2020-08-13 2020-12-22 国网浙江省电力有限公司台州供电公司 Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN112180730B (en) * 2020-10-10 2022-03-01 中国科学技术大学 Hierarchical optimal consistency control method and device for multi-agent system
CN113097994A (en) * 2021-03-15 2021-07-09 国网浙江省电力有限公司 Power grid operation mode adjusting method and device based on multiple reinforcement learning agents
CN113312839B (en) * 2021-05-25 2022-05-06 武汉大学 Power grid emergency auxiliary load shedding decision method and device based on reinforcement learning
CN115333143B (en) * 2022-07-08 2024-05-07 国网黑龙江省电力有限公司大庆供电公司 Deep learning multi-agent micro-grid cooperative control method based on double neural networks

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103001225A (en) * 2012-11-14 2013-03-27 合肥工业大学 MAS-based (multi-agent system) multi-microgrid energy management system simulation method
WO2013040837A1 (en) * 2011-09-25 2013-03-28 国网电力科学研究院 Computer monitoring method for microgrid system
WO2013104120A1 (en) * 2012-01-11 2013-07-18 中国人民解放军理工大学 Frequency-power joint distribution method based on multi-agent reinforcement learning in dynamic spectrum environment
CN103679292A (en) * 2013-12-17 2014-03-26 中国科学院自动化研究所 Electricity collaborative optimization method for double batteries of intelligent micro power grid
CN104505867A (en) * 2015-01-04 2015-04-08 南京国臣信息自动化技术有限公司 Alternating current and direct current hybrid micro-grid system and control strategy thereof
CN104967112A (en) * 2015-06-26 2015-10-07 上海电力学院 Direct current micro-grid coordination control method of light storage electric car charging station
CN105226632A (en) * 2015-10-30 2016-01-06 上海电力学院 A kind of multi-mode of DC micro power grid system switches control method for coordinating
CN105305480A (en) * 2015-07-13 2016-02-03 陕西省地方电力(集团)有限公司 Hybrid energy-storage DC micro grid hierarchical control method
CN107681650A (en) * 2017-10-10 2018-02-09 安徽理工大学 Direct-current grid energy management and control method for coordinating
CN108574411A (en) * 2018-05-22 2018-09-25 安徽工业大学 Two-way DC/DC power inverters dual-port stable control method and its control circuit
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN110276698A (en) * 2019-06-17 2019-09-24 国网江苏省电力有限公司淮安供电分公司 Distribution type renewable energy trade decision method based on the study of multiple agent bilayer cooperative reinforcing
CN110445122A (en) * 2019-09-06 2019-11-12 安徽工业大学 A kind of direct-current grid distributed freedom control method for coordinating that can significantly improve busbar voltage deviation
CN110649590A (en) * 2019-10-21 2020-01-03 上海电力大学 Networking type direct-current micro-grid energy cooperative control method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Chunxia Dou. Event-triggered hybrid control strategy based on hybrid automata and decision tree for microgrid. The Institution of Engineering and Technology, 2019, pp. 3066-3077. *
Zhang Liguo. Multi-modal coordinated switching control strategy for microgrids based on reinforcement learning. CNKI Outstanding Master's Thesis Database, 2022, full text. *
Zhang Jihong. Multi-mode droop control strategy for hybrid-energy-storage microgrids. Electrical & Energy Management Technology, 2018, pp. 78-83. *
Guo Li. Hierarchical coordinated control of DC microgrids considering grid time-of-use electricity price. Power System Technology, 2016, pp. 1992-2000. *

Similar Documents

Publication Publication Date Title
CN111200285B (en) Micro-grid hybrid coordination control method based on reinforcement learning and multi-agent theory
US9985438B2 (en) Optimization method for independent micro-grid system
CN109462253B (en) Off-grid type alternating current and direct current hybrid micro-grid system and control method thereof
CN102710013B (en) Park energy-network energy optimizing management system based on microgrids and implementing method thereof
CN110601248B (en) Multi-mode coordination control method of annular alternating current-direct current hybrid micro-grid system
WO2018103232A1 (en) Control method for new energy micro-grid electric vehicle charging station
CN111509743B (en) Control method for improving stability of power grid by using energy storage device
CN111244988B (en) Electric automobile considering distributed power supply and energy storage optimization scheduling method
CN101777769A (en) Multi-agent optimized coordination control method of electric network
CN113765130A (en) Operation control method of micro-grid
CN110416991B (en) Modularized multi-terminal flexible direct-current micro-grid networking and layered control method thereof
CN108493986B (en) Distributed generation coordination optimization scheduling method based on upper and lower double-layer optimization theory
CN112510756A (en) Micro-grid optical storage and charging coordinated operation method and system based on power level
de Bosio et al. Analysis and improvement of the energy management of an isolated microgrid in Lencois island based on a linear optimization approach
CN113809733A (en) Direct-current bus voltage and super capacitor charge management control method of light storage system
CN109617052B (en) Intelligent layered control method for large-scale electric heat storage units
CN110718933A (en) Multilevel coordinated wind storage isolated network system power balance control strategy
Huang et al. Optimal design of an island microgrid with considering scheduling optimization
CN115224704B (en) Time-sharing multiplexing peak regulation and frequency modulation power station constructed based on hybrid energy storage and control method
CN115793452A (en) Optimized control method of heat and hydrogen co-production system considering starting and stopping characteristics of multiple electrolytic tanks
CN212412772U (en) Energy storage type microgrid
CN114400704A (en) Island micro-grid multi-mode switching strategy based on double Q learning consideration economic regulation
CN113629758A (en) Multi-energy grid-connected operation control method and system
CN106230014A (en) A kind of emergent energy management strategies being applicable to light storage type building microgrid
Song et al. Unit commitment optimization model of wind storage combined system considering peak load regulation of energy storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant