CN114362218B - Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning - Google Patents

Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning

Info

Publication number
CN114362218B
CN114362218B (application CN202111654110.6A)
Authority
CN
China
Prior art keywords
energy storage
grid
micro
storage device
period energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111654110.6A
Other languages
Chinese (zh)
Other versions
CN114362218A (en)
Inventor
毛超利 (Mao Chaoli)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanhu Research Institute Of Electronic Technology Of China
Original Assignee
Nanhu Research Institute Of Electronic Technology Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanhu Research Institute Of Electronic Technology Of China filed Critical Nanhu Research Institute Of Electronic Technology Of China
Priority to CN202111654110.6A
Publication of CN114362218A
Application granted
Publication of CN114362218B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28: Arrangements for balancing of the load in a network by storage of energy
    • H02J3/008: Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10: Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312: Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06Q10/06313: Resource planning in a project environment
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00: Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70: Smart grids as climate change mitigation technology in the energy generation sector
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Power Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a scheduling method and a scheduling device for multi-type energy storage in a micro-grid based on deep Q learning, the micro-grid comprising a power supply, a long-period energy storage device and a short-period energy storage device. The scheduling problem of multi-type energy storage in the micro-grid is described as a Markov decision process, and a deep Q learning neural network model is built and trained. The trained model takes the current state of the micro-grid as input and outputs a corresponding action, and scheduling is performed according to the output action. According to the invention, the optimal energy storage scheduling strategy is obtained through continuous interaction between the agent and the micro-grid scheduling environment, which avoids the influence of model inaccuracy. The method is applicable to micro-grids containing various energy storage technologies and can support the optimal operation of the micro-grid. With the technical scheme of the invention, the energy storage charge can be adjusted and the time and amount of electricity consumption reasonably arranged, so as to maximize the operating income of the micro-grid.

Description

Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning
Technical Field
The application belongs to the technical field of energy automation, and particularly relates to a scheduling method for multi-type energy storage in a micro-grid based on deep Q learning.
Background
Energy is the source of power driving the continuous development of human society. However, the large-scale exploitation and use of traditional fossil energy sources confronts humanity with the prospect of energy exhaustion on the one hand and with a steadily worsening environment on the other. Developing renewable energy has therefore become an important measure for the international community to cope with both problems. Using renewable energy for distributed generation, consumed by nearby loads to form a micro-grid, is an important mode of such development. The micro-grid is not only more flexible, but also an important technical means for solving the grid-connection problem of renewable energy generation, and has recently gained wide attention at home and abroad. Because of the randomness and uncertainty of renewable energy generation, various energy storage technologies usually have to be introduced into a micro-grid to ensure its stable operation. For example, long-period energy storage technologies (including but not limited to water-electrolysis hydrogen production, hydrogen storage and hydrogen-oxygen fuel cell technologies) are used to smooth the long-period volatility of renewable energy generation, such as seasonal fluctuations of wind and solar power; short-period energy storage technologies (including but not limited to lithium battery storage) are used to smooth the short-period volatility of renewable energy generation, such as diurnal fluctuations of photovoltaic power.
As an important component of the micro-grid, the energy storage system effectively guarantees stable operation by storing and releasing surplus electric energy. With a reasonable scheduling strategy, the energy storage system is also an important means of improving the economic benefit of micro-grid operation.
Traditional stochastic optimization algorithms are the main technology for solving the micro-grid energy storage scheduling problem. They require a mathematical scheduling model and introduce a large number of assumptions in order to solve it, which limits the range of problems they can address. Affected by system dynamics and the intermittency of renewable generation, the micro-grid energy storage scheduling model and its parameters carry certain errors, which can distort the solution in unexpected ways. When the micro-grid contains multiple types of energy storage, the scheduling model becomes more complex and the problems caused by such errors more serious.
Disclosure of Invention
The purpose of the application is to provide a scheduling method and a scheduling device for multi-type energy storage in a micro-grid based on deep Q learning, so as to alleviate the problem that model errors lead to sub-optimal scheduling strategies.
In order to achieve the above purpose, the technical scheme of the application is as follows:
a scheduling method of multi-type energy storage in a micro-grid based on deep Q learning, wherein the micro-grid comprises a power supply, a long-period energy storage device and a short-period energy storage device, and the scheduling method of multi-type energy storage in the micro-grid based on deep Q learning comprises the following steps:
describing the scheduling problem of multi-type energy storage in the micro-grid as a Markov decision process, wherein the state $s_t$ at time $t$ in the Markov decision process is expressed as:

$$s_t = \left(E_t^{(G)},\ E_t^{(L)},\ C_t^{(SS)},\ C_t^{(LS)},\ T_t^{(G)},\ T_t^{(SS)},\ T_t^{(LS)},\ P_t^{(SS)},\ W_t^{(LS)},\ \eta^{(G)},\ \eta_c^{(SS)},\ \eta_c^{(LS)},\ \eta_d^{(SS)},\ \eta_d^{(LS)},\ r_t^{(SS)},\ r_t^{(LS)},\ \mu_t\right)$$

wherein $E_t^{(G)}$ denotes the power generation of the power supply, $E_t^{(L)}$ denotes the electric quantity consumed by the internal load of the micro-grid, $C_t^{(SS)}$ denotes the charge of the short-period energy storage device, $C_t^{(LS)}$ denotes the charge of the long-period energy storage device, $T_t^{(G)}$, $T_t^{(SS)}$ and $T_t^{(LS)}$ respectively denote the service life of the power supply, the short-period energy storage device and the long-period energy storage device, and $P_t^{(SS)}$ denotes the upper limit of the charge and discharge power of the short-period energy storage device, the lower limit being $-P_t^{(SS)}$;

$W_t^{(LS)}$ denotes the upper capacity limit of the long-period energy storage device; $\eta^{(G)}$, $\eta_c^{(SS)}$ and $\eta_c^{(LS)}$ denote the power generation efficiency of the power supply and the charging efficiencies of the short-period and long-period energy storage devices;

$\eta_d^{(SS)}$ and $\eta_d^{(LS)}$ denote the discharge efficiencies of the short-period and long-period energy storage devices;

$r_t^{(SS)}$ and $r_t^{(LS)}$ denote the capacity retention rates of the short-period and long-period energy storage devices under repeated charge and discharge;

$\mu_t$ denotes the electricity price and demand-side response information for trading with the main grid;

the action in the Markov decision process is expressed as follows:

$$a_t = \left(a_t^{(SS)},\ a_t^{(LS)}\right)$$

wherein $a_t^{(SS)}$ and $a_t^{(LS)}$ are respectively the electric quantities exchanged between the short-period energy storage device and the micro-grid AC bus and between the long-period energy storage device and the micro-grid AC bus;

the instant reward function at time $t$ in the Markov decision process is expressed as follows:

$$R_t = \begin{cases} \beta\,\delta_t\,\Delta t, & \delta_t \ge 0 \text{ and grid-connected} \\ 0, & \delta_t \ge 0 \text{ and islanded} \\ k\,\delta_t\,\Delta t, & \delta_t < 0 \text{ and grid-connected} \\ -\mathrm{penalty}, & \delta_t < 0 \text{ and islanded} \end{cases}$$

wherein $\delta_t$ is the balance power, $\beta$ is the electricity price when the micro-grid sells electricity to the main grid, $k$ is the electricity price when the micro-grid buys electricity from the main grid, $\Delta t$ denotes the time interval of signal acquisition, and $\mathrm{penalty}$ is the cost incurred when the micro-grid cuts off the general load;

and constructing and training a deep Q learning neural network model that outputs a corresponding action for an input micro-grid state; with the trained deep Q learning neural network model, inputting the current state of the micro-grid, outputting the corresponding action, and scheduling according to the output action.
Further, the constraints on actions in the Markov decision process are as follows:

discharge constraint (when $a_t^{(i)} < 0$, $i \in \{SS, LS\}$):

$$a_t^{(SS)} \ge -P_t^{(SS)}, \qquad -a_t^{(i)}\,\Delta t \le \eta_d^{(i)}\, C_t^{(i)}$$

charging constraint (when $a_t^{(i)} > 0$):

$$a_t^{(SS)} \le P_t^{(SS)}, \qquad C_t^{(LS)} + \eta_c^{(LS)}\, a_t^{(LS)}\,\Delta t \le W_t^{(LS)}$$

Further, the balance power $\delta_t$ is calculated as:

$$\delta_t = E_t^{(G)} - E_t^{(L)} - a_t^{(SS)} - a_t^{(LS)}$$
further, when the power supply includes a plurality of power supply devices, the state further includes a power supply power generation amount, a power supply lifetime, and a power supply power generation efficiency corresponding to each power supply.
Further, when the long-period energy storage device and the short-period energy storage device respectively include a plurality of energy storage devices, the state further includes a charge amount, a service life of the energy storage devices, an upper limit of charge and discharge power, an upper limit of capacity, charging efficiency, discharging efficiency and capacity retention rate of repeated charge and discharge corresponding to each energy storage device.
The application also provides a scheduling apparatus for multi-type energy storage in a micro-grid based on deep Q learning, the micro-grid comprising a power supply, a long-period energy storage device and a short-period energy storage device, and the scheduling apparatus comprising:
the configuration module is used for describing the scheduling problem of the multi-type energy storage in the micro-grid as a Markov decision process, and the state s at the moment t in the Markov decision process t The expression is as follows:
wherein,indicating the power generation capacity of the power supply>Representing the internal load consumption of the micro-grid, +.>Representing the charge of the short-period energy storage device, +.>Representing the charge of a long-period energy storage device, +.>Respectively representing the service life of a power supply, the service life of short-period energy storage equipment and the service life of long-period energy storage equipment, P t (SS) Representing the upper limit of the charge and discharge power of the short-period energy storage device, wherein the lower limit of the charge and discharge power of the short-period energy storage device is-P t (SS)
W t (LS) Representing the upper capacity limit of the long-period energy storage device,the power generation efficiency, the charging efficiency of the short-period energy storage device and the charging efficiency of the long-period energy storage device are represented;
representing the discharge efficiency of the short-period energy storage device and the discharge efficiency of the long-period energy storage device;
r t (SS) 、r t (LS) representing capacity retention rates of repeated charge and discharge of the short-period energy storage device and the long-period energy storage device;
μ t representing electricity price and demand side response information of the transaction with the main network;
the actions in the Markov decision process are represented as follows:
wherein,respectively short-period energy storage and long-period energy storage and micro-electricityThe electric quantity exchanged by the network alternating current bus;
in the Markov decision process, the instant reward function at the moment t is expressed as follows:
wherein delta t The balance power is balanced, beta is the electricity price when the micro-grid sells electricity to the main grid, k is the electricity price when the micro-grid buys electricity from the main grid, delta t represents the time interval of signal acquisition, and the penalty is the cost when the micro-grid cuts off the general load;
the deep learning scheduling module is used for constructing and training a deep Q learning neural network model, outputting corresponding actions according to the current state of the input micro-grid, adopting the trained deep Q learning neural network model, inputting the current state of the micro-grid, outputting the corresponding actions, and scheduling according to the output actions.
Further, the constraints of actions in the markov decision process are as follows:
discharge constraint:
charging constraint:
further, the balance power delta t The calculation formula is as follows:
further, when the power supply includes a plurality of power supply devices, the state further includes a power supply power generation amount, a power supply lifetime, and a power supply power generation efficiency corresponding to each power supply.
Further, when the long-period energy storage device and the short-period energy storage device respectively include a plurality of energy storage devices, the state further includes a charge amount, a service life of the energy storage devices, an upper limit of charge and discharge power, an upper limit of capacity, charging efficiency, discharging efficiency and capacity retention rate of repeated charge and discharge corresponding to each energy storage device.
The application provides a scheduling method and device for multi-type energy storage in a micro-grid based on deep Q learning, which can support the optimized operation of a micro-grid comprising distributed power supplies, multiple types of energy storage, a consumption side and a main grid. Using this strategy, a micro-grid user can adjust the energy storage charge according to real-time electricity price information, its own electricity demand and the distributed generation output, and reasonably arrange the time and amount of electricity consumption, thereby maximizing the operating income of the micro-grid. The strategy is based on the Deep-Q-Learning algorithm and is a model-free control method: no prior knowledge or model of the system is needed, and the optimal energy storage scheduling strategy is obtained through continuous interaction between the agent and the micro-grid scheduling environment, avoiding the influence of model inaccuracy. Another major feature of the strategy is that it is applicable to micro-grids containing multiple types of energy storage technologies (e.g., lithium battery storage, fuel cells, flywheel energy storage, supercapacitors, etc.).
Drawings
Fig. 1 is a schematic diagram of a networking structure of a micro-grid of the present application;
fig. 2 is a flowchart of a scheduling method of multi-type energy storage in the micro-grid based on deep Q learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The method solves the scheduling optimization problem of multi-type energy storage in the micro-grid based on Deep Q Learning (Deep-Q-Learning), and can avoid the influence caused by model inaccuracy.
As shown in fig. 1, the micro-grid system includes: distributed power sources, multi-type energy storage devices, loads, an energy management system, an AC bus network, and a communication and control network. The distributed power sources include photovoltaic generation, wind power generation, medium and small gas turbine generation, biomass generation, and the like. The multi-type energy storage devices comprise long-period and short-period energy storage devices; the long-period energy storage device includes but is not limited to a water-electrolysis hydrogen production, hydrogen storage and hydrogen-oxygen fuel cell system, and the short-period energy storage device includes but is not limited to a lithium storage battery. The load is usually divided into a general load and a sensitive load. In fig. 1, 1 denotes the point of common coupling, 2 an inverter, 3 a circuit breaker and 4 a local control agent; dashed connections denote the communication and control network, and solid connections denote the power network.
The distributed power supply is connected with the alternating current bus through an inverter, and the energy storage equipment is connected with the alternating current bus through a bidirectional inverter. The alternating current bus is connected with a main power grid through a public connection point. The energy management system receives various data sent by the local controller of the micro-grid through a communication control network, processes and stores the data, analyzes the running condition of the micro-grid, graphically displays the running condition, and monitors equipment; and simultaneously, based on analysis results, issuing a scheduling control command or issuing a power exchange plan curve or issuing an operation control strategy to each relevant on-site controller of the micro-grid, and coordinating and managing the operation of each device of the micro-grid. For example: and controlling the removal of the general load, the charging and discharging of the energy storage equipment and the like based on the renewable energy generating capacity, the load consumption electric quantity and the main power grid electricity price. The scheduling method for multi-type energy storage in the micro-grid based on deep Q learning is applied to an energy management system to obtain an optimal scheduling strategy.
The application describes the composite energy storage scheduling problem of the distributed micro-grid as a Markov decision process (MDP), so that it can be solved optimally with a reinforcement learning algorithm. An MDP has the Markov property, which means that the state of the system at the next moment depends only on the current state, not on earlier states. On this basis, the MDP additionally considers the effect of the action taken by the system in the current state on its next state, namely: the state of the system at the next moment depends on the current state and the action taken in that state. An MDP is a time-discrete stochastic control process.
Expressed in mathematical language, an MDP consists of a five-tuple:

$$M = (S, A, T, R, \gamma)$$

$S$ denotes the set of all states of the environment in which the agent is located, each state being the agent's perception of the current environment; $A$ denotes the set of all actions the agent can take; $T: S \times A \times S \to [0,1]$ denotes the state transition probability function; $R: S \times A \times S \to \mathbb{R}$ denotes the instant reward function; and $\gamma \in [0,1)$ denotes the discount factor. In an MDP the system is fully observable, meaning that the agent's observation of the environment equals the system state. At each time step, the probability of the system transitioning from state $s_t$ to state $s_{t+1}$ is given by the state transition function $T(s_t, a_t, s_{t+1})$, where $s_t, s_{t+1} \in S$ and $a_t \in A$; the instant reward obtained by taking action $a_t$ in state $s_t$ and transitioning to state $s_{t+1}$ is given by the instant reward function $R(s_t, a_t, s_{t+1})$. Which action $a \in A$ is selected in a given state $s \in S$ is determined by the chosen policy. In reinforcement learning algorithms, policies are commonly divided into static and dynamic policies: a static policy for selecting actions does not change with the learning process, whereas a dynamic policy does. By another criterion, policies can be divided into deterministic and stochastic policies. A deterministic policy gives the selected action directly from the input state, namely:
$$\pi(s): S \to A$$

which establishes a mapping from the state space to the action space.
A stochastic policy gives the probability of selecting each action from the input state, namely:

$$\pi(s,a): S \times A \to [0,1]$$

where $\pi(s,a)$ represents the probability that action $a$ is selected in state $s$.
The goal of the reinforcement learning algorithm is to find the optimal action-selection policy, namely: to determine a policy $\pi \in \Pi$ ($\Pi$ being the set of all policies) that maximizes the expected reward over the whole decision process:

$$\max_{\pi}\ \mathbb{E}\left[\sum_{k=0}^{H} \alpha_k\, r_{t+k}\ \middle|\ s_t = s\right]$$

where $\alpha_k$ denotes the weight coefficient associated with each time step of the learning process; for an MDP containing a discount factor, $\alpha_k = \gamma^k$. $s_t = s$ denotes that learning starts from state $s$. $H$ is the decision horizon, which may be finite or infinite.
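To make the objective concrete, the following minimal Python sketch (illustrative only; the reward sequence and the value gamma = 0.95 are assumptions, not values prescribed by the text) evaluates the discounted return of one sampled reward sequence with $\alpha_k = \gamma^k$:

```python
def discounted_return(rewards, gamma=0.95):
    """Return sum_k gamma**k * r_{t+k} for one sampled reward sequence.

    `rewards` lists r_t, r_{t+1}, ...; gamma is the discount factor.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: three consecutive rewards of 1.0
print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.95 + 0.9025 = 2.8525
```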
Solving the MDP problem directly on the basis of statistical theory faces many limitations and difficulties. Alternatively, a value-based approach can be used: by defining a value function, a prediction of the total future reward, the optimal policy is approached; the Q-learning algorithm is the simplest and best-known method of this kind. Value-based methods typically do not use the state-value function $V^\pi(s)$ directly but the Q-value function $Q^\pi(s,a): S \times A \to \mathbb{R}$, defined as follows:

$$Q^\pi(s,a) = \mathbb{E}\left[\sum_{k=0}^{H} \gamma^k\, r_{t+k}\ \middle|\ s_t = s,\ a_t = a\right]$$

Since the MDP has the Markov property, if a deterministic action policy is used, the equation above can be written recursively as:

$$Q^\pi(s,a) = \mathbb{E}\left[r_t + \gamma\, Q^\pi\!\left(s_{t+1}, \pi(s_{t+1})\right)\ \middle|\ s_t = s,\ a_t = a\right]$$

The Q-value function is characterized in that the optimal policy can be obtained from it directly:

$$\pi^*(s) = \arg\max_{a \in A} Q^*(s,a)$$

without relying on a model.
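As an illustration of this model-free property, the minimal tabular Q-learning sketch below (Python; the state/action space sizes and the values of alpha, gamma and epsilon are assumptions of this sketch) updates the Q function from observed transitions alone and reads the policy off with argmax, never touching the transition function $T$:

```python
import numpy as np

n_states, n_actions = 10, 4             # assumed sizes, for illustration only
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed learning/discount/exploration rates
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    # epsilon-greedy: mostly exploit argmax_a Q(s, a), occasionally explore
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # Model-free temporal-difference update: only the observed transition
    # (s, a, r, s_next) is needed; the transition function T never appears
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```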
In one embodiment, as shown in fig. 2, a scheduling method of multi-type energy storage in a micro-grid based on deep Q learning is provided, where the micro-grid includes a power source, a long-period energy storage device and a short-period energy storage device, and the scheduling method of multi-type energy storage in the micro-grid based on deep Q learning is characterized by comprising:
and S1, describing the scheduling problem of multi-type energy storage in the micro-grid as a Markov decision process, and determining states, actions and rewards in the Markov decision process.
In the multi-type energy storage scheduling problem of the distributed micro-grid, a state space must first be defined. The state of the distributed intelligent micro-grid at time $t$ is expressed as:

$$s_t = \left(E_t^{(G)},\ E_t^{(L)},\ C_t^{(SS)},\ C_t^{(LS)},\ T_t^{(G)},\ T_t^{(SS)},\ T_t^{(LS)},\ P_t^{(SS)},\ W_t^{(LS)},\ \eta^{(G)},\ \eta_c^{(SS)},\ \eta_c^{(LS)},\ \eta_d^{(SS)},\ \eta_d^{(LS)},\ r_t^{(SS)},\ r_t^{(LS)},\ \mu_t\right)$$

wherein,

$E_t^{(G)}$: power generation of the power supply, in kW;

$E_t^{(L)}$: electric quantity consumed by the internal load of the micro-grid, in kW;

$C_t^{(SS)}$, $C_t^{(LS)}$: charge of the short-period and long-period energy storage devices, in kW·h;

$T_t^{(G)}$, $T_t^{(SS)}$, $T_t^{(LS)}$: service life of the power supply, the short-period energy storage device and the long-period energy storage device, in seconds;

$P_t^{(SS)}$: upper limit of the charge and discharge power of the short-period energy storage device, in kW; the lower limit is $-P_t^{(SS)}$;

$W_t^{(LS)}$: upper capacity limit of the long-period energy storage device, in kW·h;

$\eta^{(G)}$, $\eta_c^{(SS)}$, $\eta_c^{(LS)}$: power generation efficiency of the power supply and charging efficiencies of the short-period and long-period energy storage devices;

$\eta_d^{(SS)}$, $\eta_d^{(LS)}$: discharge efficiencies of the short-period and long-period energy storage devices;

$r_t^{(SS)}$, $r_t^{(LS)}$: capacity retention rates of the short-period and long-period energy storage devices under repeated charge and discharge;

$\mu_t$: electricity price for trading with the main grid (including the purchase price) and demand-side response information (the penalty for cutting off the general load), in currency units.
It should be noted that the state above lists only one device of each type; under the framework of the present application, each type of device can be conveniently expanded to a plurality of devices.
For example, the power supply may be a photovoltaic power supply, a hydroelectric power supply, a wind power supply, or the like; when the power supply includes a plurality of power supply devices, the state also includes the power generation amount, lifetime and power generation efficiency corresponding to each power supply.
For another example, the long-period energy storage device may be a water-electrolysis hydrogen production, hydrogen storage and hydrogen-oxygen fuel cell system, etc., and the short-period energy storage device may be a lithium battery, a lead-acid storage battery, etc.
When the long-period energy storage device and the short-period energy storage device respectively comprise a plurality of energy storage devices, the state further comprises the charge quantity, the service life of the energy storage devices, the upper limit of charge and discharge power, the upper limit of capacity, the charging efficiency, the discharging efficiency and the capacity retention rate of repeated charge and discharge of each energy storage device.
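For illustration only, the state defined above can be assembled into the numeric vector fed to the Q network, as in the following Python sketch; the class and field names are assumptions introduced here, not terms from the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MicrogridState:
    """State s_t of the micro-grid at time t.

    Field names are illustrative; units follow the text (kW for power,
    kW*h for charge, seconds for service life).
    """
    generation: float           # E_t^(G): power generation of the power supply
    load: float                 # E_t^(L): internal load consumption
    charge_ss: float            # C_t^(SS): short-period storage charge
    charge_ls: float            # C_t^(LS): long-period storage charge
    life: List[float]           # T_t^(G), T_t^(SS), T_t^(LS)
    p_max_ss: float             # P_t^(SS): charge/discharge power limit
    w_max_ls: float             # W_t^(LS): capacity upper limit
    eff_charge: List[float]     # eta^(G), eta_c^(SS), eta_c^(LS)
    eff_discharge: List[float]  # eta_d^(SS), eta_d^(LS)
    retention: List[float]      # r_t^(SS), r_t^(LS)
    price_info: List[float]     # mu_t: prices and demand-side response info

    def to_vector(self) -> List[float]:
        # Flatten the fields, in a fixed order, into the Q-network input
        return ([self.generation, self.load, self.charge_ss, self.charge_ls]
                + self.life + [self.p_max_ss, self.w_max_ls]
                + self.eff_charge + self.eff_discharge
                + self.retention + self.price_info)
```

Because to_vector() simply concatenates the fields, expanding a device type to several devices only lengthens the corresponding lists without changing the interface.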
Next, the action space of the micro-grid agent is defined as follows:

$$a_t = \left(a_t^{(SS)},\ a_t^{(LS)}\right)$$

wherein $a_t^{(SS)}$ and $a_t^{(LS)}$ are the electric quantities exchanged between the short-period and long-period energy storage devices and the micro-grid AC bus, in kW; the value is positive when charging, negative when discharging, and zero when idle. The action space must satisfy the following constraints:
discharge constraint (when $a_t^{(i)} < 0$, $i \in \{SS, LS\}$):

$$a_t^{(SS)} \ge -P_t^{(SS)}, \qquad -a_t^{(i)}\,\Delta t \le \eta_d^{(i)}\, C_t^{(i)}$$

charging constraint (when $a_t^{(i)} > 0$):

$$a_t^{(SS)} \le P_t^{(SS)}, \qquad C_t^{(LS)} + \eta_c^{(LS)}\, a_t^{(LS)}\,\Delta t \le W_t^{(LS)}$$
Finally, the instant reward function at time $t$ is defined as:

$$R_t = \begin{cases} \beta\,\delta_t\,\Delta t, & \delta_t \ge 0 \text{ and grid-connected} \\ 0, & \delta_t \ge 0 \text{ and islanded} \\ k\,\delta_t\,\Delta t, & \delta_t < 0 \text{ and grid-connected} \\ -\mathrm{penalty}, & \delta_t < 0 \text{ and islanded} \end{cases}$$

wherein $\delta_t$ is the balance power, expressed as:

$$\delta_t = E_t^{(G)} - E_t^{(L)} - a_t^{(SS)} - a_t^{(LS)}$$

$\beta$ is the electricity price when the micro-grid sells electricity to the main grid, $k$ is the electricity price when the micro-grid buys electricity from the main grid, $\mathrm{penalty}$ is the cost paid when the micro-grid cuts off the general load, and $\Delta t$ denotes the time interval of signal acquisition. When $\delta_t \ge 0$, the renewable generation of the micro-grid (typified by photovoltaics) has a surplus. In this case, under grid-connected operation the micro-grid sells the surplus to the main grid, and the resulting income is taken as the instant reward; under island operation the micro-grid can only curtail renewable generation, and the instant reward is 0. When $\delta_t < 0$, the renewable generation cannot meet the load demand of the micro-grid. In this case, under grid-connected operation the micro-grid buys electricity from the main grid, and the cost paid is taken as a negative instant reward; under island operation the micro-grid cuts off the general load and is penalized for it.
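The reward logic above can be made concrete in a short sketch; the balance-power expression and all argument names are reconstructions for illustration, not verbatim elements of the patent:

```python
def instant_reward(generation, load, a_ss, a_ls,
                   beta, k, penalty, dt, grid_connected=True):
    """Instant reward R_t as described above.

    a_ss / a_ls: power exchanged with the AC bus, in kW (positive = charging);
    beta / k: sell / buy price; dt: signal-acquisition interval.
    """
    delta = generation - load - a_ss - a_ls  # balance power delta_t (reconstructed)
    if delta >= 0:
        # Surplus: sell to the main grid if connected, otherwise curtail (reward 0)
        return beta * delta * dt if grid_connected else 0.0
    # Shortfall: buy from the main grid (negative reward) or shed general load
    return k * delta * dt if grid_connected else -penalty
```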
And S2, constructing and training a deep Q learning neural network model that outputs a corresponding action for the input current state of the micro-grid; the trained model is then used: the current state of the micro-grid is input, the corresponding action is output, and scheduling is performed according to the output action.
It is easy to understand that if a table is used to represent the mapping from agent states and actions to Q values, the size of the problems that can be solved is very limited: as problems approach engineering practice, the dimensions of the state space and the action space tend to become very large. For example, when the processing object is an image, a sound or a time series, the input data are high-dimensional, which conventional reinforcement learning struggles to handle.
The application adopts a deep Q learning neural network model to acquire the scheduling strategy: the model is constructed and trained to output the corresponding action for the input current state of the micro-grid. The deep Q learning neural network model combines reinforcement learning with deep learning, which has the capability of processing high-dimensional input. Deep convolutional neural networks are among the most successful applications of deep learning. Their basic structure comprises two kinds of layers: feature extraction layers and feature mapping layers. High-dimensional input is first filtered with convolution kernels to extract feature maps, an activation function adds a nonlinear factor, and pooling further reduces the dimensionality; after the high-dimensional data flow through a deep convolutional stack consisting of several convolution, activation and pooling stages, the dimensionality is usually reduced to an acceptable range. The output of the low-dimensional deep convolutional stack is then fed into a fully connected layer to complete the final mapping.
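For the low-dimensional state vector used here, the Q network can be sketched as a small fully connected network instead of the convolutional stack described above for image-like inputs. The PyTorch sketch below is illustrative: the layer sizes and the discretization of the continuous action pair $(a_t^{(SS)}, a_t^{(LS)})$ into a finite set of actions are assumptions of this sketch, not prescriptions of the patent:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Map a micro-grid state vector to one Q value per discretized action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q value per (a_ss, a_ls) pair
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```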
Training and application of neural network models based on deep Q learning are by now relatively mature techniques. The application adopts a deep Q learning neural network model that outputs the corresponding action for the input current state of the micro-grid.
Training of the network model can be performed offline, after which the model is deployed in the energy management system of the distributed intelligent micro-grid; it can also be performed online, that is, the training of the network model is coupled with the other modules of the energy management system (such as the real-time data monitoring module and the prediction module), and the deep convolutional neural network that maps the Q values is adjusted in real time.
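One offline training step could then look like the sketch below. The experience replay buffer and target network are standard Deep-Q-Learning ingredients assumed here to make the sketch workable; the text above does not prescribe them. QNetwork refers to the illustrative network sketched earlier:

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

replay_buffer: deque = deque(maxlen=100_000)  # stores (s, a, r, s_next) tuples
gamma = 0.95                                  # assumed discount factor

def dqn_train_step(q_net, target_net, optimizer, batch_size=64):
    """One gradient step on a random minibatch of stored transitions."""
    if len(replay_buffer) < batch_size:
        return None
    batch = random.sample(replay_buffer, batch_size)
    s  = torch.stack([torch.as_tensor(t[0], dtype=torch.float32) for t in batch])
    a  = torch.tensor([t[1] for t in batch], dtype=torch.int64)
    r  = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s2 = torch.stack([torch.as_tensor(t[3], dtype=torch.float32) for t in batch])

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a)
    with torch.no_grad():                                       # frozen target
        td_target = r + gamma * target_net(s2).max(dim=1).values

    loss = F.mse_loss(q_sa, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```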
After training is completed, the current state of the micro-grid is input into the trained deep Q learning neural network model, the action with the optimal Q value is output, and this action is used to schedule the charging and discharging of the energy storage devices.
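A dispatch step with the trained network might then look like the following sketch, where action_table is an assumed discretization of the exchanged-power pairs $(a_t^{(SS)}, a_t^{(LS)})$:

```python
import torch

@torch.no_grad()
def schedule(q_net, state_vector, action_table):
    """Pick the Q-maximizing action for the current micro-grid state."""
    s = torch.as_tensor(state_vector, dtype=torch.float32).unsqueeze(0)
    idx = int(q_net(s).argmax(dim=1).item())
    return action_table[idx]  # (a_ss, a_ls) power set-points, in kW
```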
In another embodiment, the application further provides a scheduling apparatus for multi-type energy storage in a micro-grid based on deep Q learning, where the micro-grid includes a power source, a long-period energy storage device and a short-period energy storage device, and the scheduling apparatus for multi-type energy storage in the micro-grid based on deep Q learning includes:
the configuration module is used for describing the scheduling problem of the multi-type energy storage in the micro-grid as a Markov decision process, and the state s at the moment t in the Markov decision process t The expression is as follows:
wherein,indicating the power generation capacity of the power supply>Representing the internal load consumption of the micro-grid, +.>Representing the charge of the short-period energy storage device, +.>Representing the charge of a long-period energy storage device, +.>Respectively representing the service life of a power supply, the service life of short-period energy storage equipment and the service life of long-period energy storage equipment, P t (SS) Representing the upper limit of the charge and discharge power of the short-period energy storage device, wherein the lower limit of the charge and discharge power of the short-period energy storage device is-P t (SS)
W t (LS) Representing the upper capacity limit of the long-period energy storage device,the power generation efficiency, the charging efficiency of the short-period energy storage device and the charging efficiency of the long-period energy storage device are represented;
representing the discharge efficiency of the short-period energy storage device and the discharge efficiency of the long-period energy storage device;
r t (SS) 、r t (LS) representing capacity retention rates of repeated charge and discharge of the short-period energy storage device and the long-period energy storage device;
μ t representing electricity price and demand side response information of the transaction with the main network;
the actions in the Markov decision process are represented as follows:
wherein,the electric quantity exchanged between the short-period energy storage and the long-period energy storage and the micro-grid alternating current bus is respectively;
in the Markov decision process, the instant reward function at the moment t is expressed as follows:
wherein delta t The balance power is balanced, beta is the electricity price when the micro-grid sells electricity to the main grid, k is the electricity price when the micro-grid buys electricity from the main grid, delta t represents the time interval of signal acquisition, and the penalty is the cost when the micro-grid cuts off the general load;
the deep learning scheduling module is used for constructing and training a deep Q learning neural network model, outputting corresponding actions according to the current state of the input micro-grid, adopting the trained deep Q learning neural network model, inputting the current state of the micro-grid, outputting the corresponding actions, and scheduling according to the output actions.
For specific limitations regarding the scheduling apparatus of multi-type energy storage in the micro-grid based on deep Q learning, reference may be made to the above limitation regarding the scheduling method of multi-type energy storage in the micro-grid based on deep Q learning, and the description thereof will not be repeated here. The modules in the scheduling device for multi-type energy storage in the micro-grid based on deep Q learning can be all or partially realized by software, hardware and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The memory and the processor are electrically connected to each other, directly or indirectly, for data transmission or interaction. For example, the components may be electrically connected through one or more communication buses or signal lines. The memory stores a computer program that can be executed on the processor, and the processor implements the method in the embodiments of the present invention by executing the computer program stored in the memory.
The memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory is used to store a program, and the processor executes the program after receiving an execution instruction.
The processor may be an integrated circuit chip having data processing capabilities. The processor may be a general-purpose processor including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like. The methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The above examples merely represent a few embodiments of the present application; their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A scheduling method for multi-type energy storage in a micro-grid based on deep Q learning, characterized by comprising the following steps:
describing the scheduling problem of multi-type energy storage in the micro-grid as a Markov decision process, wherein the state $s_t$ at time $t$ in the Markov decision process is expressed as:

$$s_t = \left(E_t^{(G)},\ E_t^{(L)},\ C_t^{(SS)},\ C_t^{(LS)},\ T_t^{(G)},\ T_t^{(SS)},\ T_t^{(LS)},\ P_t^{(SS)},\ W_t^{(LS)},\ \eta^{(G)},\ \eta_c^{(SS)},\ \eta_c^{(LS)},\ \eta_d^{(SS)},\ \eta_d^{(LS)},\ r_t^{(SS)},\ r_t^{(LS)},\ \mu_t\right)$$

wherein $E_t^{(G)}$ denotes the power generation of the power supply, $E_t^{(L)}$ denotes the electric quantity consumed by the internal load of the micro-grid, $C_t^{(SS)}$ denotes the charge of the short-period energy storage device, $C_t^{(LS)}$ denotes the charge of the long-period energy storage device, $T_t^{(G)}$, $T_t^{(SS)}$ and $T_t^{(LS)}$ respectively denote the service life of the power supply, the short-period energy storage device and the long-period energy storage device, and $P_t^{(SS)}$ denotes the upper limit of the charge and discharge power of the short-period energy storage device, the lower limit being $-P_t^{(SS)}$;

$W_t^{(LS)}$ denotes the upper capacity limit of the long-period energy storage device; $\eta^{(G)}$, $\eta_c^{(SS)}$ and $\eta_c^{(LS)}$ denote the power generation efficiency of the power supply and the charging efficiencies of the short-period and long-period energy storage devices;

$\eta_d^{(SS)}$ and $\eta_d^{(LS)}$ denote the discharge efficiencies of the short-period and long-period energy storage devices;

$r_t^{(SS)}$ and $r_t^{(LS)}$ denote the capacity retention rates of the short-period and long-period energy storage devices under repeated charge and discharge;

$\mu_t$ denotes the electricity price and demand-side response information for trading with the main grid;

the action in the Markov decision process is expressed as follows:

$$a_t = \left(a_t^{(SS)},\ a_t^{(LS)}\right)$$

wherein $a_t^{(SS)}$ and $a_t^{(LS)}$ are respectively the electric quantities exchanged between the short-period energy storage device and the micro-grid AC bus and between the long-period energy storage device and the micro-grid AC bus;

the instant reward function at time $t$ in the Markov decision process is expressed as follows:

$$R_t = \begin{cases} \beta\,\delta_t\,\Delta t, & \delta_t \ge 0 \text{ and grid-connected} \\ 0, & \delta_t \ge 0 \text{ and islanded} \\ k\,\delta_t\,\Delta t, & \delta_t < 0 \text{ and grid-connected} \\ -\mathrm{penalty}, & \delta_t < 0 \text{ and islanded} \end{cases}$$

wherein $\delta_t$ is the balance power, $\beta$ is the electricity price when the micro-grid sells electricity to the main grid, $k$ is the electricity price when the micro-grid buys electricity from the main grid, $\Delta t$ denotes the time interval of signal acquisition, and $\mathrm{penalty}$ is the cost incurred when the micro-grid cuts off the general load;

and constructing and training a deep Q learning neural network model that outputs a corresponding action for an input micro-grid state; with the trained deep Q learning neural network model, inputting the current state of the micro-grid, outputting the corresponding action, and scheduling according to the output action.
2. The scheduling method for multi-type energy storage in a micro-grid based on deep Q learning according to claim 1, wherein the constraints on actions in the Markov decision process are as follows:

discharge constraint (when $a_t^{(i)} < 0$, $i \in \{SS, LS\}$):

$$a_t^{(SS)} \ge -P_t^{(SS)}, \qquad -a_t^{(i)}\,\Delta t \le \eta_d^{(i)}\, C_t^{(i)}$$

charging constraint (when $a_t^{(i)} > 0$):

$$a_t^{(SS)} \le P_t^{(SS)}, \qquad C_t^{(LS)} + \eta_c^{(LS)}\, a_t^{(LS)}\,\Delta t \le W_t^{(LS)}$$
3. The scheduling method for multi-type energy storage in a micro-grid based on deep Q learning according to claim 1, wherein the balance power $\delta_t$ is calculated as:

$$\delta_t = E_t^{(G)} - E_t^{(L)} - a_t^{(SS)} - a_t^{(LS)}$$
4. the deep Q learning-based scheduling method for multi-type energy storage in a micro grid according to claim 1, wherein when the power supply includes a plurality of power supply devices, the state further includes a power supply power generation amount, a power supply lifetime, and a power supply power generation efficiency corresponding to each power supply.
5. The method for scheduling multi-type energy storage in a micro-grid based on deep Q learning according to claim 1, wherein when the long-period energy storage device and the short-period energy storage device respectively comprise a plurality of energy storage devices, the states further comprise the charge amount, the service life of the energy storage devices, the upper limit of charge and discharge power, the upper limit of capacity, the charging efficiency, the discharging efficiency and the capacity retention rate of repeated charge and discharge of each energy storage device.
6. A scheduling apparatus for multi-type energy storage in a micro-grid based on deep Q learning, the micro-grid comprising a power supply, a long-period energy storage device and a short-period energy storage device, characterized in that the scheduling apparatus for multi-type energy storage in the micro-grid based on deep Q learning comprises:
the configuration module is used for describing the scheduling problem of the multi-type energy storage in the micro-grid as a Markov decision process, and the state s at the moment t in the Markov decision process t The expression is as follows:
wherein,indicating the power generation capacity of the power supply>Representing the internal load consumption of the micro-grid, +.>Representing the charge of the short-period energy storage device, +.>Representing the charge of a long-period energy storage device, +.>Respectively representing the service life of a power supply, the service life of short-period energy storage equipment and the service life of long-period energy storage equipment, P t (SS) Representing the upper limit of the charge and discharge power of the short-period energy storage device, wherein the lower limit of the charge and discharge power of the short-period energy storage device is-P t (SS)
W t (LS) Representing the upper capacity limit of the long-period energy storage device,the power generation efficiency, the charging efficiency of the short-period energy storage device and the charging efficiency of the long-period energy storage device are represented;
representing the discharge efficiency of the short-period energy storage device and the discharge efficiency of the long-period energy storage device;
r t (SS) 、r t (LS) representing capacity retention rates of repeated charge and discharge of the short-period energy storage device and the long-period energy storage device;
μ t representing electricity price and demand side response information of the transaction with the main network;
the actions in the Markov decision process are represented as follows:
wherein,the electric quantity exchanged between the short-period energy storage and the long-period energy storage and the micro-grid alternating current bus is respectively;
in the Markov decision process, the instant reward function at the moment t is expressed as follows:
wherein delta t The balance power is balanced, beta is the electricity price when the micro-grid sells electricity to the main grid, k is the electricity price when the micro-grid buys electricity from the main grid, delta t represents the time interval of signal acquisition, and the penalty is the cost when the micro-grid cuts off the general load;
the deep learning scheduling module is used for constructing and training a deep Q learning neural network model, outputting corresponding actions according to the current state of the input micro-grid, adopting the trained deep Q learning neural network model, inputting the current state of the micro-grid, outputting the corresponding actions, and scheduling according to the output actions.
7. The scheduling apparatus for multi-type energy storage in a micro-grid based on deep Q learning according to claim 6, wherein the constraints on actions in the Markov decision process are as follows:

discharge constraint (when $a_t^{(i)} < 0$, $i \in \{SS, LS\}$):

$$a_t^{(SS)} \ge -P_t^{(SS)}, \qquad -a_t^{(i)}\,\Delta t \le \eta_d^{(i)}\, C_t^{(i)}$$

charging constraint (when $a_t^{(i)} > 0$):

$$a_t^{(SS)} \le P_t^{(SS)}, \qquad C_t^{(LS)} + \eta_c^{(LS)}\, a_t^{(LS)}\,\Delta t \le W_t^{(LS)}$$
8. The scheduling apparatus for multi-type energy storage in a micro-grid based on deep Q learning according to claim 6, wherein the balance power $\delta_t$ is calculated as:

$$\delta_t = E_t^{(G)} - E_t^{(L)} - a_t^{(SS)} - a_t^{(LS)}$$
9. the deep Q learning based scheduling apparatus for multi-type energy storage in a micro-grid of claim 6, wherein when the power source comprises a plurality of power source devices, the state further comprises a power source power generation amount, a power source lifetime, and a power source power generation efficiency corresponding to each power source.
10. The deep Q learning-based scheduling apparatus for multi-type energy storage in a micro-grid according to claim 6, wherein when the long-period energy storage device and the short-period energy storage device respectively include a plurality of energy storage devices, the states further include a charge amount, a lifetime of the energy storage devices, an upper limit of charge and discharge power, an upper limit of capacity, a charge efficiency, a discharge efficiency, and a capacity retention rate of repeated charge and discharge corresponding to each energy storage device.
CN202111654110.6A 2021-12-30 2021-12-30 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning Active CN114362218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111654110.6A CN114362218B (en) 2021-12-30 2021-12-30 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111654110.6A CN114362218B (en) 2021-12-30 2021-12-30 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning

Publications (2)

Publication Number Publication Date
CN114362218A (en) 2022-04-15
CN114362218B 2024-03-19

Family

ID=81104201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111654110.6A Active CN114362218B (en) 2021-12-30 2021-12-30 Scheduling method and device for multi-type energy storage in micro-grid based on deep Q learning

Country Status (1)

Country Link
CN (1) CN114362218B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116345577B (en) * 2023-05-12 2023-08-08 国网天津市电力公司营销服务中心 Wind-light-storage micro-grid energy regulation and optimization method, device and storage medium
CN117335439B (en) * 2023-11-30 2024-02-27 国网浙江省电力有限公司 Multi-load resource joint scheduling method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN112529727A (en) * 2020-11-06 2021-03-19 台州宏远电力设计院有限公司 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN112529727A (en) * 2020-11-06 2021-03-19 台州宏远电力设计院有限公司 Micro-grid energy storage scheduling method, device and equipment based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于深度强化学习的微电网复合储能协调控制方法" ("Coordinated control method for hybrid energy storage in a micro-grid based on deep reinforcement learning"), Zhang Zidong et al., 《电网技术》 (Power System Technology), Vol. 43, No. 6, entire document *

Also Published As

Publication number Publication date
CN114362218A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant