CN117277327A - Grid-connected micro-grid optimal energy management method based on intelligent agent - Google Patents

Grid-connected micro-grid optimal energy management method based on intelligent agent

Info

Publication number
CN117277327A
Authority
CN
China
Prior art keywords
grid
micro
power
constraint
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311206909.8A
Other languages
Chinese (zh)
Inventor
杨志淳
姚志荣
沈煜
杨帆
李进扬
崔世常
闵怀东
雷杨
胡伟
吴畏
姚金林
操燕春
方石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Original Assignee
Huazhong University of Science and Technology
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology, Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd filed Critical Huazhong University of Science and Technology
Priority to CN202311206909.8A
Publication of CN117277327A
Legal status: Pending


Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/007Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
    • H02J3/0075Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/008Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28Arrangements for balancing of the load in a network by storage of energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48Controlling the sharing of the in-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/50Controlling the sharing of the out-of-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Power Engineering (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

An agent-based optimal energy management method for a grid-connected micro-grid, comprising: (1) modeling the sequential energy management process of a grid-connected micro-grid, which comprises different types of distributed generation, energy storage devices and user loads and can purchase electricity from and sell electricity to the distribution network, as a Markov decision process; (2) designing the reward function of the Markov decision process based on the exchange-power, power-flow and voltage constraints between the micro-grid and the distribution network; (3) solving the optimal stationary policy of the established Markov decision process with a deep Q-learning method, this policy being the optimal energy management strategy of the micro-grid. Compared with the prior art, the invention has the following beneficial effects: uncertainty is learned from historical data and an optimal policy is learned through continuous interaction with the environment; the output actions achieve an operating cost very close to that of the optimal solution obtained by mixed-integer quadratic programming under perfectly accurate forecasts of the uncertain factors, with shorter computation time, so the operating cost of the micro-grid can be effectively reduced.

Description

Grid-connected micro-grid optimal energy management method based on intelligent agent
Technical Field
The invention relates to the field of electrical engineering, in particular to an optimal energy management method for a micro-grid.
Background
Developing renewable energy is a necessary path for China's energy development. By the end of 2022, China's installed renewable energy capacity had exceeded 1.2 billion kW, reaching 1.213 billion kW, accounting for 47.3% of the country's total installed generation capacity, 2.5 percentage points higher than in 2021; of this, wind power accounted for 365 GW and solar for 393 GW. However, the output of distributed sources such as photovoltaics and wind turbines depends on the distribution characteristics of the renewable resource, with pronounced randomness and fluctuation, and their large-scale, widely distributed access poses challenges to the planning, operation and management of the distribution network. Connecting distributed sources such as photovoltaics and wind turbines to the grid in the form of micro-grids is an effective way to resolve the large-scale application of distributed renewable energy and to further increase renewable installed capacity.
A micro-grid is a small generation, distribution and consumption system that integrates distributed generation, an energy storage system, energy conversion devices, monitoring and protection equipment, and loads. It can be regarded as a small power system with complete generation and distribution functions that can effectively realize energy optimization internally. A micro-grid can operate independently in remote areas or on islands, or operate grid-connected to a distribution network; while meeting its own load demand, it can also provide the distribution network with auxiliary services such as power support and reserve.
Effective energy management of a micro-grid system can optimize operation and reduce cost. Existing methods such as mixed-integer quadratic programming depend heavily on the prediction accuracy of the uncertain factors in the system; since future wind, photovoltaic and load demand cannot be predicted accurately in practice, the solutions of these methods are difficult to apply directly. In addition, as the scale of micro-grids grows and the uncertainty of the system changes, conventional methods struggle to provide a general solution framework.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an agent-based optimal energy management method for grid-connected micro-grids that learns an optimal policy through continuous interaction with the environment and can effectively reduce the operating cost of the micro-grid.
The invention discloses an agent-based grid-connected micro-grid optimal energy management method, comprising the following steps:
(1) Modeling the grid-connected micro-grid energy management process as a Markov decision process, wherein the state variables of the agent comprise the output power of the different distributed sources in the micro-grid, the active and reactive power demand of the residential load, the node electricity price, and the stored energy of the energy storage device; the action of the agent consists of the active and reactive power of the conventional distributed generators and the charge/discharge power of the energy storage device;
(2) Designing a reward function that accounts for the operating constraints of the micro-grid. The agent's reward comprises the operating cost of the micro-grid, namely the generation cost of the conventional distributed generators and the cost of purchasing and selling electricity between the micro-grid and the distribution network; meanwhile, the operating constraints of the micro-grid are accounted for in the reward function, including the exchange-power, power-flow, voltage and energy-storage constraints between the micro-grid and the distribution network, so that the optimal policy learned from this reward function does not output energy management schemes that violate the constraints;
(3) Solving the optimal stationary policy of the established Markov decision process with a deep Q-learning method: each interaction between the agent and the micro-grid environment yields a sample comprising the current state, the action of the agent, the reward obtained, and the state at the next moment; these samples are used to learn the optimal action value network and thereby the optimal policy, which outputs the optimal energy management scheme.
In step (1) of the invention, the state variable of the agent in the grid-connected micro-grid energy management problem satisfies the Markov property. The method targets a micro-grid system comprising conventional distributed generators, wind turbines, distributed photovoltaics, energy storage devices and residential loads.
the state of the constructed Markov decision process isWherein->Respectively represents the output power of photovoltaic and fan in the past 24 hours, +.>Respectively representing the power requirements of the past 24 hours load, R t Represents node electricity prices for the past 24 hours, E t Representing the stored energy of the energy storage device over the past 24 hours. Action as-> Is a vector of active power output by a t-period conventional distributed generator, +.>Respectively representing the active power output by the kth conventional distributed motor in t period, +.>The charge and discharge power of the energy storage device at the time t is represented, the charge state is represented when the charge and discharge power is positive, and the discharge state is represented when the charge and discharge power is negative; furthermore, conventional distributed generators and energy storage devices need to satisfy the following constraints, respectively:
and->Representing maximum and minimum output power, respectively, < > of a conventional distributed power supply>Maximum charge and discharge power of the energy storage device; the action space of the markov decision process is therefore: />
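To make the data layout concrete, the following minimal Python sketch mirrors the state and action definitions above; all names and numeric limits here (MicrogridState, clip_action, P_G_MIN and so on) are illustrative assumptions, not part of the patent text.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MicrogridState:
    """State s_t: six 24-hour histories, flattened before being fed to the Q network."""
    p_pv: np.ndarray    # photovoltaic output over the past 24 hours, shape (24,)
    p_wt: np.ndarray    # wind turbine output over the past 24 hours, shape (24,)
    p_load: np.ndarray  # active load demand over the past 24 hours, shape (24,)
    q_load: np.ndarray  # reactive load demand over the past 24 hours, shape (24,)
    prices: np.ndarray  # node electricity prices over the past 24 hours, shape (24,)
    energy: np.ndarray  # stored energy of the storage device over the past 24 hours, shape (24,)

    def to_vector(self) -> np.ndarray:
        return np.concatenate([self.p_pv, self.p_wt, self.p_load,
                               self.q_load, self.prices, self.energy])


# Box action space A = [P_min^G, P_max^G]^K x [-P_max^b, P_max^b]; limits are examples
P_G_MIN, P_G_MAX, P_B_MAX, K = 0.0, 30.0, 10.0, 2


def clip_action(p_gen: np.ndarray, p_storage: float) -> np.ndarray:
    """Project a raw action a_t = (P^{G,1}..P^{G,K}, P^b) onto the feasible box."""
    p_gen = np.clip(p_gen, P_G_MIN, P_G_MAX)                  # generator limits
    p_storage = float(np.clip(p_storage, -P_B_MAX, P_B_MAX))  # storage power limit
    return np.append(p_gen, p_storage)
```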
In step (2) of the invention, for a micro-grid comprising conventional distributed generators, wind turbines, distributed photovoltaics, energy storage devices and residential loads, the operating cost of the micro-grid is included in the reward function and the operating constraints of the micro-grid are accounted for in it. When the action output by the agent cannot satisfy the constraints, only a very small reward value is obtained, so the optimal actions output by an agent trained with this reward function do not violate the constraints.
the invention optimizes the operation cost of the micro-grid, and when the operation constraint of the micro-grid is satisfied, the rewarding function of the intelligent agent is as follows:
wherein r is t Indicating the rewards of the t-th decision,and->The cost of the kth conventional distributed power supply and the electricity purchasing cost of the micro-grid in the t period are respectively calculated according to the following formulas:
wherein a is d ,b d ,c d As a factor of the cost of the material,to exchange power with the distribution network, when->For positive value, purchasing electricity to the power distribution network, and for negative value, selling electricity to the power distribution network, R t For real-time electricity prices, Δt is the running step. r is (r) t Is a negative number of costs, thus maximizing r t Meaning that the cost is minimized;
the formula is a reward function when the constraint condition is met, and the following constraint condition is specifically considered when the constraint reward function is designed and considered:
(1) Power flow constraint

$$\left(P^{ij}_t\right)^2 + \left(Q^{ij}_t\right)^2 \le \left(S^{ij}_{\max}\right)^2$$

where $P^{ij}_t$ and $Q^{ij}_t$ are the active and reactive power flowing through branch $ij$ in period $t$, and $S^{ij}_{\max}$ is the maximum apparent power allowed on branch $ij$.
(2) Exchange power constraint

$$-P^{\mathrm{grid}}_{\max} \le P^{\mathrm{grid}}_t \le P^{\mathrm{grid}}_{\max}$$

where $P^{\mathrm{grid}}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network.
(3) Voltage constraint

$$V_{\min} \le V^{i}_t \le V_{\max}$$

where $V^{i}_t$ is the voltage of node $i$ in period $t$, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum allowable node voltages.
(4) Energy storage constraint

$$E_{t+1} = E_t + \left(u_t\, \eta_c\, P^{b}_t + (1-u_t)\, \frac{P^{b}_t}{\eta_d}\right)\Delta t$$

$$E_{\min} \le E_t \le E_{\max} \tag{10}$$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{b}_t$ is its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ are the charging and discharging efficiencies of the system; $E_{\min}$ and $E_{\max}$ are the minimum and maximum energy the storage device can hold.
when the agent outputs the actionAfter that, it is first checked whether the constraint is satisfied, if satisfied, the reward is calculated according to the formula, and if not, the reward is calculated according to the following formula:
r t =-ζ (11)
wherein ζ is a very large positive number.
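The constraint-aware reward logic above can be condensed into a short Python sketch; the helper names, cost coefficients and limits below are hypothetical stand-ins for the quantities in the cost formulas and equations (10) and (11), chosen only for illustration.

```python
import numpy as np

A_D, B_D, C_D = 0.01, 0.5, 1.0  # generation cost coefficients a_d, b_d, c_d (assumed)
ZETA = 1e6                      # large penalty zeta for constraint violation
DT = 1.0                        # operating step Delta_t, in hours


def feasible(p_ij, q_ij, s_max, p_grid, p_grid_max, v, v_min, v_max,
             energy, e_min, e_max) -> bool:
    """Check the power-flow, exchange-power, voltage and storage constraints."""
    return (np.all(p_ij**2 + q_ij**2 <= s_max**2)    # power flow, per branch
            and abs(p_grid) <= p_grid_max            # exchange power on the tie line
            and np.all((v_min <= v) & (v <= v_max))  # node voltages
            and e_min <= energy <= e_max)            # stored energy, equation (10)


def reward(p_gen, p_grid, price, is_feasible: bool) -> float:
    """Reward r_t: negative operating cost if feasible, -zeta otherwise (equation (11))."""
    if not is_feasible:
        return -ZETA
    gen_cost = float(np.sum((A_D * p_gen**2 + B_D * p_gen + C_D) * DT))
    grid_cost = price * p_grid * DT  # positive when buying, negative when selling
    return -(gen_cost + grid_cost)
```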
Step 3: solving an optimal strategy of the established Markov decision process by adopting a deep Q learning method;
the cumulative return of the Markov decision process is:
wherein, gamma E [0,1] is discount rate, which is used for reducing the return of long-term income; the state cost function of the markov decision process is:
the intelligent agent and the micro-grid environment in the deep Q learning are interactively sampled to obtain samples (state, action, rewarding and next time state), and parameters of a cost function are updated by using the samples, so that an optimal strategy with the maximum state cost function for all states s is obtained;
action cost function Q π (s, a) represents the expected return that would be expected to be obtainable by selecting action a in state s and then following policy pi; in deep Q learning, the action cost function is modeled as a multi-layer neural network Q w (s, a), the input of the neural network is the state s, and the output is the Q value of each action; to solve instability of neural network training, target networkThe method is used for calculating TD errors, the target network and the training network have the same structure and different parameters; the playback buffer area is used for storing four-tuple data obtained by sampling from the environment, so that training data is better facilitated;
when the optimal strategy of the constructed Markov decision process is solved by deep Q learning, the state is firstly calculatedInput to the agent, agentOutputting an action according to the current policy>Applied to the micro-grid environment, the micro-grid environment firstly judges whether the constraint is met and calculates a rewarding value, and the environment rewards r t And next state s t+1 Is fed back to the agent, thus obtaining a set of samples (s t ,a t ,r t ,s t+1 ) And stores the samples in the playback buffer. The intelligent agent is according to s t+1 Continuing to sample the environment interactively;
when the data in the playback buffer is sufficient, the Q network is started to be updated, and N groups of samples {(s) are extracted from the Q network each time i ,a i ,r i ,s i+1 )}} i=1,...,N The TD error is calculated for each set of samples:
the target loss for this set of samples is:
learning parameters by minimizing target loss;
With gradient descent on the sampled data, the parameters of $Q_w(s,a)$ eventually converge to their optimal values. Inputting the state $s_t$ into the learned optimal action value function and taking the action that maximizes it yields the optimal energy management scheme of the micro-grid for period $t$.
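In code, extracting the period-t scheme from a trained network is a greedy argmax over the discretized action set; a minimal PyTorch sketch, assuming the network maps a state vector to one Q value per action (the discretization itself is not specified in the text):

```python
import torch


def optimal_action_index(q_net: torch.nn.Module, state: torch.Tensor) -> int:
    """Greedy policy: pick the discretized action with the largest Q value in state s_t."""
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0)).squeeze(0)  # shape: (n_actions,)
    return int(q_values.argmax().item())
```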
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention provides a data-driven, model-free method that learns the uncertainty from historical data and learns an optimal policy through continuous interaction with the environment; the actions it outputs achieve an operating cost very close to that of the optimal solution obtained by mixed-integer quadratic programming under perfectly accurate forecasts of the uncertain factors, with shorter computation time, and can effectively reduce the operating cost of the micro-grid.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a microgrid system diagram;
FIG. 3 is a schematic diagram of an agent interacting with a microgrid environment;
FIG. 4 is a comparison of the operating cost of the agent-based method with that of the optimal solution found by mixed integer quadratic programming.
The invention will be further described in detail below with reference to the accompanying drawings.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in fig. 1, the method of the present invention comprises: modeling the micro-grid energy management process as a Markov decision process; designing a reward function that accounts for the micro-grid constraints; and solving the optimal policy with a deep Q-learning method. The details are as follows.
1. Modeling the grid-connected micro-grid energy management process as a Markov decision process:
the present invention is directed to a micro-grid system comprising a conventional distributed power source, wind generator, distributed photovoltaic, energy storage device, and residential load. The micro grid system used in this example is shown in fig. 2, where there are a diesel generator, a photovoltaic, a fan, an energy storage system, and a load. The state of the Markov decision process isWherein->Respectively representing the output power of a photovoltaic and a fan in a micro-grid system of the past 24 hours, +.>Representing the power demand of the load over the past 24 hours, R t Represents node electricity prices for the past 24 hours, E t Representing the stored energy of the energy storage device over the past 24 hours. Action as Is a vector of active power output by a t-period conventional distributed generator, +.>Representing the active power output by the kth conventional distributed motor in t period,/for the period of time>The charge and discharge power of the energy storage device at the time t is represented, the charge state is represented when the charge and discharge power is positive, and the discharge state is represented when the charge and discharge power is negative;
furthermore, conventional distributed generators and energy storage devices need to satisfy the following constraints, respectively:
and->Representing maximum and minimum output power, respectively, < > of a conventional distributed power supply>The maximum charge and discharge power of the energy storage device is obtained. The action space of the markov decision process is therefore: />
2. Designing the reward function of the agent's Markov decision process based on the exchange-power, power-flow, voltage and energy-storage constraints between the micro-grid and the distribution network. The system has the following constraints:
(1) Power flow constraint

$$\left(P^{ij}_t\right)^2 + \left(Q^{ij}_t\right)^2 \le \left(S^{ij}_{\max}\right)^2$$

where $P^{ij}_t$ and $Q^{ij}_t$ are the active and reactive power flowing through branch $ij$ in period $t$, and $S^{ij}_{\max}$ is the maximum apparent power allowed on branch $ij$.
(2) Exchange power constraint

$$-P^{\mathrm{grid}}_{\max} \le P^{\mathrm{grid}}_t \le P^{\mathrm{grid}}_{\max}$$

where $P^{\mathrm{grid}}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network.
(3) Voltage constraint

$$V_{\min} \le V^{i}_t \le V_{\max}$$

where $V^{i}_t$ is the voltage of node $i$ in period $t$, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum allowable node voltages.
(4) Energy storage constraint

$$E_{t+1} = E_t + \left(u_t\, \eta_c\, P^{b}_t + (1-u_t)\, \frac{P^{b}_t}{\eta_d}\right)\Delta t$$

$$E_{\min} \le E_t \le E_{\max} \tag{7}$$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{b}_t$ is its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ are the charging and discharging efficiencies of the system; $E_{\min}$ and $E_{\max}$ are the minimum and maximum energy the storage device can hold.
when the agent outputs the actionAfter that, it is first checked whether the constraint is satisfied, and if so, the reward is:
wherein r is i Indicating the rewards of the t-th decision,and->The cost of the kth conventional distributed power supply and the electricity purchasing cost of the micro-grid in the t period are respectively calculated according to the following formulas:
wherein a is d ,b d ,c d As a factor of the cost of the material,to exchange power with the distribution network, when->For positive value, purchasing electricity to the power distribution network, and for negative value, selling electricity to the power distribution network, R t For real-time electricity prices Δt is the running step length, here 1 hour.
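As a worked example of this reward arithmetic, with purely hypothetical numbers (one generator, $a_d = 0.01$, $b_d = 0.5$, $c_d = 1$):

```python
a_d, b_d, c_d = 0.01, 0.5, 1.0                 # assumed cost coefficients
p_g, p_grid, price, dt = 20.0, 5.0, 0.6, 1.0   # 20 kW generation, buying 5 kW at 0.6/kWh

gen_cost = (a_d * p_g**2 + b_d * p_g + c_d) * dt  # 0.01*400 + 0.5*20 + 1 = 15.0
grid_cost = price * p_grid * dt                   # 0.6 * 5 * 1 = 3.0
r_t = -(gen_cost + grid_cost)                     # reward = -18.0
```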
If the constraints are not satisfied, the reward is computed as

$$r_t = -\zeta \tag{11}$$

where $\zeta$ is set to $10^{6}$.
3. Solving the optimal policy of the Markov decision process with the deep Q-learning method.

The cumulative return of the Markov decision process is

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$$

where $\gamma \in [0,1]$ is the discount rate, used to discount long-term returns, and is set to 0.9. The state value function of the Markov decision process is

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}\gamma^{k} r_{t+k}\,\middle|\, s_t = s\right]$$
deep Q learning obtains samples (states, actions, rewards and next states) through interactive sampling with the environment and utilizes sample learning parameters, so that an optimal strategy with the largest state value function for all states s is obtained;
action cost function Q π (s, a) represents the expected return that would be expected to be obtainable if action a was selected in state s, and then policy pi was followed. Modeling an action cost function as a multi-layer neural network Q w The input of the neural network is the state s, the output is the Q value of each action, the larger the Q value is, the better the action is, the neural network comprises 3 hidden layers and input layer output layers, the number of the hidden layer neurons is 512, and the target network and the training network have the same structure.
The interaction between the agent and the micro-grid environment is shown in fig. 3. The state $s_t$ is first input to the agent, and the agent outputs an action $a_t$ that acts on the environment; the micro-grid environment first judges whether the constraints are satisfied and computes the reward value, then feeds the reward $r_t$ and the next state $s_{t+1}$ back to the agent, yielding a sample $(s_t, a_t, r_t, s_{t+1})$. The agent continues sampling with the environment from $s_{t+1}$, storing each sample in the replay buffer, as sketched below.
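The interaction loop can be sketched as follows, continuing the QNetwork sketch above (q_net, N_ACTIONS); the environment object env and the epsilon-greedy exploration scheme are assumptions standing in for the micro-grid simulation, which the text does not detail.

```python
import random
from collections import deque

import torch


class ReplayBuffer:
    """Stores the (s_t, a_t, r_t, s_{t+1}) four-tuples sampled from the environment."""
    def __init__(self, capacity: int = 50_000):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, n: int):
        return random.sample(self.data, n)

    def __len__(self):
        return len(self.data)


buffer = ReplayBuffer()


def collect_step(env, q_net, state: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
    """One agent-environment interaction: act, observe (r_t, s_{t+1}), store the sample."""
    if random.random() < epsilon:            # occasional exploration (assumed scheme)
        action = random.randrange(N_ACTIONS)
    else:                                    # follow the current greedy policy
        with torch.no_grad():
            action = int(q_net(state.unsqueeze(0)).argmax().item())
    reward, next_state = env.step(action)    # env checks constraints, computes r_t
    buffer.push(state, action, reward, next_state)
    return next_state
```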
When the data in the replay buffer is sufficient, updating of the Q network begins. Each update draws $N$ samples $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1,\dots,N}$ from the buffer and computes the TD target for each sample:

$$y_i = r_i + \gamma \max_{a'} Q_{w^-}(s_{i+1}, a')$$

The target loss on this batch of samples is

$$L(w) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q_w(s_i, a_i)\right)^2$$

and the parameters are learned by minimizing this target loss. The replay buffer size is set to 50000, parameter learning begins once the number of samples exceeds 1000, and $N$ is set to 256.
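Putting the stated hyperparameters together (buffer capacity 50000, learning after 1000 samples, N = 256, γ = 0.9), one update step looks like the following sketch, continuing the q_net, target_net and buffer objects from the sketches above; the Adam optimizer and its learning rate are assumptions, as the text does not name an optimizer.

```python
import torch
import torch.nn.functional as F

GAMMA, BATCH_SIZE, WARMUP = 0.9, 256, 1000
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # optimizer choice assumed


def update_q_network() -> None:
    """One gradient step on L(w) = (1/N) sum_i (y_i - Q_w(s_i, a_i))^2."""
    if len(buffer) < WARMUP:                 # learn only once enough samples are stored
        return
    states, actions, rewards, next_states = zip(*buffer.sample(BATCH_SIZE))
    s = torch.stack(states)
    a = torch.tensor(actions, dtype=torch.long)
    r = torch.tensor(rewards, dtype=torch.float32)
    s_next = torch.stack(next_states)

    with torch.no_grad():                    # TD target y_i from the target network
        y = r + GAMMA * target_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q_w(s_i, a_i)

    loss = F.mse_loss(q, y)                  # mean squared TD error over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```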
Fig. 4 compares the operating cost of the agent-based method with that of the optimal solution obtained by mixed-integer quadratic programming; the difference between the two is very small. However, solving with mixed-integer quadratic programming requires accurate forecasts of wind power, photovoltaic output and load, that is, it assumes the uncertainty away.
In practice, however, future wind, photovoltaic and load demand cannot be predicted accurately, so the solution of the mixed-integer quadratic programming algorithm is difficult to apply directly. The agent-based method is a data-driven, model-free method that learns the uncertainty from historical data; the actions it outputs achieve an operating cost very close to that of the optimal solution obtained by mixed-integer quadratic programming under perfectly accurate forecasts of the uncertain factors, with shorter computation time, and it is an effective method for reducing the operating cost of the micro-grid.
The foregoing is merely illustrative embodiments of the present invention, and the present invention is not limited thereto, and any changes or substitutions that may be easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (7)

1. An agent-based grid-connected micro-grid optimal energy management method, characterized by comprising the following steps:
(1) modeling the energy management process of a grid-connected micro-grid, which comprises different types of distributed generation, energy storage devices and user loads and can purchase electricity from and sell electricity to the distribution network, as a Markov decision process;
(2) designing the reward function of the Markov decision process based on the operating constraints of the micro-grid and the distribution network;
(3) solving the optimal stationary policy of the established Markov decision process with a deep Q-learning method; this policy is the optimal energy management strategy of the micro-grid.
2. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein in step (1) the state variables of the agent include the output power of the different distributed sources in the micro-grid, the active and reactive power demand of the residential load, the node electricity price, and the stored energy of the energy storage device; the action of the agent consists of the active and reactive power of the conventional distributed generators and the charge/discharge power of the energy storage device.
3. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 2, wherein in the Markov decision process framework constructed in step (1), the state of the agent is

$$s_t = \left(P^{\mathrm{pv}}_{t-23:t},\ P^{\mathrm{wt}}_{t-23:t},\ P^{\mathrm{L}}_{t-23:t},\ Q^{\mathrm{L}}_{t-23:t},\ R_t,\ E_t\right)$$

where $P^{\mathrm{pv}}_{t-23:t}$ and $P^{\mathrm{wt}}_{t-23:t}$ are the output power of the distributed photovoltaics and wind generators over the past 24 hours, $P^{\mathrm{L}}_{t-23:t}$ and $Q^{\mathrm{L}}_{t-23:t}$ are the power demand of the residential load over the past 24 hours, $R_t$ is the node electricity price over the past 24 hours, and $E_t$ is the stored energy of the energy storage device over the past 24 hours; the action of the agent is

$$a_t = \left(P^{G,1}_t, \dots, P^{G,K}_t,\ P^{b}_t\right)$$

where $\left(P^{G,1}_t, \dots, P^{G,K}_t\right)$ is the vector of active power output by the conventional distributed generators in period $t$, $P^{G,k}_t$ is the active power output by the $k$-th conventional distributed generator in period $t$, and $P^{b}_t$ is the charge/discharge power of the energy storage device at time $t$, positive when charging and negative when discharging;

furthermore, the conventional distributed generators and the energy storage device must respectively satisfy the constraints

$$P^{G}_{\min} \le P^{G,k}_t \le P^{G}_{\max}, \qquad -P^{b}_{\max} \le P^{b}_t \le P^{b}_{\max}$$

where $P^{G}_{\max}$ and $P^{G}_{\min}$ are the maximum and minimum output power of a conventional distributed generator and $P^{b}_{\max}$ is the maximum charge/discharge power of the energy storage device; the action space of the Markov decision process is therefore

$$\mathcal{A} = \left[P^{G}_{\min}, P^{G}_{\max}\right]^{K} \times \left[-P^{b}_{\max}, P^{b}_{\max}\right].$$
4. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein the reward function in step (2) includes the operating cost of the micro-grid, namely the generation cost of the conventional distributed generators and the cost of purchasing and selling electricity between the micro-grid and the distribution network; meanwhile, the operating constraints of the micro-grid are accounted for in the reward function, including the exchange-power, power-flow, voltage and energy-storage constraints between the micro-grid and the distribution network; the optimal policy learned from this reward function does not output energy management schemes that violate the constraints.
5. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 4, wherein step (2) designs a reward function accounting for the micro-grid operating constraints; when the operating constraints of the micro-grid are satisfied, the reward function of the agent is

$$r_t = -\left(\sum_{k=1}^{K} C^{G,k}_t + C^{\mathrm{grid}}_t\right)$$

where $r_t$ is the reward obtained by the agent after the $t$-th decision, and $C^{G,k}_t$ and $C^{\mathrm{grid}}_t$ are the generation cost of the $k$-th conventional distributed generator in period $t$ and the cost of purchasing and selling electricity between the micro-grid and the distribution network, computed as

$$C^{G,k}_t = \left(a_d \left(P^{G,k}_t\right)^2 + b_d P^{G,k}_t + c_d\right)\Delta t, \qquad C^{\mathrm{grid}}_t = R_t\, P^{\mathrm{grid}}_t\, \Delta t$$

where $a_d$, $b_d$, $c_d$ are cost coefficients; $P^{\mathrm{grid}}_t$ is the power exchanged with the distribution network, positive when purchasing electricity from the distribution network and negative when selling electricity to it; $R_t$ is the real-time electricity price; and $\Delta t$ is the operating step; $r_t$ is the negative of the cost, so maximizing $r_t$ means minimizing the cost;

the above formula is the reward when the constraints are satisfied; when designing the reward function accounting for the micro-grid operating constraints, the following constraints are specifically considered:

(1) power flow constraint

$$\left(P^{ij}_t\right)^2 + \left(Q^{ij}_t\right)^2 \le \left(S^{ij}_{\max}\right)^2$$

where $P^{ij}_t$ and $Q^{ij}_t$ are the active and reactive power flowing through branch $ij$ in period $t$, and $S^{ij}_{\max}$ is the maximum apparent power allowed on branch $ij$;

(2) exchange power constraint

$$-P^{\mathrm{grid}}_{\max} \le P^{\mathrm{grid}}_t \le P^{\mathrm{grid}}_{\max}$$

where $P^{\mathrm{grid}}_{\max}$ is the maximum exchange power allowed on the tie line between the micro-grid and the distribution network;

(3) voltage constraint

$$V_{\min} \le V^{i}_t \le V_{\max}$$

where $V^{i}_t$ is the voltage of node $i$ in period $t$, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum allowable node voltages;

(4) energy storage constraint

$$E_{t+1} = E_t + \left(u_t\, \eta_c\, P^{b}_t + (1-u_t)\, \frac{P^{b}_t}{\eta_d}\right)\Delta t$$

$$E_{\min} \le E_t \le E_{\max} \tag{10}$$

where $E_t$ is the stored energy of the energy storage device in period $t$ and $P^{b}_t$ is its charge/discharge power in period $t$; $u_t = 1$ indicates the device is charging and $u_t = 0$ that it is discharging, and charging and discharging cannot occur simultaneously in the same period; $\eta_c$ and $\eta_d$ are the charging and discharging efficiencies of the system; $E_{\min}$ and $E_{\max}$ are the minimum and maximum energy the storage device can hold;

after the agent outputs an action $a_t$, it is first checked whether the constraints are satisfied; if they are, the reward is computed according to the formula above; if not, the reward is computed as

$$r_t = -\zeta \tag{11}$$

where $\zeta$ is a very large positive number.
6. The agent-based grid-connected micro-grid optimal energy management method of claim 1, wherein step (3) solves the optimal stationary policy of the established Markov decision process with a deep Q-learning method; each interaction between the agent and the micro-grid environment yields a sample comprising the current state, the action of the agent, the reward obtained, and the state at the next moment; these samples are used to learn the optimal action value network and thereby obtain the optimal policy, which outputs the optimal energy management scheme.
7. The agent-based grid-connected micro-grid optimal energy management method of claim 1 or 6, wherein step (3) solves the optimal stationary policy of the established Markov decision process with a deep Q-learning method, and the cumulative return of the Markov decision process is

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$$

where $\gamma \in [0,1]$ is the discount rate, used to discount long-term returns, and $\gamma^{t}$ is the discount applied in period $t$; the state value function of the Markov decision process is

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}\gamma^{k} r_{t+k}\,\middle|\, s_t = s\right]$$

in deep Q-learning the agent obtains samples (state, action, reward, next state) by interactive sampling with the micro-grid environment and uses them to update the parameters of the value function, thereby obtaining the optimal policy that maximizes the state value function for all states $s$;

the action value function $Q^{\pi}(s,a)$ denotes the expected return obtained by selecting action $a$ in state $s$ and thereafter following policy $\pi$; in deep Q-learning the action value function is modeled as a multi-layer neural network $Q_w(s,a)$ whose input is the state $s$ and whose output is the Q value of each action; to address the instability of neural network training, a target network $Q_{w^-}$ is used to compute the TD error, and a replay buffer stores the four-tuples sampled from the environment, which makes better use of the training data;

when solving the optimal policy of the constructed Markov decision process with deep Q-learning, the state $s_t$ is first input to the agent; the agent outputs an action $a_t$ according to its current policy, which is applied to the micro-grid environment; the micro-grid environment first judges whether the constraints are satisfied and computes the reward value, then feeds the reward $r_t$ and the next state $s_{t+1}$ back to the agent, yielding a sample $(s_t, a_t, r_t, s_{t+1})$ that is stored in the replay buffer; the agent then continues to sample interactively from the environment starting from $s_{t+1}$;

when the data in the replay buffer is sufficient, updating of the Q network begins; each update draws $N$ samples $\{(s_i, a_i, r_i, s_{i+1})\}_{i=1,\dots,N}$ from the buffer and computes the TD target for each sample:

$$y_i = r_i + \gamma \max_{a'} Q_{w^-}(s_{i+1}, a')$$

the target loss on this batch of samples is

$$L(w) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q_w(s_i, a_i)\right)^2$$

and the parameters are learned by minimizing the target loss.
CN202311206909.8A 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent Pending CN117277327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311206909.8A CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311206909.8A CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Publications (1)

Publication Number Publication Date
CN117277327A true CN117277327A (en) 2023-12-22

Family

ID=89200106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311206909.8A Pending CN117277327A (en) 2023-09-18 2023-09-18 Grid-connected micro-grid optimal energy management method based on intelligent agent

Country Status (1)

Country Link
CN (1) CN117277327A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726143A (en) * 2024-02-07 2024-03-19 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning
CN117726143B (en) * 2024-02-07 2024-05-17 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination