CN113807029B - Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method - Google Patents
- Publication number: CN113807029B
- Application number: CN202111217697.4A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- H02J3/18: Arrangements for adjusting, eliminating or compensating reactive power in networks
- H02J3/24: Arrangements for preventing or reducing oscillations of power in networks
- G06F2111/04: Constraint-based CAD
- G06F2113/04: Power grid distribution networks
- Y02E40/30: Reactive power compensation
Abstract
A double-time-scale power grid voltage optimization method based on deep reinforcement learning comprises the following steps: respectively dividing the long-time-scale and short-time-scale intervals of the double-time-scale method; performing long-time-scale power grid voltage optimization based on a DQN algorithm to obtain a long-time-scale parallel capacitor bank switching plan; and performing short-time-scale reactive voltage optimization based on a DDPG algorithm to obtain a short-time-scale output plan for the continuous reactive power compensation devices. The invention realizes complementary use of multiple reactive compensation devices, has stronger reactive voltage optimization capability, can schedule the capacitor switching plan across all optimization time points within a day, and effectively achieves fast optimization.
Description
Technical Field
The invention relates to a double-time-scale power grid voltage optimization method, and in particular to a double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning.
Background
To build a new type of power system with new energy as the main body, the penetration of various renewable energy sources keeps increasing, and the randomness and dynamics of load demand response keep strengthening, which poses great challenges to the operation and control of the modern power grid.

Reactive voltage optimization of the power grid can, to a certain extent, effectively and economically suppress the wide-range voltage fluctuations that disturbances cause in the power system. The problem can be regarded as a nonlinear complex optimization problem with numerous objectives, variables and constraints.

At present, the methods for dynamic reactive voltage optimization mainly comprise traditional operations-research optimization methods, heuristic search methods and the like. However, these methods often suffer from slow convergence, heavy computation, and a tendency to fall into local optima. Furthermore, most existing methods are model-based and highly dependent on model accuracy, which is impractical for power systems with large-scale new energy access. Applying artificial-intelligence algorithms to reactive voltage optimization reduces the influence of model accuracy on control performance, enables the power system to respond promptly and accurately under various conditions, and undoubtedly provides a new approach to power system operation and control.
Disclosure of Invention
The invention aims to solve the technical problem of providing a deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method capable of simultaneously considering discrete and continuous reactive compensation devices.
The technical scheme adopted by the invention is as follows: the double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning comprises the following steps:
1) respectively dividing a long time scale interval and a short time scale interval in the double time scale method:
Divide a day into K_l long time intervals; the initial time of each long time interval within the day is τ = 0, …, K_l − 1. Subdivide each long time interval into K_s short time intervals; the initial time of each short time interval within a long time interval is t = 0, …, K_s − 1;
2) Carrying out long-time scale power grid voltage optimization based on a DQN algorithm, comprising the following steps: establishing a long-time scale power grid voltage optimization model, integrating multiple targets by using a membership function, designing a reward function aiming at the long-time scale power grid voltage optimization model, and solving the long-time scale power grid voltage optimization model by using a DQN algorithm to obtain a long-time scale parallel capacitor bank switching plan;
3) short-time scale reactive voltage optimization is carried out based on a DDPG algorithm, and the method comprises the following steps: establishing a short-time-scale power grid voltage optimization model, designing a reward function aiming at the short-time-scale power grid voltage optimization model, and solving the short-time-scale power grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan.
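As an illustration of the interval division in step 1), the following sketch maps a minute of the day to the pair (τ, t). The values K_l = 24 and K_s = 12 correspond to the 1 h / 5 min decision intervals used in the embodiment; the function name and the minute-based interface are hypothetical.

```python
# Hypothetical sketch of the two-level time-scale division of step 1).
# Assumed: K_l = 24 one-hour long intervals per day and K_s = 12
# five-minute short intervals per long interval (as in the embodiment).
K_L = 24
K_S = 12

def interval_indices(minute_of_day):
    """Map a minute of the day (0..1439) to (tau, t): the long-interval
    index tau in 0..K_L-1 and the short-interval index t in 0..K_S-1."""
    tau = minute_of_day // 60       # one long interval per hour
    t = (minute_of_day % 60) // 5   # one short interval per 5 minutes
    return tau, t

print(interval_indices(0))      # (0, 0): start of the day
print(interval_indices(1435))   # (23, 11): last short interval of the day
```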
The double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning has the following advantages:
1. Through the cooperation of the two agents on the long and short time scales, the invention realizes complementary use of the various reactive compensation devices and has stronger reactive voltage optimization capability.
2. On the long time scale, the proposed method aims at suppressing the wide-range voltage fluctuations caused by conventional load demand changes and at minimizing the network loss of the whole system; with the DQN algorithm as the optimization kernel, the capacitor switching plan can be scheduled comprehensively across all optimization time points within a day.
3. On the short time scale, the proposed method addresses the rapid and frequent grid voltage fluctuations caused by large-scale new energy grid connection; with the DDPG algorithm as the optimization kernel, fast optimization is effectively achieved.
4. The proposed grid voltage optimization method can effectively solve the problem of frequent grid voltage fluctuations under high-proportion new energy access, and has practical significance for engineering applications.
Drawings
FIG. 1 is a flow chart of a double-time scale new energy power grid voltage optimization method based on deep reinforcement learning according to the invention;
FIG. 2 is a schematic diagram of an improved IEEE39 node test system in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a typical long-time-scale capacitor bank switching plan within a day in an example of the present invention;
FIG. 4 is a graph of long time scale daily average network loss in an example of the invention;
FIG. 5 is a schematic diagram of reactive power output of a typical short-time-scale continuous reactive power compensation device in an example of the present invention;
FIG. 6a is a schematic diagram of the voltage optimization effect at node 6 on a typical day in an example of the present invention;
FIG. 6b is a schematic diagram of the voltage optimization effect at node 23 on a typical day in an example of the present invention;
fig. 6c is a schematic diagram of the voltage optimization effect at node 26 on a typical day in an example of the present invention.
Detailed Description
The invention provides a deep reinforcement learning-based dual-time scale new energy grid voltage optimization method, which is described in detail below with reference to embodiments and drawings.
As shown in fig. 1, the method for optimizing the voltage of the dual-time-scale new energy power grid based on deep reinforcement learning of the present invention includes the following steps:
1) respectively dividing a long time scale interval and a short time scale interval in the double time scale method:
Divide a day into K_l long time intervals; the initial time of each long time interval within the day is τ = 0, …, K_l − 1. Subdivide each long time interval into K_s short time intervals; the initial time of each short time interval within a long time interval is t = 0, …, K_s − 1;
2) Carrying out long-time scale power grid voltage optimization based on a DQN algorithm; the method comprises the following steps:
(2.1) establishing a long-time scale power grid voltage optimization model:
the long-time-scale power grid voltage optimization model objective function F_l(T) is:

min F_l(T) = [f_1(T), f_2(T)]

wherein T is the switching state vector of all parallel capacitor banks; f_1(T) and f_2(T) are respectively the first and second sub-objectives of the objective function F_l(T); T(τ) denotes the switching state vector of all parallel capacitor banks at time τ, the switching state of each parallel capacitor bank being expressed by its switching gear; N is the number of nodes in the power grid; v_p denotes the voltage amplitude of the pivot node; v_ref is the voltage reference value of the pivot node; p_ij denotes the active power flowing from node i to node j;
considering the power grid operation power flow constraint and the voltage constraint:

p_i = v_i Σ_{j=1}^{N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1}^{N} v_j (G_ij sin ω_ij − B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max

wherein p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min respectively denote the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase angle difference between node i and node j;
the switching of the m-th parallel capacitor bank is limited by the upper and lower limits T_m^max and T_m^min of the switching gear T_m and by the daily switching count:

T_m^min ≤ T_m ≤ T_m^max, C_m ≤ C_m^max

wherein C_m denotes the number of times the m-th parallel capacitor bank is switched within one day.
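A minimal sketch of how the two limits on a capacitor bank could be enforced during action selection; the function name, its parameters, and the refuse-when-out-of-switches policy are illustrative assumptions, not from the patent.

```python
def admissible_gear(gear_request, gear_min, gear_max,
                    switches_today, max_switches, gear_now):
    """Enforce the two limits on the m-th parallel capacitor bank:
    the gear stays within [gear_min, gear_max], and a gear change is
    refused once the daily switching count C_m reaches its cap."""
    gear = max(gear_min, min(gear_max, gear_request))
    if gear != gear_now and switches_today >= max_switches:
        return gear_now            # keep current gear: no switches left today
    return gear
```

With the embodiment's parameters (maximum gear 6, at most 6 switchings per day), a request beyond the top gear is clipped, and any change after the sixth switching is ignored.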
(2.2) integrating the multiple objectives by using a membership function:

μ(f_β) = 1, f_β ≤ f_β*
μ(f_β) = (f_β* + δ_β − f_β) / δ_β, f_β* < f_β < f_β* + δ_β
μ(f_β) = 0, f_β ≥ f_β* + δ_β

wherein f_β* denotes the best value found so far in the single dimension corresponding to the β-th sub-objective, β = 1, 2; δ_β is the tolerance of the sub-objective f_β and defines the boundary the objective function can reach. For any sub-objective, when the corresponding objective value lies within the tolerance range, the membership function μ(f_β) maps the sub-objective value f_β to [0, 1]; when f_β lies outside the tolerance, the membership value is set to 0; and when a new sub-objective optimum is found, the membership value is set to 1;
the new objective function after membership-function mapping is:

min[−μ(F_l)], μ(F_l) = k_1 μ(f_1) + k_2 μ(f_2)

wherein μ(F_l) denotes the membership function corresponding to the objective function F_l; f_1 and f_2 respectively denote the first and second sub-objectives of the objective function; k_1 and k_2 are respectively the weight coefficients of the two sub-objectives.
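The membership mapping and the weighted combination above can be sketched as follows. The function names, the default weights k_1 = k_2 = 0.5, and the linear interpolation inside the tolerance band are assumptions consistent with, but not stated verbatim in, the description.

```python
def membership(f, f_best, delta):
    """Map a sub-objective value f to [0, 1]: 1 at (or below) the best
    known value f_best, 0 outside the tolerance band f_best + delta,
    and linear in between (assumed interpolation)."""
    if f <= f_best:
        return 1.0
    if f >= f_best + delta:
        return 0.0
    return (f_best + delta - f) / delta

def combined_objective(f1, f2, f1_best, f2_best, d1, d2, k1=0.5, k2=0.5):
    """Weighted membership mu(F_l) = k1*mu(f1) + k2*mu(f2); the agent
    then minimises -mu(F_l)."""
    return k1 * membership(f1, f1_best, d1) + k2 * membership(f2, f2_best, d2)
```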
(2.3) designing a reward function r_l(τ) for the long-time-scale power grid voltage optimization model:

wherein μ(F_l) denotes the membership function corresponding to the objective function F_l; σ_l is the penalty factor for long-time-scale voltage limit violations and capacitor switching count violations; v_i(τ) denotes the voltage amplitude of node i at time τ; C_m(τ) denotes the number of times the m-th parallel capacitor bank has been switched within the day up to time τ.
(2.4) solving the long-time-scale power grid voltage optimization model by using a DQN algorithm to obtain a long-time-scale parallel capacitor bank switching plan; the method comprises the following steps:
(2.4.1) calculating the DQN network loss function L(θ):

L(θ) = E{[r_l(τ) + γ max_{a∈A} q_π(s(τ), a | θ_target) − q_π(s(τ−1), a_l(τ) | θ)]²}

wherein r_l(τ) denotes the reward function of the long-time-scale power grid voltage optimization model; s(τ) and s(τ−1) respectively denote the states of the agent at times τ and τ−1, composed of the information matrix set {v, p, q, T, C, Q}, where v, p and q are respectively the voltage amplitude vectors of all nodes and the active and reactive power vectors injected at all nodes, and T, C and Q are respectively the switching state vector of all parallel capacitor banks, the vector of daily switching counts of all parallel capacitor banks, and the reactive power output vector of all continuous reactive power compensation devices; A is the action space of the agent; a_l denotes the action selected by the agent; a_l(τ) is obtained by the agent executing its policy based on the state s(τ−1) at time τ−1; q_π is output by the estimated-value network; the target network parameters are copied from the estimated-value network at a fixed step interval, so the target network lags the estimated-value network; θ and θ_target are respectively the estimated-value network parameters and the target network parameters; γ denotes the attenuation factor;
(2.4.2) updating the estimated-value network parameters by stochastic gradient descent:

θ_{τ+1} = θ_τ − α ∇_θ L(θ_τ)

wherein θ_{τ+1} and θ_τ respectively denote the estimated-value network parameters at times τ+1 and τ; α denotes the learning rate of the estimated-value network during updating; ∇ denotes the gradient operator.
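A toy sketch of the DQN update in steps (2.4.1) and (2.4.2), using a linear Q-function q(s, a) = θ[a]·s in place of the neural network. Dimensions, default constants, and all names are illustrative assumptions.

```python
# Minimal DQN-style update sketch: squared TD error between the
# target-network bootstrap value and the estimated-value network,
# followed by one gradient step on the acted-upon row of theta.
GAMMA = 0.9    # attenuation (discount) factor gamma
ALPHA = 0.05   # learning rate alpha

def q_value(theta, s, a):
    """Linear stand-in for the estimated-value network."""
    return sum(w * x for w, x in zip(theta[a], s))

def dqn_step(theta, theta_target, s_prev, a, r, s, alpha=ALPHA):
    """One stochastic-gradient step on
    [r + gamma * max_a' q_target(s, a') - q(s_prev, a)]^2."""
    target = r + GAMMA * max(q_value(theta_target, s, b)
                             for b in range(len(theta_target)))
    td_error = target - q_value(theta, s_prev, a)
    # gradient of 0.5 * td_error^2 w.r.t. theta[a] is -td_error * s_prev
    theta[a] = [w + alpha * td_error * x for w, x in zip(theta[a], s_prev)]
    return td_error
```

In the real method the target network parameters are copied from the estimated-value network at a fixed step interval, which this sketch leaves to the caller.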
3) Optimizing the short-time-scale power grid voltage based on a DDPG algorithm; the method comprises the following steps:
(3.1) establishing a short-time-scale power grid voltage optimization model:
the short-time-scale power grid voltage optimization model objective function F_s is:

min F_s(Q) = Σ_{t=0}^{K_s−1} Σ_p (v_p(t) − v_ref)²

wherein Q is the vector formed by the reactive power outputs of the continuous reactive power compensation devices; Q(t) denotes that vector at time t; v_p denotes the voltage amplitude of the pivot node; v_ref is the voltage reference value of the pivot node; K_s denotes the number of short time intervals within one long time interval;
considering the power grid operation power flow constraint and the voltage constraint, the constraint conditions are:

p_i = v_i Σ_{j=1}^{N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1}^{N} v_j (G_ij sin ω_ij − B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max

wherein p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min respectively denote the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase angle difference between node i and node j;
in order to cope with emergencies in the power system, a certain reserve must be kept when the continuous reactive power compensation devices are adjusted; the constraint during adjustment is:

q_con,n^min ≤ q_con,n ≤ q_con,n^max

wherein q_con,n is the reactive power output value of the n-th continuous reactive power compensation device; q_con,n^max and q_con,n^min are respectively the upper and lower reactive power output limits of q_con,n;
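The reserve-keeping constraint could be sketched as a clamp to a narrowed output band. The reserve fraction of one third mirrors the embodiment's narrowing of a 120 Mvar range to 80 Mvar; the function name and parameters are illustrative.

```python
def clamp_with_reserve(q_request, q_min, q_max, reserve_frac=1/3):
    """Limit a continuous compensator's set-point to a reduced band so
    that a reactive-power reserve is kept for emergencies.
    reserve_frac is an assumed share of the physical range."""
    lo = q_min * (1 - reserve_frac)
    hi = q_max * (1 - reserve_frac)
    return max(lo, min(hi, q_request))
```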
(3.2) designing a reward function r_s(t) for the short-time-scale power grid voltage optimization model:

wherein μ(F_s) denotes the membership function corresponding to the objective function F_s; σ_s is the penalty factor for short-time-scale voltage limit violations; v_i(t) denotes the voltage amplitude of node i at time t.
(3.3) solving the short-time-scale power grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan, comprising:
(3.3.1) calculating the Critic network loss function L(θ_q):

L(θ_q) = (1/M) Σ_{m=1}^{M} [y_m(t) − q(s_m(t), a_s^m(t) | θ_q)]²

wherein E[·] denotes the expectation of the corresponding target value over all values of t; s(t) is the state of the agent at time t, composed of the information matrix set {v, p, q, T, C, Q}, where v, p and q are respectively the voltage amplitude vectors of all nodes and the active and reactive power vectors injected at all nodes, and T, C and Q are respectively the switching state vector of all parallel capacitor banks, the vector of daily switching counts of all parallel capacitor banks, and the reactive power output vector of all continuous reactive power compensation devices; a_s(t) = Q(t) denotes the action taken by the agent at time t, obtained by executing the policy based on the state s(t−1) at time t−1; θ_q are the estimated-value network parameters of the Critic network; q is output by the estimated-value network of the Critic network. Because the DDPG algorithm is a deterministic-policy gradient method, the probability distribution of action selection in each state cannot be determined, so the expectation is replaced by the mean of M samples {s_m(t), a_s^m(t)} drawn randomly and without repetition from the replay memory, where s_m(t) and a_s^m(t) are respectively the agent state and the action taken in the m-th sample; y_m(t) is the label of the m-th sample. The label y(t) is expressed as:

y(t) = r_s(t) + γ q_target{s(t+1), ψ_target[s(t+1) | θ_ψ'] | θ_q'}

wherein r_s(t) is the reward function of the short-time-scale power grid voltage optimization model; s(t) and s(t+1) are respectively the states of the agent at times t and t+1; ψ_target is output by the target network of the Actor network; θ_ψ' are the parameters of the target network of the Actor network; q_target is output by the target network of the Critic network; θ_q' are the parameters of the target network of the Critic network; γ is the attenuation factor;
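The label y(t) above can be sketched directly; `actor_target` and `critic_target` are illustrative stand-ins for the two target networks, passed as callables.

```python
def ddpg_target(r_s, s_next, actor_target, critic_target, gamma=0.9):
    """Label y(t) = r_s(t) + gamma * q_target(s(t+1), psi_target(s(t+1))):
    bootstrap the short-time-scale reward with the Critic target network
    evaluated at the Actor target network's action for the next state."""
    a_next = actor_target(s_next)
    return r_s + gamma * critic_target(s_next, a_next)
```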
(3.3.2) the Actor network is evaluated by the Critic network and updated by a gradient method; the gradient is computed as:

∇_{θ_ψ} J ≈ (1/M) Σ_{m=1}^{M} ∇_a q(s, a | θ_q)|_{s = s_m(t), a = ψ(s_m(t))} ∇_{θ_ψ} ψ(s | θ_ψ)|_{s = s_m(t)}

wherein E[·] denotes the expectation of the corresponding target value over all values of t; q is output by the estimated-value network of the Critic network; ψ is output by the estimated-value network of the Actor network; s and a respectively denote the agent state and the action taken by the agent; θ_q are the estimated-value network parameters of the Critic network; θ_ψ are the estimated-value network parameters of the Actor network; ∇_a denotes the gradient of the corresponding target value with respect to the action a; ∇_{θ_ψ} denotes the gradient of the corresponding target value with respect to the parameters θ_ψ; M is the number of samples; s_m(t) and a_s^m(t) are respectively the agent state and the action taken in the m-th sample;
(3.3.3) updating the parameters of the Actor network and the Critic network respectively:

θ_ψ(t+1) = θ_ψ(t) + α_1 ∇_{θ_ψ} J
θ_q(t+1) = θ_q(t) − α_2 ∇_{θ_q} L(θ_q)

wherein θ_ψ(t+1) and θ_ψ(t) respectively denote the estimated-value network parameters of the Actor network at times t+1 and t; θ_q(t+1) and θ_q(t) respectively denote the estimated-value network parameters of the Critic network at times t+1 and t; α_1 and α_2 respectively denote the learning rates of the estimated-value networks of the Actor network and the Critic network during updating; ∇ denotes the gradient operator.
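The paired parameter updates of step (3.3.3) can be sketched with flat parameter lists: gradient ascent on the policy objective for the Actor, gradient descent on the loss for the Critic. Names and learning-rate values are illustrative.

```python
def gradient_updates(theta_actor, theta_critic, g_actor, g_critic,
                     alpha1=0.01, alpha2=0.02):
    """One update of both estimated-value networks: the Actor ascends
    its policy-objective gradient g_actor with rate alpha1, the Critic
    descends its loss gradient g_critic with rate alpha2."""
    new_actor = [w + alpha1 * g for w, g in zip(theta_actor, g_actor)]
    new_critic = [w - alpha2 * g for w, g in zip(theta_critic, g_critic)]
    return new_actor, new_critic
```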
Examples are given below:
According to the flow chart of the deep reinforcement learning-based double-time-scale power grid voltage optimization method shown in fig. 1, voltage optimization is performed on the improved IEEE 39-node test system shown in fig. 2 under uncertain new energy output and uncertain load. Node 6, node 23 and node 26 are set as the pivot nodes of the region. Node 33 and node 37 are wind farms, each with a rated capacity of 500 MW. Parallel capacitor bank 1 and parallel capacitor bank 2 are installed at nodes 4 and 8 of the original system respectively, with identical parameters: a maximum gear of 6, 50 Mvar per gear, and at most 6 adjustments per day. Nodes 6, 23 and 26 are each connected to a continuous reactive power compensation device with an adjustable range of ±120 Mvar. Considering the influence of emergencies on grid voltage, the SVG reserves a reactive power margin for emergency reactive support, so in the reactive voltage method of this embodiment the adjustable range is narrowed to ±80 Mvar. To match the action times of the reactive power compensation devices, the long-time-scale decision interval in this embodiment is 1 h and the short-time-scale decision interval is 5 min. 420 days of power system operation data are constructed from typical daily load curves and typical wind power output curves and used as training data for the two agents. The comprehensive performance is analyzed by comparing the optimization effect and running time of the deep reinforcement learning algorithm with those of a genetic algorithm.
The capacitor banks are constrained by the switching-count limit during switching. As can be seen from fig. 3, with the proposed method, whether the switching plan for the next moment is executed can be decided from the current number of capacitor switchings and the grid operating condition at that moment. At about 4:00, when wind power output is ample and load demand is low, the parallel capacitor banks are set to a lower gear and provide less reactive support to the grid; at about 12:00, when wind power output is low and load demand is high, the banks are adjusted to a higher gear to suppress wide-range voltage fluctuations. Under the switching-count limit, if the long time scale were optimized only for the current grid condition, the optimization effect could no longer be exerted once the daily limit is reached.
As can be seen from fig. 4, over days 200 to 400 the average loss-reduction rate of the proposed method is 5.24%, while that under genetic-algorithm optimization is 4.66%, which fully demonstrates the superiority of the proposed method.
The reactive power output of each continuous device on a typical day at the short time scale can be seen in fig. 5. The continuous reactive power compensation devices mainly suppress the rapid and frequent voltage fluctuations caused by new energy uncertainty, so their output changes frequently within the day.
From fig. 6a, fig. 6b and fig. 6c, it can be seen that on a typical day the proposed method achieves a good voltage optimization effect compared with the genetic algorithm. The cumulative optimization time of the proposed method on a typical day is only 137.58 s, versus 685.44 s for the genetic algorithm, which demonstrates the rapidity of the method in solving the decision problem.
In conclusion, through the cooperation of the two agents on the long and short time scales, the advantages of the various reactive compensation devices are made complementary, and the method has stronger reactive voltage optimization capability and good feasibility.
Claims (7)
1. A double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning is characterized by comprising the following steps:
1) respectively dividing a long time scale interval and a short time scale interval in the double time scale method:
Divide a day into K_l long time intervals; the initial time of each long time interval within the day is τ = 0, …, K_l − 1. Subdivide each long time interval into K_s short time intervals; the initial time of each short time interval within a long time interval is t = 0, …, K_s − 1;
2) Carrying out long-time scale power grid voltage optimization based on a DQN algorithm, comprising the following steps: establishing a long-time scale power grid voltage optimization model, integrating multiple targets by using a membership function, designing a reward function aiming at the long-time scale power grid voltage optimization model, and solving the long-time scale power grid voltage optimization model by using a DQN algorithm to obtain a long-time scale parallel capacitor bank switching plan; the long-time scale power grid voltage optimization model comprises the following steps:
(1) the long-time-scale power grid voltage optimization model objective function F_l(T) is:

min F_l(T) = [f_1(T), f_2(T)]

wherein T is the switching state vector of all parallel capacitor banks; f_1(T) and f_2(T) are respectively the first and second sub-objectives of the objective function F_l(T); T(τ) denotes the switching state vector of all parallel capacitor banks at time τ, the switching state of each parallel capacitor bank being expressed by its switching gear; N is the number of nodes in the power grid; v_p denotes the voltage amplitude of the pivot node; v_ref is the voltage reference value of the pivot node; p_ij denotes the active power flowing from node i to node j; K_l denotes the number of long time intervals in one day;
(2) considering the power grid operation power flow constraint and the voltage constraint:

p_i = v_i Σ_{j=1}^{N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1}^{N} v_j (G_ij sin ω_ij − B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max

wherein p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min respectively denote the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase angle difference between node i and node j;
(3) the switching of the m-th parallel capacitor bank is limited by the upper and lower limits T_m^max and T_m^min of the switching gear T_m and by the daily switching count:

T_m^min ≤ T_m ≤ T_m^max, C_m ≤ C_m^max

wherein C_m denotes the number of times the m-th parallel capacitor bank is switched within one day;
3) short-time scale reactive voltage optimization is carried out based on a DDPG algorithm, and the method comprises the following steps: establishing a short-time-scale power grid voltage optimization model, designing a reward function aiming at the short-time-scale power grid voltage optimization model, and solving the short-time-scale power grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan.
2. The deep reinforcement learning-based double-time-scale new energy grid voltage optimization method according to claim 1, wherein the step 2) of integrating multiple targets by using membership functions adopts the following formula:

μ(f_β) = 1, f_β ≤ f_β*
μ(f_β) = (f_β* + δ_β − f_β) / δ_β, f_β* < f_β < f_β* + δ_β
μ(f_β) = 0, f_β ≥ f_β* + δ_β

wherein f_β* denotes the best value found so far in the single dimension corresponding to the β-th sub-objective, β = 1, 2; δ_β is the tolerance of the sub-objective f_β and defines the boundary the objective function can reach; for any sub-objective, when the corresponding objective value lies within the tolerance range, the membership function μ(f_β) maps the sub-objective value f_β to [0, 1]; when f_β lies outside the tolerance, the membership value is set to 0; and when a new sub-objective optimum is found, the membership value is set to 1;
the new objective function after membership-function mapping is:

min[−μ(F_l)], μ(F_l) = k_1 μ(f_1) + k_2 μ(f_2)

wherein μ(F_l) denotes the membership function corresponding to the objective function F_l; f_1 and f_2 respectively denote the first and second sub-objectives of the objective function; k_1 and k_2 are respectively the weight coefficients of the two sub-objectives.
3. The deep reinforcement learning-based dual-time scale new energy grid voltage optimization method according to claim 1, wherein the method is characterized in thatIn that, the reward function r is designed aiming at the long-time scale grid voltage optimization model in the step 2) l (τ):
wherein μ(F_l) represents the membership function corresponding to the objective function F_l; σ_l is the penalty factor for long-time-scale voltage limit violations and capacitor switching-count violations; v_i(τ) represents the voltage magnitude at node i at time τ; C_m(τ) represents the number of times the m-th parallel capacitor bank has been switched within one day up to time τ.
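Since the formula itself is not reproduced in this text, the structure of the claim-3 reward (membership term minus a penalty on voltage and switching-count violations) can be sketched as follows. The voltage limits and maximum switching count used here are illustrative assumptions, not values from the patent.

```python
import numpy as np

def long_scale_reward(mu_Fl, v, c, sigma_l,
                      v_min=0.95, v_max=1.05, c_max=5):
    """Hedged sketch of the claim-3 reward r_l(tau).

    mu_Fl   : membership value mu(F_l) of the long-scale objective
    v       : node voltage magnitudes v_i(tau), per-unit
    c       : daily switching counts C_m(tau) of each capacitor bank
    sigma_l : penalty factor for voltage / switching-count violations
    The limits v_min, v_max and c_max are illustrative assumptions.
    """
    v = np.asarray(v, dtype=float)
    c = np.asarray(c, dtype=float)
    volt_violation = np.sum(np.maximum(v - v_max, 0.0) +
                            np.maximum(v_min - v, 0.0))
    switch_violation = np.sum(np.maximum(c - c_max, 0.0))
    return mu_Fl - sigma_l * (volt_violation + switch_violation)
```

With no violations the agent simply receives the membership value; each out-of-limit voltage or excess switching operation subtracts a σ_l-weighted penalty.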
4. The deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method according to claim 1, wherein the step 2) of solving the long-time-scale power grid voltage optimization model by using the DQN algorithm to obtain the long-time-scale parallel capacitor bank switching plan comprises the following steps:
(1) calculating the DQN network loss function L(θ):
wherein r_l(τ) represents the reward function of the long-time-scale grid voltage optimization model; s(τ) and s(τ-1) respectively represent the states of the agent at time τ and time τ-1, each composed of the information matrix set {v, p, q, T, C, Q}, where v, p and q are respectively the voltage magnitude vector of all nodes and the active and reactive power vectors injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of daily switching counts of all parallel capacitor banks, and the reactive power output vector of all continuous reactive power compensation devices; A is the action space of the agent; a_l(τ) represents the action selected by the agent, obtained by the agent executing its policy based on the state s(τ-1) at time τ-1; q_π is output by the estimated-value network; q_target is output by the target network, whose parameters are copied from the estimated-value network at a fixed step interval, so that the target network lags the estimated-value network; θ and θ_target are respectively the estimated-value network parameter and the target network parameter; γ represents the discount factor;
(2) updating the estimated-value network parameters by stochastic gradient descent, as follows:
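The loss in step (1) and the gradient step in step (2) can be sketched with a minimal linear Q-function standing in for the deep network (the linear model, dimensions, learning rate and transition values below are all illustrative assumptions; the patent uses a deep neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Q-network sketch: q(s, .) = s @ theta, one column per action.
STATE_DIM, N_ACTIONS = 4, 3
theta = rng.normal(size=(STATE_DIM, N_ACTIONS))   # estimated-value network
theta_target = theta.copy()                       # lagged target network
gamma = 0.95                                      # discount factor

def q_values(s, params):
    return s @ params

def dqn_loss_and_grad(s_prev, a, r, s_next):
    """TD target y = r + gamma * max_a' q_target(s', a'); loss = (q - y)^2."""
    y = r + gamma * np.max(q_values(s_next, theta_target))
    q_sa = q_values(s_prev, theta)[a]
    td_err = q_sa - y
    grad = np.zeros_like(theta)
    grad[:, a] = 2.0 * td_err * s_prev            # d(loss)/d(theta)
    return td_err ** 2, grad

# One stochastic-gradient-descent step on a single sampled transition.
s0, s1 = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
loss, grad = dqn_loss_and_grad(s0, a=1, r=0.5, s_next=s1)
theta -= 0.01 * grad
```

Because the target network parameters are frozen between copy steps, one gradient step on the estimated network strictly reduces the squared TD error for the sampled transition.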
5. The deep reinforcement learning-based dual-time scale new energy grid voltage optimization method according to claim 1, wherein the establishing of the short-time scale grid voltage optimization model in step 3) comprises:
the short-time-scale grid voltage optimization model objective function F_s is:
wherein Q is the vector formed by the reactive power outputs of the continuous reactive power compensation devices; Q(t) represents that vector at time t; v_p represents the voltage magnitude at the pivot node; v_ref is the voltage reference value for the pivot node; K_s represents the number of short time intervals within one long time scale;
considering the grid power flow constraints and voltage constraints, the constraint conditions are:
wherein p_i represents the active power injected at node i; q_i represents the reactive power injected at node i; v_i represents the voltage magnitude of node i; v_i^max and v_i^min respectively represent the upper and lower limits of the voltage magnitude of node i; G_ij represents the conductance between node i and node j; B_ij represents the susceptance between node i and node j; ω_ij represents the voltage phase-angle difference between node i and node j;
in order to deal with emergencies in the power system, a certain reserve must be kept during the adjustment of the continuous reactive power compensation devices; the constraint conditions on the continuous reactive power compensation devices during adjustment are as follows:
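The power flow and voltage constraints of claim 5 can be checked numerically. The sketch below uses the standard polar-form AC power-flow equations that the claim's symbols (G_ij, B_ij, ω_ij) refer to; the voltage limits in the feasibility check are illustrative assumptions.

```python
import numpy as np

def power_injections(v, omega, G, B):
    """Compute nodal injections p_i, q_i from voltage magnitudes v,
    phase angles omega (omega_ij = omega_i - omega_j), conductance
    matrix G and susceptance matrix B (standard polar-form equations)."""
    n = len(v)
    p, q = np.zeros(n), np.zeros(n)
    for i in range(n):
        for j in range(n):
            w = omega[i] - omega[j]
            p[i] += v[i] * v[j] * (G[i, j] * np.cos(w) + B[i, j] * np.sin(w))
            q[i] += v[i] * v[j] * (G[i, j] * np.sin(w) - B[i, j] * np.cos(w))
    return p, q

def voltage_feasible(v, v_min=0.95, v_max=1.05):
    """Inequality constraint v_i^min <= v_i <= v_i^max
    (the per-unit limits here are assumptions)."""
    v = np.asarray(v, dtype=float)
    return bool(np.all((v >= v_min) & (v <= v_max)))
```

At a flat start (all voltages 1.0 p.u., all angles zero) a lossless two-node line carries no power, which gives a quick sanity check of the equations.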
6. The deep reinforcement learning-based dual-time scale new energy grid voltage optimization method according to claim 1, wherein the reward function r_s(t) designed in step 3) for the short-time-scale grid voltage optimization model is as follows:
wherein μ(F_s) represents the membership function corresponding to the objective function F_s; σ_s is the penalty factor for short-time-scale voltage limit violations; v_i(t) represents the voltage magnitude at node i at time t.
7. The deep reinforcement learning-based dual-time-scale new energy grid voltage optimization method according to claim 1, wherein the step 3) of solving the short-time-scale grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan comprises the following steps:
(1) calculating the Critic network loss function L(θ_q):
wherein E_t[·] represents the expectation of the corresponding target value over all values of t; s(t) is the state of the agent at time t, composed of the information matrix set {v, p, q, T, C, Q}, where v, p and q are respectively the voltage magnitude vector of all nodes and the active and reactive power vectors injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of daily switching counts of all parallel capacitor banks, and the reactive power output vector of all continuous reactive power compensation devices; a_s(t) represents the action taken by the agent at time t, obtained by executing the policy based on the state s(t-1) at time t-1; θ_q is the parameter of the estimated-value network in the Critic network; q is output by the estimated-value network in the Critic network. Because the DDPG algorithm is a gradient method based on a deterministic policy, the probability distribution over actions in different states cannot be determined, so the expectation is replaced by the average of M samples drawn randomly, with replacement, from the replay memory; s_m(t) and a_m(t) are respectively the agent state and the action taken in the m-th sample; y_m(t) serves as the label of the m-th sample; the label y(t) is expressed as:
y(t) = r_s(t) + γ·q_target{s(t+1), ψ_target[s(t+1) ∣ θ_ψ'] ∣ θ_q'}
wherein r_s(t) is the reward function of the short-time-scale grid voltage optimization model; s(t) and s(t+1) are respectively the states of the agent at time t and time t+1; ψ_target is output by the target network in the Actor network; θ_ψ' is the parameter of the target network in the Actor network; q_target is output by the target network in the Critic network; θ_q' is the parameter of the target network in the Critic network; γ is the discount factor;
(2) the Actor network is evaluated by the Critic network and updated by a gradient method, with the gradient computed as follows:
wherein E_t[·] represents the expectation of the corresponding target value over all values of t; q is output by the estimated-value network in the Critic network; ψ is output by the estimated-value network in the Actor network; s and a respectively represent the agent state and the action taken by the agent; θ_q is the parameter of the estimated-value network in the Critic network; θ_ψ is the parameter of the estimated-value network in the Actor network; ∇_a denotes the gradient of the corresponding target value with respect to the action a; ∇_{θ_ψ} denotes the gradient of the corresponding target value with respect to the parameter θ_ψ; M is the number of samples; s_m(t) and a_m(t) are respectively the agent state and the action taken in the m-th sample;
(3) respectively updating parameters of the Actor network and the Critic network, wherein the updating method comprises the following steps:
wherein θ_ψ(t+1) and θ_ψ(t) respectively represent the parameters of the estimated-value network in the Actor network at time t+1 and time t; θ_q(t+1) and θ_q(t) respectively represent the parameters of the estimated-value network in the Critic network at time t+1 and time t; α_1 and α_2 respectively represent the learning rates of the estimated-value networks in the Actor network and the Critic network during the update; ∇ denotes the gradient operator.
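The target-label and Critic-loss computation of claim 7 can be sketched with linear stand-ins for the four DDPG networks. Everything below (linear models, dimensions, seed) is an illustrative assumption; the patent uses deep neural networks for both the Actor and the Critic.

```python
import numpy as np

rng = np.random.default_rng(1)
S_DIM, A_DIM = 3, 1
gamma = 0.95  # discount factor

# Linear stand-ins for the four DDPG networks.
theta_q        = rng.normal(size=(S_DIM + A_DIM,))  # Critic estimated network
theta_q_targ   = theta_q.copy()                     # Critic target network
theta_psi      = rng.normal(size=(S_DIM, A_DIM))    # Actor estimated network
theta_psi_targ = theta_psi.copy()                   # Actor target network

def actor(s, w):
    return s @ w                        # psi(s): state -> action

def critic(s, a, w):
    return np.concatenate([s, a]) @ w   # q(s, a): state-action value

def td_label(r, s_next):
    """Label y(t) = r_s(t) + gamma * q_target(s(t+1), psi_target(s(t+1)))."""
    a_next = actor(s_next, theta_psi_targ)
    return r + gamma * critic(s_next, a_next, theta_q_targ)

def critic_loss(batch):
    """Monte-Carlo average over M sampled transitions (s, a, r, s'),
    replacing the expectation as the claim describes."""
    errs = [(td_label(r, s1) - critic(s0, a, theta_q)) ** 2
            for s0, a, r, s1 in batch]
    return float(np.mean(errs))
```

The Actor is then updated along ∇_a q evaluated at a = ψ(s), chained with ∇_{θ_ψ} ψ, averaged over the same M samples, with learning rates α_1 and α_2 for the two estimated networks.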
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111217697.4A CN113807029B (en) | 2021-10-19 | 2021-10-19 | Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113807029A CN113807029A (en) | 2021-12-17 |
CN113807029B true CN113807029B (en) | 2022-07-29 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114243718B (en) * | 2021-12-23 | 2023-08-01 | 华北电力大学(保定) | Reactive voltage coordination control method for power grid based on DDPG algorithm |
CN114336667B (en) * | 2022-01-22 | 2023-06-27 | 华北电力大学(保定) | Reactive voltage intelligent optimization method for high-proportion wind-solar new energy power grid |
CN116054185B (en) * | 2023-03-30 | 2023-06-02 | 武汉新能源接入装备与技术研究院有限公司 | Control method of reactive power compensator |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408962A (en) * | 2021-07-28 | 2021-09-17 | 贵州大学 | Power grid multi-time scale and multi-target energy optimal scheduling method |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104993522B (en) * | 2015-06-30 | 2018-01-19 | 中国电力科学研究院 | A kind of active distribution network Multiple Time Scales coordination optimization dispatching method based on MPC |
CN106487042B (en) * | 2016-11-22 | 2019-01-15 | 合肥工业大学 | A kind of Multiple Time Scales micro-capacitance sensor voltage power-less optimized controlling method |
CN106953359B (en) * | 2017-04-21 | 2019-08-27 | 中国农业大学 | A kind of active reactive coordinating and optimizing control method of power distribution network containing distributed photovoltaic |
CN108964042B (en) * | 2018-07-24 | 2021-10-15 | 合肥工业大学 | Regional power grid operating point scheduling optimization method based on deep Q network |
US20200327411A1 (en) * | 2019-04-14 | 2020-10-15 | Di Shi | Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning |
CN110535146B (en) * | 2019-08-27 | 2022-09-23 | 哈尔滨工业大学 | Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning |
CN112600218B (en) * | 2020-11-30 | 2022-07-29 | 华北电力大学(保定) | Multi-time-scale optimization control method for reactive voltage of power grid comprising photovoltaic energy storage system |
CN112711902A (en) * | 2020-12-15 | 2021-04-27 | 国网江苏省电力有限公司淮安供电分公司 | Power grid voltage calculation method based on Monte Carlo sampling and deep learning |
CN113363997B (en) * | 2021-05-28 | 2022-06-14 | 浙江大学 | Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning |
CN113363998B (en) * | 2021-06-21 | 2022-06-28 | 东南大学 | Power distribution network voltage control method based on multi-agent deep reinforcement learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||