CN113807029B - Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method - Google Patents

Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method

Info

Publication number
CN113807029B
Authority
CN
China
Prior art keywords
time
network
node
scale
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111217697.4A
Other languages
Chinese (zh)
Other versions
CN113807029A (en)
Inventor
李鹏
姜磊
王加浩
夏辉
高一航
李建宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University filed Critical North China Electric Power University
Priority to CN202111217697.4A priority Critical patent/CN113807029B/en
Publication of CN113807029A publication Critical patent/CN113807029A/en
Application granted granted Critical
Publication of CN113807029B publication Critical patent/CN113807029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H - ELECTRICITY
    • H02 - GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J - CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 3/00 - Circuit arrangements for ac mains or ac distribution networks
    • H02J 3/18 - Arrangements for adjusting, eliminating or compensating reactive power in networks
    • H - ELECTRICITY
    • H02 - GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J - CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 3/00 - Circuit arrangements for ac mains or ac distribution networks
    • H02J 3/24 - Arrangements for preventing or reducing oscillations of power in networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 - Details relating to CAD techniques
    • G06F 2111/04 - Constraint-based CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2113/00 - Details relating to the application field
    • G06F 2113/04 - Power grid distribution networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E - REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E 40/00 - Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E 40/30 - Reactive power compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A double-time-scale power grid voltage optimization method based on deep reinforcement learning comprises the following steps: dividing the day into the long-time-scale intervals and short-time-scale intervals of the double-time-scale method; performing long-time-scale power grid voltage optimization based on the DQN algorithm to obtain a long-time-scale parallel capacitor bank switching plan; and performing short-time-scale reactive voltage optimization based on the DDPG algorithm to obtain a short-time-scale output plan for the continuous reactive power compensation devices. The invention makes the advantages of the various reactive compensation devices complement one another, provides stronger reactive voltage optimization capability, can schedule the capacitor switching plan as a whole across all optimization time points within a day, and effectively achieves fast optimization.

Description

Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method
Technical Field
The invention relates to double-time-scale power grid voltage optimization methods, and in particular to a deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method.
Background
With the construction of a new type of power system in which new energy is the main body, the penetration of various renewable energy sources keeps rising, and the randomness and dynamics of load demand response keep growing, which poses great challenges to the operation and control of a modern power grid.
Reactive voltage optimization of the power grid can, to a certain extent, effectively and economically mitigate the large-range voltage fluctuations of the power system caused by disturbances. The reactive voltage optimization problem of the power grid can be regarded as a complex nonlinear optimization problem with numerous objectives, variables and constraints.
At present, the methods for handling dynamic reactive voltage optimization mainly include traditional operations-research optimization methods, heuristic search methods and the like. However, these methods often suffer from slow convergence, heavy computation and a tendency to fall into local optima. Furthermore, most existing methods are model-based and depend heavily on model accuracy, which is impractical for power systems with large-scale new energy integration. To reduce the influence of model accuracy on control performance, artificial intelligence algorithms have been applied to the field of reactive voltage optimization, so that the power system can respond in a timely and accurate manner under various conditions, which undoubtedly provides a new idea for power system operation and control.
Disclosure of Invention
The invention aims to solve the technical problem of providing a deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method capable of simultaneously considering discrete and continuous reactive compensation devices.
The technical scheme adopted by the invention is as follows: the double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning comprises the following steps:
1) Dividing the long-time-scale intervals and the short-time-scale intervals of the double-time-scale method, respectively:
Divide the day into K_l long time intervals, the initial times of the long time intervals within the day being τ = 0, …, K_l - 1, and subdivide each long time interval into K_s short time intervals, the initial times of the short time intervals within a long time interval being t = 0, …, K_s - 1;
2) Carrying out long-time scale power grid voltage optimization based on a DQN algorithm, comprising the following steps: establishing a long-time scale power grid voltage optimization model, integrating multiple targets by using a membership function, designing a reward function aiming at the long-time scale power grid voltage optimization model, and solving the long-time scale power grid voltage optimization model by using a DQN algorithm to obtain a long-time scale parallel capacitor bank switching plan;
3) short-time scale reactive voltage optimization is carried out based on a DDPG algorithm, and the method comprises the following steps: establishing a short-time-scale power grid voltage optimization model, designing a reward function aiming at the short-time-scale power grid voltage optimization model, and solving the short-time-scale power grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan.
The double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning has the following advantages:
1. Through the cooperation of the two agents on the long and short time scales, the invention makes the advantages of the various reactive compensation devices complement one another and provides stronger reactive voltage optimization capability.
2. On the long time scale, the design aims at suppressing the large-range voltage fluctuations caused by conventional load demand changes and at minimizing the network loss of the whole system; with the DQN algorithm as the optimization kernel, the capacitor switching plan can be scheduled as a whole across all optimization time points within a day.
3. On the short time scale, the design aims at the fast and frequent grid voltage fluctuations caused by large-scale integration of new energy; with the DDPG algorithm as the optimization kernel, fast optimization is effectively achieved.
4. The proposed power grid voltage optimization method can effectively handle frequent grid voltage fluctuations under high-proportion new energy access and is of practical significance for engineering application.
Drawings
FIG. 1 is a flow chart of a double-time scale new energy power grid voltage optimization method based on deep reinforcement learning according to the invention;
FIG. 2 is a schematic diagram of an improved IEEE39 node test system in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a typical long-time-scale capacitor bank switching plan within a day in an example of the present invention;
FIG. 4 is a graph of long time scale daily average network loss in an example of the invention;
FIG. 5 is a schematic diagram of reactive power output of a typical short-time-scale continuous reactive power compensation device in an example of the present invention;
FIG. 6a is a schematic diagram of the voltage optimization effect at node 6 within a typical day in an example of the invention;
FIG. 6b is a schematic diagram of the voltage optimization effect at node 23 within a typical day in an example of the invention;
FIG. 6c is a schematic diagram of the voltage optimization effect at node 26 within a typical day in an example of the invention.
Detailed Description
The invention provides a deep reinforcement learning-based dual-time scale new energy grid voltage optimization method, which is described in detail below with reference to embodiments and drawings.
As shown in fig. 1, the method for optimizing the voltage of the dual-time-scale new energy power grid based on deep reinforcement learning of the present invention includes the following steps:
1) Dividing the long-time-scale intervals and the short-time-scale intervals of the double-time-scale method, respectively:
Divide the day into K_l long time intervals, the initial times of the long time intervals within the day being τ = 0, …, K_l - 1, and subdivide each long time interval into K_s short time intervals, the initial times of the short time intervals within a long time interval being t = 0, …, K_s - 1;
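For illustration only (this sketch is not part of the claimed method), the time-scale division can be expressed in a few lines of Python; the values K_l = 24 (1 h long intervals) and K_s = 12 (5 min short intervals) are taken from the embodiment described later.

    # Double-time-scale division of one day (illustrative values from the embodiment:
    # long-time-scale decision period 1 h, short-time-scale decision period 5 min).
    K_L = 24   # number of long time intervals per day  -> tau = 0, ..., K_L - 1
    K_S = 12   # number of short intervals per long interval -> t = 0, ..., K_S - 1

    def interval_start(tau: int, t: int) -> float:
        """Return the start of short interval t inside long interval tau, in hours."""
        long_len = 24.0 / K_L            # 1 h
        short_len = long_len / K_S       # 5 min
        return tau * long_len + t * short_len

    # Example: the short interval (tau=12, t=6) starts at 12:30.
    print(interval_start(12, 6))  # 12.5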
2) Carrying out long-time scale power grid voltage optimization based on a DQN algorithm; the method comprises the following steps:
(2.1) establishing a long-time scale power grid voltage optimization model:
The objective function F_l(T) of the long-time-scale power grid voltage optimization model is:
min F_l(T) = [f_1(T), f_2(T)]
[Equation images in the original: f_1(T) accumulates the deviation of the hub-node voltage amplitudes v_p from the reference value v_ref over the K_l long time intervals; f_2(T) accumulates the network loss expressed through the branch active powers p_ij.]
where T is the switching-state vector of all parallel capacitor banks, and f_1(T) and f_2(T) are respectively the first and second sub-objectives of the objective function F_l(T); T(τ) denotes the switching-state vector of all parallel capacitor banks at time τ, the switching state of each parallel capacitor bank being expressed by its switching gear; N is the number of nodes in the power grid; v_p denotes the voltage amplitude of a hub node; v_ref is the voltage reference value of the hub node; p_ij denotes the active power flowing from node i to node j; K_l denotes the number of long time intervals in a day;
Considering the grid operation power flow constraints and the voltage constraint:
p_i = v_i Σ_{j=1..N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1..N} v_j (G_ij sin ω_ij - B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max
where p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min denote respectively the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase-angle difference between node i and node j;
The switching of the m-th parallel capacitor bank is limited by the upper and lower limits T_m^max and T_m^min of its switching gear T_m:
T_m^min ≤ T_m ≤ T_m^max
the upper limit of the switching times of the parallel capacitor bank in one day is set as
Figure GDA0003642153320000039
Figure GDA00036421533200000310
Wherein, C m And the switching times of the mth parallel capacitor bank in one day are shown.
(2.2) Integrating the multiple objectives by means of a membership function:
[Equation image in the original: the membership function μ(f_β).]
where f_β* denotes the attainable optimal value of the β-th sub-objective in its own dimension, β = 1, 2; δ_β is the tolerance of the sub-objective value f_β and is used to define the boundary that the objective function can reach. For any sub-objective, when the corresponding objective value lies within the tolerance range, the membership function μ(f_β) maps the value of f_β into [0, 1]; when the value of f_β lies outside the tolerance, the value of the membership function is set to 0; and when a new optimal value of the sub-objective is found, the value of the membership function is set to 1;
The new objective function after the membership-function mapping is:
min[-μ(F_l)], with μ(F_l) = k_1·μ(f_1) + k_2·μ(f_2)
where μ(F_l) is the membership function corresponding to the objective function F_l; f_1 and f_2 are respectively the first and second sub-objectives of the objective function; k_1 and k_2 are the weight coefficients of the two sub-objectives.
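As an illustration only, a minimal Python sketch of such a membership mapping and of the weighted combination is given below. The piecewise-linear form inside the tolerance band is an assumption made for the sketch; the patent only specifies the boundary behaviour (0 outside the tolerance, 1 at a new optimum, values in [0, 1] inside the tolerance).

    # Illustrative sketch (assumption: linear interpolation inside the tolerance band).
    def membership(f_value: float, f_best: float, delta: float) -> float:
        """Map a sub-objective value to [0, 1].

        f_value: current value of the sub-objective f_beta (to be minimized)
        f_best:  best (smallest) value found so far for this sub-objective
        delta:   tolerance delta_beta of the sub-objective
        """
        if f_value <= f_best:              # a new optimal value has been found
            return 1.0
        if f_value > f_best + delta:       # outside the tolerance band
            return 0.0
        # inside the tolerance band: assumed linear decrease from 1 to 0
        return (f_best + delta - f_value) / delta

    def combined_objective(f1, f2, f1_best, f2_best, delta1, delta2, k1=0.5, k2=0.5):
        """Weighted membership value mu(F_l) = k1*mu(f1) + k2*mu(f2)."""
        return k1 * membership(f1, f1_best, delta1) + k2 * membership(f2, f2_best, delta2)

    # Example: voltage-deviation and network-loss sub-objectives with equal weights.
    print(combined_objective(f1=0.8, f2=12.0, f1_best=0.5, f2_best=10.0,
                             delta1=1.0, delta2=5.0))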
(2.3) Designing the reward function r_l(τ) for the long-time-scale power grid voltage optimization model:
[Equation image in the original: r_l(τ) combines the membership value μ(F_l) with penalty terms, weighted by σ_l, for node-voltage limit violations and for exceeding the daily capacitor switching limit.]
where μ(F_l) denotes the membership function corresponding to the objective function F_l; σ_l is the penalty factor for long-time-scale voltage limit violations and capacitor switching-count violations; v_i(τ) denotes the voltage amplitude of node i at time τ; C_m(τ) denotes the number of times the m-th parallel capacitor bank has been switched within the day at time τ.
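For illustration only, the sketch below shows one way such a penalty-augmented reward could be computed; the functional form, the voltage limits and the penalty weighting are assumptions, since the patent gives the formula only as an image (the daily switching limit of 6 is taken from the embodiment).

    # Illustrative sketch of a penalty-augmented long-time-scale reward (assumed form).
    def long_timescale_reward(mu_F_l, voltages, switch_counts,
                              v_min=0.95, v_max=1.05, c_max=6, sigma_l=1.0):
        """Membership value minus penalties for voltage and switching-count violations.

        mu_F_l:        weighted membership value of the long-time-scale objective
        voltages:      iterable of node voltage amplitudes v_i(tau), p.u.
        switch_counts: iterable of per-bank switching counts C_m(tau) so far today
        """
        voltage_violation = sum(max(0.0, v - v_max) + max(0.0, v_min - v)
                                for v in voltages)
        switching_violation = sum(max(0, c - c_max) for c in switch_counts)
        return mu_F_l - sigma_l * (voltage_violation + switching_violation)

    # Example: one node slightly above the upper limit, no switching violation.
    print(long_timescale_reward(0.8, [1.06, 1.00, 0.98], [3, 5]))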
(2.4) solving the long-time-scale power grid voltage optimization model by using a DQN algorithm to obtain a long-time-scale parallel capacitor bank switching plan; the method comprises the following steps:
(2.4.1) Calculating the DQN network loss function L(θ):
L(θ) = E{ [ r_l(τ) + γ·max_{a_l ∈ A} q_π(s(τ), a_l | θ_target) - q_π(s(τ-1), a_l(τ) | θ) ]² }
where r_l(τ) denotes the reward function of the long-time-scale power grid voltage optimization model; s(τ) and s(τ-1) denote respectively the states of the agent at times τ and τ-1 and consist of the information matrix set {v, p, q, T, C, Q}, in which v, p and q are respectively the voltage amplitude vector of all nodes and the vectors of active and reactive power injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of switching counts of all parallel capacitor banks within the day, and the reactive power output vector of all continuous reactive power compensation devices; A is the action space of the agent; a_l denotes an action selectable by the agent; a_l(τ) = T(τ) is the action obtained by the agent executing its policy based on the state s(τ-1) at time τ-1; q_π is the output of the estimated-value network; the target-network parameters are copied from the estimated-value network at a fixed step interval, so that the target network lags the estimated-value network to a certain extent; θ and θ_target are respectively the estimated-value network parameters and the target network parameters; γ denotes the discount (attenuation) factor;
(2.4.2) Updating the estimated-value network parameters by stochastic gradient descent:
θ_{τ+1} = θ_τ - α·∇_θ L(θ_τ)
where θ_{τ+1} and θ_τ denote respectively the estimated-value network parameters at times τ+1 and τ; α denotes the learning rate of the estimated-value network during the update; ∇ denotes the gradient operator.
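For illustration only, a minimal PyTorch sketch of one DQN update step of the kind described above is given below; the network architecture, batch handling and hyperparameters are assumptions and are not taken from the patent. In the patent's convention the transition is s(τ-1) → a_l(τ) → s(τ), which corresponds to (s_prev, a, s_next) here.

    # Illustrative DQN update step (assumed architecture and hyperparameters).
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        def __init__(self, state_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_actions))
        def forward(self, s):
            return self.net(s)

    state_dim, n_actions, gamma = 32, 7, 0.95   # e.g. 7 gears of a capacitor bank
    q_net = QNet(state_dim, n_actions)          # estimated-value (online) network
    q_target = QNet(state_dim, n_actions)       # target network, lags the online one
    q_target.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

    def dqn_update(s_prev, a, r, s_next):
        """One stochastic-gradient step on L(theta) for a batch of transitions."""
        with torch.no_grad():
            y = r + gamma * q_target(s_next).max(dim=1).values      # target value
        q_sa = q_net(s_prev).gather(1, a.unsqueeze(1)).squeeze(1)   # q(s(tau-1), a(tau))
        loss = nn.functional.mse_loss(q_sa, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example with a random batch of 16 transitions.
    s_prev = torch.randn(16, state_dim); s_next = torch.randn(16, state_dim)
    a = torch.randint(0, n_actions, (16,)); r = torch.randn(16)
    print(dqn_update(s_prev, a, r, s_next))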
3) Optimizing the short-time-scale power grid voltage based on a DDPG algorithm; the method comprises the following steps:
(3.1) establishing a short-time-scale power grid voltage optimization model:
The objective function F_s of the short-time-scale power grid voltage optimization model is:
[Equation image in the original: F_s accumulates the deviation of the hub-node voltage amplitudes v_p(t) from the reference value v_ref over the K_s short time intervals of the current long time interval.]
where Q is the vector formed by the reactive power outputs of the continuous reactive power compensation devices; Q(t) denotes this vector at time t; v_p denotes the voltage amplitude of a hub node; v_ref is the voltage reference value of the hub node; K_s denotes the number of short time intervals within a long time interval;
Considering the grid operation power flow constraints and the voltage constraint, the constraint conditions are:
p_i = v_i Σ_{j=1..N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1..N} v_j (G_ij sin ω_ij - B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max
where p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min denote respectively the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase-angle difference between node i and node j;
In order to cope with emergencies in the power system, a certain reserve must be kept during the adjustment of the continuous reactive power compensation devices; the constraint condition on the continuous reactive power compensation devices during adjustment is:
q_con,n^min ≤ q_con,n ≤ q_con,n^max
where q_con,n is the reactive power output value of the n-th continuous reactive power compensation device, and q_con,n^max and q_con,n^min are respectively its upper and lower reactive power output limits;
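As a small illustration (not part of the claimed method), the reserve constraint can be enforced by clipping the agent's commanded output to the narrowed band; the ±120 Mvar device range and the ±80 Mvar reserve-adjusted range are taken from the embodiment below, while the clipping itself is an assumed implementation detail.

    # Illustrative enforcement of the reactive-power reserve constraint (assumed detail).
    Q_DEVICE_MAX = 120.0   # Mvar, physical range of the continuous device (embodiment)
    Q_RESERVE_MAX = 80.0   # Mvar, range left after reserving capacity for emergencies

    def apply_reserve_constraint(q_command: float) -> float:
        """Clip the commanded reactive output q_con,n to the reserve-adjusted band."""
        return max(-Q_RESERVE_MAX, min(Q_RESERVE_MAX, q_command))

    print(apply_reserve_constraint(95.0))   # -> 80.0, 15 Mvar kept in reserve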
(3.2) Designing the reward function r_s(t) for the short-time-scale power grid voltage optimization model:
[Equation image in the original: r_s(t) combines the membership value μ(F_s) with a penalty term, weighted by σ_s, for node-voltage limit violations.]
where μ(F_s) denotes the membership function corresponding to the objective function F_s; σ_s is the penalty factor for short-time-scale voltage limit violations; v_i(t) denotes the voltage amplitude of node i at time t.
(3.3) Solving the short-time-scale power grid voltage optimization model by using the DDPG algorithm to obtain the short-time-scale output plan of the continuous reactive power compensation devices; the method comprises the following steps:
(3.3.1) Calculating the Critic network loss function L(θ_q):
L(θ_q) = E_t{ [ y(t) - q(s(t), a_s(t) | θ_q) ]² } ≈ (1/M) Σ_{m=1..M} [ y_m(t) - q(s_m(t), a_s^m(t) | θ_q) ]²
where E_t{·} denotes the expectation of the corresponding target value over all values of t; s(t) is the state of the agent at time t and consists of the information matrix set {v, p, q, T, C, Q}, in which v, p and q are respectively the voltage amplitude vector of all nodes and the vectors of active and reactive power injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of switching counts of all parallel capacitor banks within the day, and the reactive power output vector of all continuous reactive power compensation devices; a_s(t) = Q(t) denotes the action taken by the agent at time t, obtained by executing the policy based on the state s(t-1) at time t-1; θ_q are the parameters of the estimated-value network of the Critic network; q is the output of the estimated-value network of the Critic network. Because the DDPG algorithm is a gradient method based on a deterministic policy, the probability distribution of the actions selected in different states cannot be determined, so the expectation is replaced by the average over M samples; the M samples are drawn at random without repetition from the replay memory; s_m(t) and a_s^m(t) are respectively the agent state and the agent action of the m-th sample; y_m(t) is the label (target value) of the m-th sample. The label y(t) is given by:
y(t) = r_s(t) + γ·q_target{ s(t+1), ψ_target[ s(t) | θ_ψ' ] | θ_q' }
where r_s(t) is the reward function of the short-time-scale power grid voltage optimization model; s(t) and s(t+1) are the states of the agent at times t and t+1, respectively; ψ_target is the output of the target network of the Actor network; θ_ψ' are the parameters of the target network of the Actor network; q_target is the output of the target network of the Critic network; θ_q' are the parameters of the target network of the Critic network; γ is the discount (attenuation) factor;
(3.3.2) The Actor network is evaluated by the Critic network and is updated by a gradient-update method; the gradient is computed as:
∇_{θ_ψ} J ≈ (1/M) Σ_{m=1..M} ∇_a q(s, a | θ_q)|_{s=s_m(t), a=ψ(s_m(t))} · ∇_{θ_ψ} ψ(s | θ_ψ)|_{s=s_m(t)}
where E{·} denotes the expectation of the corresponding target value over all values of t; q is the output of the estimated-value network of the Critic network; ψ is the output of the estimated-value network of the Actor network; s and a denote respectively the agent state and the action taken by the agent; θ_q are the parameters of the estimated-value network of the Critic network; θ_ψ are the parameters of the estimated-value network of the Actor network; ∇_a(·) denotes the gradient of the corresponding target value with respect to the action a; ∇_{θ_ψ}(·) denotes the gradient of the corresponding target value with respect to the parameters θ_ψ; M is the number of samples; s_m(t) and a_s^m(t) are respectively the agent state and the agent action of the m-th sample;
(3.3.3) Updating the parameters of the Actor network and of the Critic network, respectively:
θ_ψ^{t+1} = θ_ψ^t + α_1·∇_{θ_ψ} J
θ_q^{t+1} = θ_q^t - α_2·∇_{θ_q} L(θ_q)
where θ_ψ^{t+1} and θ_ψ^t denote respectively the parameters of the estimated-value network of the Actor network at times t+1 and t; θ_q^{t+1} and θ_q^t denote respectively the parameters of the estimated-value network of the Critic network at times t+1 and t; α_1 and α_2 denote respectively the learning rates of the estimated-value network of the Actor network and of the estimated-value network of the Critic network during the update; ∇ denotes the gradient operator.
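For illustration only, a compact PyTorch sketch of a DDPG critic/actor update of the kind described in steps (3.3.1) to (3.3.3) is given below; the network sizes, optimizers and hyperparameters are assumptions and are not taken from the patent. Note that the sketch follows the common DDPG convention of evaluating the target policy at the next state, whereas the patent's time-index convention pairs s(t+1) with ψ_target[s(t)].

    # Illustrative DDPG update (assumed architecture and hyperparameters).
    import torch
    import torch.nn as nn

    state_dim, action_dim, gamma = 32, 3, 0.95   # e.g. 3 continuous SVG outputs

    def mlp(in_dim, out_dim):
        return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    actor, actor_target = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
    critic, critic_target = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)
    actor_target.load_state_dict(actor.state_dict())
    critic_target.load_state_dict(critic.state_dict())
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def ddpg_update(s, a, r, s_next):
        """One critic step (MSE to the label y) and one actor step (policy gradient)."""
        # Critic: y = r + gamma * q_target(s', psi_target(s'))
        with torch.no_grad():
            a_next = actor_target(s_next)
            y = r.unsqueeze(1) + gamma * critic_target(torch.cat([s_next, a_next], dim=1))
        q = critic(torch.cat([s, a], dim=1))
        critic_loss = nn.functional.mse_loss(q, y)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # Actor: ascend q(s, psi(s)) by minimizing its negative mean.
        actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        return critic_loss.item(), actor_loss.item()

    # Example with a random minibatch of M = 16 transitions.
    s = torch.randn(16, state_dim); a = torch.randn(16, action_dim)
    r = torch.randn(16); s_next = torch.randn(16, state_dim)
    print(ddpg_update(s, a, r, s_next))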
Examples are given below:
Following the flow chart of the deep reinforcement learning-based double-time-scale power grid voltage optimization method shown in fig. 1, voltage optimization is carried out for the modified IEEE 39-node test system shown in fig. 2 under uncertain new energy output and uncertain load. Node 6, node 23 and node 26 are set as the hub nodes of their regions. Node 33 and node 37 are wind farms, each with a rated capacity of 500 MW. Parallel capacitor bank 1 and parallel capacitor bank 2 are installed at node 4 and node 8 of the original system, respectively; their parameters are identical, with a maximum gear of 6, 50 Mvar per gear, and a maximum of 6 adjustments per day. Node 6, node 23 and node 26 are each connected to a continuous reactive power compensation device with an adjustable range of -120 Mvar to +120 Mvar. Considering the influence of emergencies on the grid voltage, the SVG reserves a reactive power margin for emergency reactive support, so in the reactive voltage method of this embodiment the adjustable range is narrowed to -80 Mvar to +80 Mvar. To match the action times of the reactive power compensation devices, the long-time-scale decision period selected in this embodiment is 1 h and the short-time-scale decision period is 5 min. 420 days of power system operation data are constructed from typical daily load curves and typical wind power output curves and used as training data to drive the two agents. The comprehensive operating performance is analysed by comparing the optimization effect and the running time obtained with the deep reinforcement learning algorithm and with a genetic algorithm.
The capacitor banks are limited by the number of switching operations during switching. As can be seen from fig. 3, with the proposed method, whether the switching plan for the next moment is executed can be decided according to the current number of capacitor switchings and the grid operating conditions at that moment. At about 4:00, when wind power output is ample and load demand is low, the parallel capacitor banks are placed at a lower gear and provide less reactive support to the grid; at about 12:00, when wind power output is low and load demand is high, the parallel capacitor banks are adjusted to a higher gear to suppress large-range voltage fluctuations. With the number of capacitor switchings limited, if the long time scale were optimized only according to the current grid operating conditions, the optimization effect could not be fully exploited once the daily limit is reached.
As can be seen from fig. 4, over days 200 to 400 the average loss reduction rate of the proposed method is 5.24%, compared with 4.66% under genetic-algorithm optimization, which fully demonstrates the superiority of the proposed method.
The reactive power output of each continuous device over a typical short-time-scale day can be seen from fig. 5. The continuous reactive power compensation devices are mainly used to suppress the fast and frequent voltage fluctuations caused by the uncertainty of new energy, so their output changes frequently within the day.
As can be seen from fig. 6a, fig. 6b and fig. 6c, the proposed method achieves a good voltage optimization effect compared with the genetic algorithm over a typical day. The cumulative optimization time of the proposed method over a typical day is only 137.58 s, whereas the genetic algorithm needs 685.44 s, which demonstrates the speed of the method in solving the decision problem.
In conclusion, through the cooperation of the two agents on the long and short time scales, the advantages of the various reactive compensation devices complement each other, and the proposed method offers stronger reactive voltage optimization capability and good feasibility.

Claims (7)

1. A double-time-scale new energy power grid voltage optimization method based on deep reinforcement learning is characterized by comprising the following steps:
1) Dividing the long-time-scale intervals and the short-time-scale intervals of the double-time-scale method, respectively:
Divide the day into K_l long time intervals, the initial times of the long time intervals within the day being τ = 0, …, K_l - 1, and subdivide each long time interval into K_s short time intervals, the initial times of the short time intervals within a long time interval being t = 0, …, K_s - 1;
2) Carrying out long-time-scale power grid voltage optimization based on the DQN algorithm, comprising the following steps: establishing a long-time-scale power grid voltage optimization model, integrating the multiple objectives by using a membership function, designing a reward function for the long-time-scale power grid voltage optimization model, and solving the long-time-scale power grid voltage optimization model by using the DQN algorithm to obtain a long-time-scale parallel capacitor bank switching plan; the long-time-scale power grid voltage optimization model is as follows:
(1) The objective function F_l(T) of the long-time-scale power grid voltage optimization model is:
min F_l(T) = [f_1(T), f_2(T)]
[Equation images in the original: f_1(T) accumulates the deviation of the hub-node voltage amplitudes v_p from the reference value v_ref over the K_l long time intervals; f_2(T) accumulates the network loss expressed through the branch active powers p_ij.]
where T is the switching-state vector of all parallel capacitor banks, and f_1(T) and f_2(T) are respectively the first and second sub-objectives of the objective function F_l(T); T(τ) denotes the switching-state vector of all parallel capacitor banks at time τ, the switching state of each parallel capacitor bank being expressed by its switching gear; N is the number of nodes in the power grid; v_p denotes the voltage amplitude of a hub node; v_ref is the voltage reference value of the hub node; p_ij denotes the active power flowing from node i to node j; K_l denotes the number of long time intervals in a day;
(2) Considering the grid operation power flow constraints and the voltage constraint:
p_i = v_i Σ_{j=1..N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1..N} v_j (G_ij sin ω_ij - B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max
where p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min denote respectively the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase-angle difference between node i and node j;
(3) The switching of the m-th parallel capacitor bank is limited by the upper and lower limits T_m^max and T_m^min of its switching gear T_m:
T_m^min ≤ T_m ≤ T_m^max
(4) The number of switching operations of a parallel capacitor bank within one day is limited by the upper bound C_m^max:
C_m ≤ C_m^max
where C_m denotes the number of switching operations of the m-th parallel capacitor bank within one day;
3) short-time scale reactive voltage optimization is carried out based on a DDPG algorithm, and the method comprises the following steps: establishing a short-time-scale power grid voltage optimization model, designing a reward function aiming at the short-time-scale power grid voltage optimization model, and solving the short-time-scale power grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan.
2. The deep reinforcement learning-based double-time-scale new energy grid voltage optimization method according to claim 1, wherein the step 2) of integrating multiple targets by using membership functions adopts the following formula:
[Equation image in the original: the membership function μ(f_β).]
where f_β* denotes the attainable optimal value of the β-th sub-objective in its own dimension, β = 1, 2; δ_β is the tolerance of the sub-objective value f_β and is used to define the boundary that the objective function can reach; for any sub-objective, when the corresponding objective value lies within the tolerance range, the membership function μ(f_β) maps the value of f_β into [0, 1]; when the value of f_β lies outside the tolerance, the value of the membership function is set to 0; and when a new optimal value of the sub-objective is found, the value of the membership function is set to 1;
the new objective function after the membership-function mapping is:
min[-μ(F_l)], with μ(F_l) = k_1·μ(f_1) + k_2·μ(f_2)
where μ(F_l) is the membership function corresponding to the objective function F_l; f_1 and f_2 are respectively the first and second sub-objectives of the objective function; k_1 and k_2 are the weight coefficients of the two sub-objectives.
3. The deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method according to claim 1, wherein in step 2) the reward function r_l(τ) is designed for the long-time-scale power grid voltage optimization model as follows:
[Equation image in the original: r_l(τ) combines the membership value μ(F_l) with penalty terms, weighted by σ_l, for node-voltage limit violations and for exceeding the daily capacitor switching limit.]
where μ(F_l) denotes the membership function corresponding to the objective function F_l; σ_l is the penalty factor for long-time-scale voltage limit violations and capacitor switching-count violations; v_i(τ) denotes the voltage amplitude of node i at time τ; C_m(τ) denotes the number of times the m-th parallel capacitor bank has been switched within the day at time τ.
4. The deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method according to claim 1, wherein the step 2) of solving the long-time-scale power grid voltage optimization model by using the DQN algorithm to obtain the long-time-scale parallel capacitor bank switching plan comprises the following steps:
(1) Calculating the DQN network loss function L(θ):
L(θ) = E{ [ r_l(τ) + γ·max_{a_l ∈ A} q_π(s(τ), a_l | θ_target) - q_π(s(τ-1), a_l(τ) | θ) ]² }
where r_l(τ) denotes the reward function of the long-time-scale power grid voltage optimization model; s(τ) and s(τ-1) denote respectively the states of the agent at times τ and τ-1 and consist of the information matrix set {v, p, q, T, C, Q}, in which v, p and q are respectively the voltage amplitude vector of all nodes and the vectors of active and reactive power injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of switching counts of all parallel capacitor banks within the day, and the reactive power output vector of all continuous reactive power compensation devices; A is the action space of the agent; a_l denotes an action selectable by the agent; a_l(τ) = T(τ) is obtained by the agent executing its policy based on the state s(τ-1) at time τ-1; q_π is the output of the estimated-value network; the target-network parameters are copied from the estimated-value network at a fixed step interval, so that the target network lags the estimated-value network to a certain extent; θ and θ_target are respectively the estimated-value network parameters and the target network parameters; γ denotes the discount (attenuation) factor;
(2) Updating the estimated-value network parameters by stochastic gradient descent:
θ_{τ+1} = θ_τ - α·∇_θ L(θ_τ)
where θ_{τ+1} and θ_τ denote respectively the estimated-value network parameters at times τ+1 and τ; α denotes the learning rate of the estimated-value network during the update; ∇ denotes the gradient operator.
5. The deep reinforcement learning-based dual-time scale new energy grid voltage optimization method according to claim 1, wherein the establishing of the short-time scale grid voltage optimization model in step 3) comprises:
The objective function F_s of the short-time-scale power grid voltage optimization model is:
[Equation image in the original: F_s accumulates the deviation of the hub-node voltage amplitudes v_p(t) from the reference value v_ref over the K_s short time intervals of the current long time interval.]
where Q is the vector formed by the reactive power outputs of the continuous reactive power compensation devices; Q(t) denotes this vector at time t; v_p denotes the voltage amplitude of a hub node; v_ref is the voltage reference value of the hub node; K_s denotes the number of short time intervals within a long time interval;
considering the grid operation power flow constraints and the voltage constraint, the constraint conditions are:
p_i = v_i Σ_{j=1..N} v_j (G_ij cos ω_ij + B_ij sin ω_ij)
q_i = v_i Σ_{j=1..N} v_j (G_ij sin ω_ij - B_ij cos ω_ij)
v_i^min ≤ v_i ≤ v_i^max
where p_i denotes the active power injected into node i; q_i denotes the reactive power injected into node i; v_i denotes the voltage amplitude of node i; v_i^max and v_i^min denote respectively the upper and lower limits of the voltage amplitude of node i; G_ij denotes the conductance between node i and node j; B_ij denotes the susceptance between node i and node j; ω_ij denotes the voltage phase-angle difference between node i and node j;
in order to cope with emergencies in the power system, a certain reserve must be kept during the adjustment of the continuous reactive power compensation devices; the constraint condition on the continuous reactive power compensation devices during adjustment is:
q_con,n^min ≤ q_con,n ≤ q_con,n^max
where q_con,n is the reactive power output value of the n-th continuous reactive power compensation device, and q_con,n^max and q_con,n^min are respectively its upper and lower reactive power output limits.
6. The deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method according to claim 1, wherein in step 3) the reward function r_s(t) is designed for the short-time-scale power grid voltage optimization model as follows:
[Equation image in the original: r_s(t) combines the membership value μ(F_s) with a penalty term, weighted by σ_s, for node-voltage limit violations.]
where μ(F_s) denotes the membership function corresponding to the objective function F_s; σ_s is the penalty factor for short-time-scale voltage limit violations; v_i(t) denotes the voltage amplitude of node i at time t.
7. The deep reinforcement learning-based dual-time-scale new energy grid voltage optimization method according to claim 1, wherein the step 3) of solving the short-time-scale grid voltage optimization model by using a DDPG algorithm to obtain a short-time-scale continuous reactive power compensation device output plan comprises the following steps:
(1) Calculating the Critic network loss function L(θ_q):
L(θ_q) = E_t{ [ y(t) - q(s(t), a_s(t) | θ_q) ]² } ≈ (1/M) Σ_{m=1..M} [ y_m(t) - q(s_m(t), a_s^m(t) | θ_q) ]²
where E_t{·} denotes the expectation of the corresponding target value over all values of t; s(t) is the state of the agent at time t and consists of the information matrix set {v, p, q, T, C, Q}, in which v, p and q are respectively the voltage amplitude vector of all nodes and the vectors of active and reactive power injected at all nodes, and T, C and Q are respectively the switching-state vector of all parallel capacitor banks, the vector of switching counts of all parallel capacitor banks within the day, and the reactive power output vector of all continuous reactive power compensation devices; a_s(t) = Q(t) denotes the action taken by the agent at time t, obtained by executing the policy based on the state s(t-1) at time t-1; θ_q are the parameters of the estimated-value network of the Critic network; q is the output of the estimated-value network of the Critic network; because the DDPG algorithm is a gradient method based on a deterministic policy, the probability distribution of the actions selected in different states cannot be determined, so the expectation is replaced by the average over M samples, the M samples being drawn at random from the replay memory; s_m(t) and a_s^m(t) are respectively the agent state and the agent action of the m-th sample; y_m(t) is the label (target value) of the m-th sample; the label y(t) is given by:
y(t) = r_s(t) + γ·q_target{ s(t+1), ψ_target[ s(t) | θ_ψ' ] | θ_q' }
where r_s(t) is the reward function of the short-time-scale power grid voltage optimization model; s(t) and s(t+1) are the states of the agent at times t and t+1, respectively; ψ_target is the output of the target network of the Actor network; θ_ψ' are the parameters of the target network of the Actor network; q_target is the output of the target network of the Critic network; θ_q' are the parameters of the target network of the Critic network; γ is the discount (attenuation) factor;
(2) The Actor network is evaluated by the Critic network and is updated by a gradient-update method; the gradient is computed as:
∇_{θ_ψ} J ≈ (1/M) Σ_{m=1..M} ∇_a q(s, a | θ_q)|_{s=s_m(t), a=ψ(s_m(t))} · ∇_{θ_ψ} ψ(s | θ_ψ)|_{s=s_m(t)}
where E{·} denotes the expectation of the corresponding target value over all values of t; q is the output of the estimated-value network of the Critic network; ψ is the output of the estimated-value network of the Actor network; s and a denote respectively the agent state and the action taken by the agent; θ_q are the parameters of the estimated-value network of the Critic network; θ_ψ are the parameters of the estimated-value network of the Actor network; ∇_a(·) denotes the gradient of the corresponding target value with respect to the action a; ∇_{θ_ψ}(·) denotes the gradient of the corresponding target value with respect to the parameters θ_ψ; M is the number of samples; s_m(t) and a_s^m(t) are respectively the agent state and the agent action of the m-th sample;
(3) Updating the parameters of the Actor network and of the Critic network, respectively:
θ_ψ^{t+1} = θ_ψ^t + α_1·∇_{θ_ψ} J
θ_q^{t+1} = θ_q^t - α_2·∇_{θ_q} L(θ_q)
where θ_ψ^{t+1} and θ_ψ^t denote respectively the parameters of the estimated-value network of the Actor network at times t+1 and t; θ_q^{t+1} and θ_q^t denote respectively the parameters of the estimated-value network of the Critic network at times t+1 and t; α_1 and α_2 denote respectively the learning rates of the estimated-value network of the Actor network and of the estimated-value network of the Critic network during the update; ∇ denotes the gradient operator.
CN202111217697.4A 2021-10-19 2021-10-19 Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method Active CN113807029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111217697.4A CN113807029B (en) 2021-10-19 2021-10-19 Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111217697.4A CN113807029B (en) 2021-10-19 2021-10-19 Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method

Publications (2)

Publication Number Publication Date
CN113807029A CN113807029A (en) 2021-12-17
CN113807029B true CN113807029B (en) 2022-07-29

Family

ID=78898027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111217697.4A Active CN113807029B (en) 2021-10-19 2021-10-19 Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method

Country Status (1)

Country Link
CN (1) CN113807029B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114243718B (en) * 2021-12-23 2023-08-01 华北电力大学(保定) Reactive voltage coordination control method for power grid based on DDPG algorithm
CN114336667B (en) * 2022-01-22 2023-06-27 华北电力大学(保定) Reactive voltage intelligent optimization method for high-proportion wind-solar new energy power grid
CN116054185B (en) * 2023-03-30 2023-06-02 武汉新能源接入装备与技术研究院有限公司 Control method of reactive power compensator

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408962A (en) * 2021-07-28 2021-09-17 贵州大学 Power grid multi-time scale and multi-target energy optimal scheduling method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104993522B (en) * 2015-06-30 2018-01-19 中国电力科学研究院 A kind of active distribution network Multiple Time Scales coordination optimization dispatching method based on MPC
CN106487042B (en) * 2016-11-22 2019-01-15 合肥工业大学 A kind of Multiple Time Scales micro-capacitance sensor voltage power-less optimized controlling method
CN106953359B (en) * 2017-04-21 2019-08-27 中国农业大学 A kind of active reactive coordinating and optimizing control method of power distribution network containing distributed photovoltaic
CN108964042B (en) * 2018-07-24 2021-10-15 合肥工业大学 Regional power grid operating point scheduling optimization method based on deep Q network
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN110535146B (en) * 2019-08-27 2022-09-23 哈尔滨工业大学 Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning
CN112600218B (en) * 2020-11-30 2022-07-29 华北电力大学(保定) Multi-time-scale optimization control method for reactive voltage of power grid comprising photovoltaic energy storage system
CN112711902A (en) * 2020-12-15 2021-04-27 国网江苏省电力有限公司淮安供电分公司 Power grid voltage calculation method based on Monte Carlo sampling and deep learning
CN113363997B (en) * 2021-05-28 2022-06-14 浙江大学 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
CN113363998B (en) * 2021-06-21 2022-06-28 东南大学 Power distribution network voltage control method based on multi-agent deep reinforcement learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408962A (en) * 2021-07-28 2021-09-17 贵州大学 Power grid multi-time scale and multi-target energy optimal scheduling method

Also Published As

Publication number Publication date
CN113807029A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN113807029B (en) Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method
Wang et al. Toward the prediction level of situation awareness for electric power systems using CNN-LSTM network
CN113363998B (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN112733462A (en) Ultra-short-term wind power plant power prediction method combining meteorological factors
CN109255726A (en) A kind of ultra-short term wind power prediction method of Hybrid Intelligent Technology
CN108599180A (en) A kind of electric distribution network reactive-voltage optimization method considering power randomness
CN111525587A (en) Reactive load situation-based power grid reactive voltage control method and system
CN112564125B (en) Dynamic reactive power optimization method for power distribution network based on variable-step-length longhorn beetle whisker algorithm
CN112418496B (en) Power distribution station energy storage configuration method based on deep learning
CN114336632A (en) Method for correcting alternating current power flow based on model information assisted deep learning
Zhang et al. Deep reinforcement learning for load shedding against short-term voltage instability in large power systems
CN112818588A (en) Optimal power flow calculation method and device for power system and storage medium
CN115313403A (en) Real-time voltage regulation and control method based on deep reinforcement learning algorithm
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN112001066A (en) Deep learning-based method for calculating limit transmission capacity
CN113344283A (en) Energy internet new energy consumption capacity assessment method based on edge intelligence
CN117200213A (en) Power distribution system voltage control method based on self-organizing map neural network deep reinforcement learning
Kang et al. Stability analysis of TSK fuzzy systems
CN112994016B (en) Method and system for adjusting restoration resolvable property of power flow of power system
CN115829258A (en) Electric power system economic dispatching method based on polynomial chaotic approximate dynamic programming
CN114937999A (en) Machine learning-based steady-state reactive power optimization method for synchronous generator to improve voltage transient stability
Obert et al. Efficient distributed energy resource voltage control using ensemble deep reinforcement learning
Lin et al. Reactive power optimization in area power grid based on improved Tabu search algorithm
Zhu et al. Electric vehicle load forecasting based on improved neural network based on differential evolution algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant