CN113363997B - Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning - Google Patents

Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning

Info

Publication number
CN113363997B
CN113363997B (application CN202110597000.4A)
Authority
CN
China
Prior art keywords
reactive
power
action
voltage
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110597000.4A
Other languages
Chinese (zh)
Other versions
CN113363997A (en)
Inventor
胡丹尔
彭勇刚
杨晋祥
韦巍
蔡田田
习伟
邓清唐
李肖博
陈波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Southern Power Grid Digital Grid Research Institute Co Ltd
Original Assignee
Zhejiang University ZJU
Southern Power Grid Digital Grid Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Southern Power Grid Digital Grid Research Institute Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202110597000.4A priority Critical patent/CN113363997B/en
Publication of CN113363997A publication Critical patent/CN113363997A/en
Application granted granted Critical
Publication of CN113363997B publication Critical patent/CN113363997B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/16Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Control Of Electrical Variables (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to power system operation and optimization technology, and aims to provide a reactive voltage control method based on multi-time-scale, multi-agent deep reinforcement learning. The invention defines the on-load tap changer, capacitor bank and energy storage, together with the photovoltaic, wind turbine and load resources, as agents, applies a reinforcement-learning-based method to the reactive power optimization problem, and allows the controllers to learn control strategies through interaction with a simulation model of a similar system. The action variables of the reactive power regulation devices interact with the distribution network environment, and the agents eventually learn the optimal response to the external environment, thereby obtaining the maximum return. The policy function and action-value function of each agent are fitted by neural networks, and the training process depends neither on forecast data nor on accurate power flow modeling. By using the two-time-scale reactive power optimization method, the network loss is smaller, the voltage stabilization effect is better, and the improvement in the safety and reliability of the distribution network is more significant.

Description

Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
Technical Field
The invention relates to the technical field of operation and optimization of power systems, in particular to a reactive voltage optimization method based on multi-time scale multi-agent deep reinforcement learning.
Background
With the connection of a large number of renewable distributed power sources to the distribution network, the random fluctuation of wind power and photovoltaic output and the uncertain fluctuation of loads can cause large voltage fluctuations, voltage limit violations, increased network losses and other problems in distribution network operation, affecting power quality.
Reactive power optimization of a distribution network aims to effectively guarantee the stability of each node voltage, reduce voltage fluctuation and reduce the network loss of the distribution network while fully satisfying the secure-operation constraints of the distribution network. Reactive power optimization of a distribution network typically involves many different variables and constraints and is usually formulated as a nonlinear programming problem. Although most existing model-based reactive voltage optimization methods can effectively suppress voltage limit violations, they depend heavily on accurate models and forecast data, are slow to compute, and are prone to falling into local optima.
Disclosure of Invention
The invention aims to solve the problem of overcoming the defects in the prior art and provides a reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning.
In order to solve the technical problem, the solution of the invention is as follows:
the reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning is provided and comprises the following steps:
(1) dividing the reactive voltage control process of a distribution network connected to renewable distributed power sources into a first time scale stage and a second time scale stage; wherein every 1 hour is taken as the scheduling period of the first time scale stage, and every 1 minute is taken as the scheduling period of the second time scale stage;
(2) defining the on-load tap changer (OLTC), capacitor bank (CB) and energy storage (ES) as agents, and building an interactive training environment for the Markov decision process of environment–agent interaction in the first time scale stage; in the interactive training of this process, forecast data of photovoltaic, wind turbine and load are input, and the DDQN (Double Deep Q-Network) algorithm is used for offline training of the reactive power optimization discrete action strategy; after training, the scheduling strategies of the OLTC, CB and ES agents are obtained, and the optimal reactive power control strategy produced by training is used as the input of the second time scale scheduling stage;
in the first time scale scheduling stage, the Markov process of the interaction between the environment and the agents OLTC, CB and ES is described as <S, R, A, P, γ>; where S is the system state space, the set of all states the agent can perceive; R is the reward space, the set of rewards the environment returns to the agent according to states and actions; A is the action space, the set of actions the decision-making agent can apply to the environment; P is the state transition probability; γ is the discount rate, representing the conversion factor for future returns;
(3) in the second time scale stage, based on the optimal reactive power control strategy obtained in step (2), an interactive training environment based on a Markov decision process is likewise established; in the training process, a multi-agent deep reinforcement learning algorithm based on maximum entropy (MA-SAC) is used to train the offline reactive voltage optimization model;
in the second time scale scheduling stage, an interactive training environment based on a Markov process is likewise built, and the process is described by the tuple <N, S, a_1, a_2, …, a_N, T, γ, r_1, …, r_N>; where N is the number of agents, S is the system state, and a_1, a_2, …, a_N are the agents' action sets; T is the state transition function, T: S × a_1 × … × a_N × S → [0,1], giving the probability of the next state from the current system state and the joint action; γ is the discount factor; in addition, r_i(s_t, a_1, …, a_N, s_{t+1}) denotes the reward obtained by agent i when the joint action is executed in state s_t and the system transitions to state s_{t+1}.
(4) deploying the reactive voltage model trained by the MA-SAC algorithm to the distribution network for online decision-making, and in real-time scheduling, with 1 minute as the time scale, generating the reactive power outputs of the photovoltaic, wind turbine and energy storage inverter agents, so as to correct the scheduling result of the reactive power optimization in the first time scale scheduling stage and thereby mitigate voltage fluctuations; an illustrative two-time-scale dispatch loop is sketched below.
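To make the division between the hourly and minute-level stages concrete, the following is a minimal scheduling-loop sketch in Python; the object and method names (grid_env, ddqn_policy, masac_actors, etc.) are illustrative assumptions and not part of the patented method or any specific library.

```python
# Illustrative two-time-scale dispatch loop (all names are assumptions, not the patent's API).
# Stage 1: every hour, discrete devices (OLTC, CB, ES) are dispatched by a trained DDQN policy.
# Stage 2: every minute, inverter agents (PV, wind, ES) correct the hourly schedule via MA-SAC actors.

def run_day(grid_env, ddqn_policy, masac_actors, hours=24, minutes_per_hour=60):
    for h in range(hours):
        # First time scale stage: hour-level discrete actions from the forecast-driven DDQN policy.
        state_h = grid_env.observe_hourly()            # e.g. node voltages, tap/CB positions
        discrete_action = ddqn_policy.act(state_h)     # OLTC tap, CB groups, ES reactive set-point
        grid_env.apply_hourly(discrete_action)

        for m in range(minutes_per_hour):
            # Second time scale stage: minute-level continuous corrections, decentralized execution.
            for agent in masac_actors:                 # one actor per PV / wind / ES inverter
                obs = grid_env.observe_local(agent.node)
                q_setpoint = agent.act(obs)            # continuous reactive power output
                grid_env.apply_reactive(agent.node, q_setpoint)
            grid_env.step_one_minute()
```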
In the present invention, the step (2) specifically includes:
(2.1) under the constraints of grid voltage and reactive compensation equipment operation, adjusting the inverter outputs of the reactive compensation equipment agents so that the reactive power optimization objective function of the distribution network, defined as the total active power loss P_loss of the distribution network, is minimized; the constraints include upper and lower limits on node voltage, reactive power and action changes, and the power flow equation constraints;
(2.2) establishing the interactive training environment of the Markov decision process according to the reactive voltage optimization objective function and constraint model; and taking the optimal hour-scale agent control strategy of the trained DDQN deep reinforcement learning reactive voltage optimization scheduling model as the input of the real-time scheduling strategy of the second time scale stage.
The step (2.2) specifically comprises:
(2.2.1) in the agent training process, the policy function is adjusted according to the state of the power distribution system, and control measures are taken for given operating conditions to realize reactive power optimization; for a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the state of the DDQN model, specifically expressed as:

s = {U_i, W_i, Cw_i}

where U_i is the node voltage matrix of the distribution network at the i-th decision stage; W_i is the switching position of each regulating device in the i-th scheduling period; Cw_i is the number of actions already completed by each regulating device within i scheduling periods;
(2.2.2) constructing the action vector of the on-load tap changer (OLTC), capacitor bank (CB) and energy storage (ES):

a = {T_ol, T_cb, T_es}

where T_ol, T_cb and T_es are respectively the tap position of the OLTC, the number of switched capacitor groups of the CB, and the reactive power output of the ES (an illustrative encoding of this vector into a single discrete index is sketched after step (2.2.6) below);
(2.2.3) adopting a DDQN algorithm to train a reactive power optimization model in an off-line manner, taking the state as the input of a neural network, calculating all action value functions by using the neural network, and directly outputting a corresponding action-value Q value for evaluating the expected value of a certain action behavior in the current given state;
(2.2.4) defining the objective function of DDQN reinforcement learning as:

Q*(s, a) = E_{s'~ε}[ r + γ max_{a'} Q*(s', a') | s, a ]

where the optimal action-value function Q*(s, a) is the maximum over all action values; r is the observed reward, γ is the per-step discount factor for the future, and s' and a' are the next state and the action that may be taken in it, respectively;
(2.2.5) initializing the capacity of the experience replay pool D to N; the Q values corresponding to the initial actions are random, and the optimal reactive power compensation strategy is generated from the reactive power optimization model; specifically:
a. initialize the sequence s_1 = {x_1} and the preprocessed sequence of the first state φ_1 = φ(s_1), where x_1 is the first state;
b. with probability ε randomly select an action a_t; otherwise select the action a_t = argmax_a Q*(φ(s_t), a; θ), where θ is the weight of the neural network and the maximization is over the available actions a;
c. execute action a_t in the distribution network, and observe the corresponding reward r_t and state s_{t+1}; let the next state s_{t+1} = (s_t, a_t, x_{t+1}) and the next preprocessed sequence φ_{t+1} = φ(s_{t+1});
d. store the transition in the experience replay pool D and randomly draw a mini-batch of samples (φ_j, a_j, r_j, φ_{j+1}) from it; set the Double DQN target

y_j = r_j + γ Q(φ_{j+1}, argmax_{a'} Q(φ_{j+1}, a'; θ_t); θ'_t)

where r_j is the observed reward, γ is the per-step discount factor, φ_{j+1} corresponds to the next observed state, θ_t are the parameters of the current (online) neural network, and θ'_t are the parameters of the target network;
e. perform a gradient descent step on

(y_j − Q(φ_j, a_j; θ))²

and repeat from step b;
and (2.2.6) taking the optimal hour-scale agent control strategy of the reactive voltage optimization scheduling model trained in the first time scale scheduling stage as the input of the real-time scheduling strategy in the second time scale scheduling stage.
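Since the DDQN in steps (2.2.3)–(2.2.5) acts over a single discrete action index while the action vector a = {T_ol, T_cb, T_es} has three components, one practical choice is to enumerate the Cartesian product of device levels; the level counts below are assumptions chosen for illustration, not values from the patent.

```python
from itertools import product

# Assumed discrete levels for each device (illustrative only).
OLTC_TAPS = list(range(-2, 3))               # 5 tap positions
CB_GROUPS = [0, 1, 2, 3]                     # number of switched capacitor groups
ES_Q_LEVELS = [-0.2, -0.1, 0.0, 0.1, 0.2]    # ES reactive output levels in p.u.

ACTION_TABLE = list(product(OLTC_TAPS, CB_GROUPS, ES_Q_LEVELS))  # joint discrete action space

def decode_action(index):
    """Map a DDQN action index to the action vector a = {T_ol, T_cb, T_es}."""
    t_ol, t_cb, t_es = ACTION_TABLE[index]
    return {"T_ol": t_ol, "T_cb": t_cb, "T_es": t_es}

def encode_action(t_ol, t_cb, t_es):
    """Inverse mapping, e.g. for replaying logged schedules into the replay pool."""
    return ACTION_TABLE.index((t_ol, t_cb, t_es))

print(len(ACTION_TABLE), decode_action(0), encode_action(0, 0, 0.0))
```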
In the present invention, the reactive voltage model in step (3) specifically includes:
under the constraints of grid voltage and reactive compensation equipment operation, the reactive compensation equipment agents are adjusted so that the reactive power optimization objective function f_1 of the distribution network, defined as the total active power loss P_loss of the distribution network, is minimized; the constraints include upper and lower limits on node voltage, reactive power and action changes, and the power flow equation constraints, formulated as:

f_1 = min P_loss = min Σ_{ij} r_ij · l_ij

U_min ≤ U_i ≤ U_max
φ_min ≤ φ_i ≤ φ_max
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; φ_min and φ_max are the lower and upper limits on the number of CB actions at node i; l_ij = |I_ij|², u_j = |U_j|², and I_ij is the current of branch ij; i: i → j denotes the branches ij whose power flows from node i into node j; k: j → k denotes the branches jk whose power flows from node j into node k; P_ij and Q_ij are the active and reactive power flowing through branch ij; r_ij and x_ij are the resistance and reactance of branch ij; p_j and q_j are the active and reactive power injected at node j.
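For clarity, the following short sketch shows how the loss objective P_loss = Σ r_ij·l_ij and the voltage-band check can be evaluated for a radial feeder; the three-branch data are assumptions made purely for illustration, not parameters from the patent.

```python
# Evaluate the first-stage objective and a simple constraint check for a radial feeder.
# Branch data: (from_node, to_node, r_ij, x_ij); flows and voltages in per unit (assumed values).
branches = [(0, 1, 0.010, 0.020), (1, 2, 0.015, 0.025), (1, 3, 0.012, 0.020)]
P = {(0, 1): 1.20, (1, 2): 0.50, (1, 3): 0.55}        # active flow on each branch
Q = {(0, 1): 0.40, (1, 2): 0.15, (1, 3): 0.20}        # reactive flow on each branch
u = {0: 1.00**2, 1: 0.99**2, 2: 0.98**2, 3: 0.985**2}  # squared voltage magnitude u_i = |U_i|^2

def total_loss(branches, P, Q, u):
    """P_loss = sum_ij r_ij * l_ij with l_ij = (P_ij^2 + Q_ij^2) / u_i."""
    loss = 0.0
    for i, j, r_ij, _x_ij in branches:
        l_ij = (P[(i, j)] ** 2 + Q[(i, j)] ** 2) / u[i]
        loss += r_ij * l_ij
    return loss

def voltage_ok(u, u_min=0.95, u_max=1.05):
    """Check U_min <= U_i <= U_max for every node."""
    return all(u_min <= v ** 0.5 <= u_max for v in u.values())

print(total_loss(branches, P, Q, u), voltage_ok(u))
```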
In the invention, the step (3) of training the offline reactive voltage optimization model by adopting an MA-SAC algorithm specifically comprises the following steps:
under the constraints of grid voltage and reactive compensation equipment operation, the reactive power outputs of the photovoltaic, wind turbine and energy storage inverters are adjusted, and the second time scale stage corrects the scheduling result of the reactive power optimization of the first time scale stage; the objective function is defined as minimizing the node voltage deviation of the distribution network, and the constraints include node voltage limits, upper and lower limits on photovoltaic and wind turbine output, and the power flow equation constraints, formulated as:

f_2 = min Σ_i |U_i − U_i,base|

U_min ≤ U_i ≤ U_max
−√(S_PV² − P_PV²) ≤ Q_PV ≤ √(S_PV² − P_PV²)
−√(S_WD² − P_WD²) ≤ Q_WD ≤ √(S_WD² − P_WD²)
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; U_i,base is the reference voltage magnitude of node i; P_PV, Q_PV and S_PV are the active, reactive and apparent power of the photovoltaic equipment; P_WD, Q_WD and S_WD are the active, reactive and apparent power of the wind turbine equipment; the remaining power flow symbols are as defined above;
and establishing an interactive training environment of the Markov process according to the reactive voltage optimization objective function and the constraint condition model.
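The second-stage action bounds and objective can be illustrated with a short sketch: the reactive range available to an inverter follows from its apparent-power rating, and the objective tracks the node voltage deviation; the numerical values below are assumptions for illustration only.

```python
import math

def reactive_limit(s_rated, p_now):
    """|Q| <= sqrt(S^2 - P^2): reactive capability left after the current active output."""
    return math.sqrt(max(s_rated ** 2 - p_now ** 2, 0.0))

def voltage_deviation(voltages, v_base=1.0):
    """Second-stage objective term: sum of |U_i - U_i,base| over all nodes (per unit)."""
    return sum(abs(v - v_base) for v in voltages)

# example: a 0.5 MVA PV inverter currently producing 0.4 MW (assumed figures)
q_max = reactive_limit(s_rated=0.5, p_now=0.4)
print(f"available reactive range: +/- {q_max:.3f} Mvar")
print("deviation:", voltage_deviation([1.02, 0.97, 1.04, 0.99]))
```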
In the invention, in the step (3), an MA-SAC algorithm is adopted to train an offline reactive voltage optimization model in the training process; the process specifically comprises the following steps:
(3.1) for a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the state s of the MA-SAC model, formulated as:

s = {U_t, P_PV,t, P_WD,t}

where U_t is the node voltage matrix of the distribution network at the t-th decision stage; P_PV,t is the photovoltaic active power in the t-th scheduling period; P_WD,t is the wind turbine active power in the t-th scheduling period;
(3.2) in the offline training process, the MA-SAC algorithm is used to train the reactive power optimization model offline; under the entropy-regularized formulation, each agent receives during training an additional positive reward proportional to the entropy of its current policy, and the objective can be defined as

J(π_i) = E[ Σ_t γ^t ( r_i(x, a_1, …, a_N) + α H(π_i(·|s_t)) ) ]

where x represents the information of the entire environment, α is the temperature coefficient weighting the entropy term, and Q_i^μ(x, a_1, …, a_N) is the i-th centralized critic, whose inputs are the action a_i taken by each agent and the environment information x, and whose output is the Q value of the i-th agent; the entropy H is expressed as

H(π(·|s_t)) = − Σ_a π(a|s_t) ln π(a|s_t)
(3.3) initializing a random process B for action exploration; initializing the environment state x, and generating the optimal photovoltaic and wind turbine real-time dynamic strategy from the intraday reactive power optimization model, as follows:
a. select an action a_i for each agent, execute the joint action a = (a_1, …, a_N), observe the reward r and the next state x', and store the transition tuple (x, a, r, x') in the experience replay pool D;
b. for each agent i, draw K samples (x_j, a_j, r_j, x'_j) from D and set y_j = r_j + γ Q^μ(x'_j, a'_1, …, a'_N), where r_j is the observed reward, γ is the per-step discount factor, x'_j is the next observed state, and a'_1, …, a'_N are the actions taken by agents 1 to N in that state;
c. update the critic by minimizing the loss function

L(θ_i) = (1/K) Σ_j ( y_j − Q_i^μ(x_j, a_1^j, …, a_N^j) )²

and update the actor by gradient descent on the entropy-regularized policy loss

J_π(θ_i) = (1/K) Σ_j [ α ln π_i(ã_i | x_j) − Q_i^μ(x_j, a_1^j, …, ã_i, …, a_N^j) ],  ã_i ~ π_i(·|x_j)

then repeat from step a (a sketch of this update is given after step (3.4) below);
d. update the parameters of each agent's target neural network: θ'_i = τ θ_i + (1 − τ) θ'_i, where τ is the update weight;
and (3.4) the reactive voltage optimization scheduling model trained in the second time scale stage schedules the reactive power outputs of the photovoltaic, wind turbine and energy storage inverters at the 1-minute level, thereby correcting the day-ahead reactive power strategy and achieving a better reactive voltage control effect.
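The centralized-critic update in steps a–d can be sketched as follows in PyTorch, assuming small fully connected critic and actor networks, Gaussian actors, and a simplified single-step update without target networks or the tanh log-prob correction; the network sizes, dimensions and the use of torch are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM, ALPHA, GAMMA = 3, 6, 1, 0.2, 0.99   # assumed sizes, not from the patent

class Critic(nn.Module):
    """Centralized critic Q_i(x, a_1..a_N): sees the global state and the joint action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM * N_AGENTS + ACT_DIM * N_AGENTS, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x, joint_a):
        return self.net(torch.cat([x, joint_a], dim=-1))

class Actor(nn.Module):
    """Decentralized Gaussian actor pi_i(a_i | o_i) for one inverter agent
    (tanh squashing; the squash correction to the log-prob is omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))
    def sample(self, obs):
        dist = torch.distributions.Normal(self.mu(obs), self.log_std.exp())
        a = dist.rsample()                                    # reparameterized sample
        return torch.tanh(a), dist.log_prob(a).sum(-1)

critics = [Critic() for _ in range(N_AGENTS)]
actors  = [Actor() for _ in range(N_AGENTS)]
opt_c = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]
opt_a = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]

# one illustrative update on random transitions standing in for samples from the replay pool D
B = 4
x, x_next = torch.randn(B, OBS_DIM * N_AGENTS), torch.randn(B, OBS_DIM * N_AGENTS)
joint_a, r = torch.randn(B, ACT_DIM * N_AGENTS), torch.randn(B, 1)

for i in range(N_AGENTS):
    # critic update: minimize (y - Q_i(x, a))^2 with y = r + gamma * Q_i(x', a')
    with torch.no_grad():
        a_next = torch.cat([actors[k].sample(x_next[:, k*OBS_DIM:(k+1)*OBS_DIM])[0]
                            for k in range(N_AGENTS)], dim=-1)
        y = r + GAMMA * critics[i](x_next, a_next)
    loss_c = ((y - critics[i](x, joint_a)) ** 2).mean()
    opt_c[i].zero_grad(); loss_c.backward(); opt_c[i].step()

    # actor update: gradient descent on  alpha * log pi_i - Q_i  (entropy-regularized policy loss)
    obs_i = x[:, i*OBS_DIM:(i+1)*OBS_DIM]
    a_i, logp_i = actors[i].sample(obs_i)
    joint = torch.cat([a_i if k == i else joint_a[:, k*ACT_DIM:(k+1)*ACT_DIM]
                       for k in range(N_AGENTS)], dim=-1)
    loss_a = (ALPHA * logp_i - critics[i](x, joint).squeeze(-1)).mean()
    opt_a[i].zero_grad(); loss_a.backward(); opt_a[i].step()
```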
Compared with the prior art, the invention has the beneficial effects that:
(1) the invention designs a reactive voltage optimization method under two time scales. In the first time scale scheduling stage, 1 hour is used as a time scale to schedule OLTC, CB and ES; and in the second time scale scheduling stage, the reactive power output of PV, WD and ES is scheduled by adopting 1 minute as the time scale, and the reactive power optimization strategy in the first time scale scheduling stage is modified.
(2) Compared with traditional reactive power optimization models, deep reinforcement learning models are used in both the first and second time scale scheduling stages, so that strategy scheduling can be carried out faster and in real time, without requiring accurate power flow modeling or depending on accurate load forecast data.
(3) In the invention, the OLTC tap position, CB switching and ES reactive power output are treated as discrete action variables in the first time scale scheduling stage, and the PV, WD and ES reactive power outputs are treated as continuous action variables in the second time scale scheduling stage. In the second time scale scheduling stage, a centralized training and decentralized execution mode is adopted: the deep neural networks are trained through interaction with the distribution network environment, and the trained networks can make decisions quickly, so that the voltage deviation is effectively reduced.
Drawings
FIG. 1 is a first time scale scheduling stage deep reinforcement learning DDQN algorithm training framework;
fig. 2 is a second time scale scheduling phase MA-SAC algorithm centralized training-decentralized execution framework.
Detailed Description
The applicant's inventor group, inspired by data-driven approaches, defines the on-load tap changer (OLTC), capacitor bank (CB) and energy storage (ES), together with the photovoltaic, wind turbine and load resources, as agents, applies a reinforcement-learning-based method to the reactive power optimization problem, and allows the controllers to learn control strategies by interacting with a simulation model of a similar system. The action variables of the reactive power regulation devices interact with the distribution network environment; mathematically, the interaction process over a time sequence is described as a Markov decision process, and the agents eventually learn the optimal response to the external environment, thereby obtaining the maximum return. The policy function and action-value function of each agent are fitted by neural networks, the training process depends neither on forecast data nor on accurate power flow modeling, and the two-time-scale reactive power optimization method based on deep reinforcement learning can reduce the network loss, achieve a better voltage stabilization effect, and more significantly improve the safety and reliability of the distribution network.
The multi-time-scale, multi-agent deep reinforcement learning reactive voltage optimization method is illustrated below by taking an n-node power distribution network as an example.
Each node has measurable states, wherein one node is equipped with an on-load tap changer (OLTC), at least one node is equipped with a capacitor bank (CB), at least one node is equipped with photovoltaic equipment (PV), at least one node is equipped with wind turbine equipment (WD), and at least one node is equipped with energy storage equipment (ES). The optimization objective is to reduce the system network loss and the voltage deviation. Using a centralized training and decentralized execution (CTDE) framework, the time scale of the first time scale scheduling stage is 1 hour: the DDQN algorithm is used to train the optimization model, and the tap position of the on-load tap changer (OLTC), the number of switched capacitor groups of the capacitor bank (CB), and the reactive power output of the energy storage (ES) are scheduled; this optimization process can be described as a Markov game process. The time scale of the second time scale scheduling stage is 1 minute: the multi-agent deep reinforcement learning algorithm based on maximum entropy (MA-SAC) is used to train the reactive voltage optimization model offline, and the reactive power outputs of the photovoltaic and wind turbine generation equipment and the energy storage inverter are dynamically adjusted in real time to correct the scheduling result of the reactive power optimization in the first time scale scheduling stage and mitigate rapid voltage fluctuations.
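As an illustration of how the device agents of such an embodiment might be declared, the following configuration sketch lists one possible assignment of devices to nodes; the node numbers, ratings and level counts are assumptions chosen purely for illustration.

```python
# Illustrative agent/device configuration for an n-node feeder (all values are assumptions).
feeder_agents = {
    "OLTC": {"node": 0, "tap_positions": list(range(-8, 9)), "step_pu": 0.0125},   # hour-level, discrete
    "CB":   {"node": 5, "groups": 4, "q_per_group_mvar": 0.1, "max_actions_per_day": 5},
    "ES":   {"node": 12, "s_rated_mva": 0.5},                                      # acts in both stages
    "PV":   [{"node": 8, "s_rated_mva": 0.4}, {"node": 20, "s_rated_mva": 0.3}],   # minute-level, continuous
    "WD":   [{"node": 15, "s_rated_mva": 0.6}],
}
voltage_limits_pu = (0.95, 1.05)   # example voltage band quoted in the embodiment below
timescales = {"stage_1": "1 hour (DDQN, discrete actions)",
              "stage_2": "1 minute (MA-SAC, continuous reactive set-points)"}
```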
(I) First time scale scheduling stage
As shown in FIG. 1, the Markov process of the interaction between the distribution network environment and the agents can be described as <S, R, A, P, γ>. Under the constraints of grid voltage and reactive compensation equipment operation, the reactive power devices OLTC, CB and ES are adjusted so that the reactive power optimization objective function of the distribution network, defined as the total active power loss P_loss of the distribution network, is minimized; the constraints include the node voltage U_i limits, the action change limits φ_i, and the power flow equation constraints, formulated as:

f_1 = min P_loss = min Σ_{ij} r_ij · l_ij

U_min ≤ U_i ≤ U_max
φ_min ≤ φ_i ≤ φ_max
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; φ_min and φ_max are the lower and upper limits on the number of CB actions at node i; the power flow symbols are as defined above. For example, the lower and upper voltage limits of each node can be set to 0.95 and 1.05 p.u., and the number of CB actions in a day does not exceed 5.
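The voltage band and CB switching limits quoted above can be folded into the hour-level agent's reward as penalty terms; a small sketch follows, where the penalty weights are assumptions made for illustration, not values from the patent.

```python
def stage1_reward(p_loss, voltages, cb_actions_today,
                  v_min=0.95, v_max=1.05, cb_max=5,
                  w_loss=1.0, w_volt=10.0, w_cb=5.0):
    """Negative cost: network loss plus penalties for violating the voltage band
    and the daily CB switching limit (penalty weights are illustrative assumptions)."""
    volt_violation = sum(max(v_min - v, 0.0) + max(v - v_max, 0.0) for v in voltages)
    cb_violation = max(cb_actions_today - cb_max, 0)
    return -(w_loss * p_loss + w_volt * volt_violation + w_cb * cb_violation)

print(stage1_reward(p_loss=0.03, voltages=[1.01, 0.94, 1.06], cb_actions_today=6))
```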
According to the above reactive voltage optimization objective function and constraint model, the interactive training environment of the Markov process is established as follows:
1. During the agent training process, the policy function is adjusted according to the state of the power distribution system, and control measures are taken for the given operating conditions to achieve reactive power optimization. For a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the state of the DDQN model, which can be expressed as s = {U_i, W_i, Cw_i}, where U_i is the node voltage matrix of the distribution network at the i-th decision stage, W_i is the switching position of each regulating device in the i-th scheduling period, and Cw_i is the number of actions already completed by each regulating device within i scheduling periods.
2. Construct the action vector a = {T_ol, T_cb, T_es} of the OLTC, CB and ES, where T_ol, T_cb and T_es are respectively the tap position of the OLTC, the number of switched capacitor groups of the CB, and the reactive power output of the ES.
3. The DDQN (Double Deep Q-Network) algorithm is used to train the reactive power optimization model offline; the state is taken as the input of the neural network, all action-value functions are computed by the neural network, and the corresponding Q values are output directly.
4. The objective function of DDQN reinforcement learning is defined as

Q*(s, a) = E_{s'~ε}[ r + γ max_{a'} Q*(s', a') | s, a ]

where r is the observed reward, γ is the per-step discount factor for the future, and s' and a' are the next state and the action that may be taken in it, respectively.
5. Initialize the capacity of the experience replay pool D to N; the Q values corresponding to the initial actions are random, and the optimal reactive power compensation strategy is generated from the reactive power optimization model as follows:
1) initialize the sequence s_1 = {x_1} and the preprocessed sequence of the first state φ_1 = φ(s_1), where x_1 is the first state;
2) with probability ε randomly select an action a_t; otherwise select the action a_t = argmax_a Q*(φ(s_t), a; θ), where θ is the weight of the neural network and the maximization is over the available actions a;
3) execute action a_t in the distribution network, and observe the corresponding reward r_t and state s_{t+1}; let the next state s_{t+1} = (s_t, a_t, x_{t+1}) and the next preprocessed sequence φ_{t+1} = φ(s_{t+1});
4) store the transition in the experience replay pool D and randomly draw a mini-batch of samples (φ_j, a_j, r_j, φ_{j+1}) from it; set the Double DQN target

y_j = r_j + γ Q(φ_{j+1}, argmax_{a'} Q(φ_{j+1}, a'; θ_t); θ'_t)

where r_j is the observed reward, γ is the per-step discount factor, φ_{j+1} corresponds to the next observed state, θ_t are the parameters of the current (online) neural network, and θ'_t are the parameters of the target network;
5) perform a gradient descent step on

(y_j − Q(φ_j, a_j; θ))²

and repeat from step 1).
6. The optimal OLTC, CB and ES hour-scale control strategy of the reactive voltage optimization scheduling model trained in the first time scale scheduling stage is taken as the input of the real-time scheduling strategy of the second time scale scheduling stage; a minimal training-step sketch for this stage is given below.
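To tie steps 1)–6) together, the following is a compact sketch of the hour-level DDQN training loop in PyTorch; the network architecture, replay-pool size, state dimension and the discrete action count are illustrative assumptions, not values specified by the patent.

```python
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, EPS = 10, 100, 0.99, 0.1   # assumed sizes; N_ACTIONS = |OLTC|x|CB|x|ES| levels

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))       # theta
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))  # theta'
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)                             # experience replay pool D

def select_action(state):
    """epsilon-greedy selection over the discrete joint OLTC/CB/ES action space."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(batch_size=32):
    """One Double-DQN gradient step on a mini-batch (phi_j, a_j, r_j, phi_{j+1}) from D."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s      = torch.stack([b[0] for b in batch])
    a      = torch.tensor([b[1] for b in batch])
    r      = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s_next = torch.stack([b[3] for b in batch])
    with torch.no_grad():
        a_star = q_net(s_next).argmax(dim=1)                                          # online net selects a'
        y = r + GAMMA * target_net(s_next).gather(1, a_star.unsqueeze(1)).squeeze(1)  # target net evaluates
    q_taken = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)                           # Q(phi_j, a_j; theta)
    loss = ((y - q_taken) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# minimal usage with random transitions standing in for interaction with the feeder environment
for _ in range(64):
    s0 = torch.randn(STATE_DIM)
    replay.append((s0, select_action(s0), random.random(), torch.randn(STATE_DIM)))
train_step()
```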
(II) Second time scale scheduling stage
As shown in FIG. 2, an interactive training environment based on a Markov process is likewise built, which can be described by the tuple <N, S, a_1, a_2, …, a_N, T, γ, r_1, …, r_N>, where N is the number of agents, S is the system state, and a_1, a_2, …, a_N are the agents' action sets. T is the state transition function, T: S × a_1 × … × a_N × S → [0,1], giving the probability of the next state from the current system state and the joint action. γ is the discount factor. In addition, r_i(s_t, a_1, …, a_N, s_{t+1}) denotes the reward obtained by agent i when the joint action is executed in state s_t and the system transitions to state s_{t+1}.
The reactive voltage model of the power distribution network comprises:
Under the constraints of grid voltage and reactive compensation equipment operation, the reactive power outputs of the photovoltaic and wind turbine inverters are adjusted, and the second stage corrects the scheduling result of the reactive power optimization of the day-ahead scheduling stage; the objective function is defined as minimizing the node voltage deviation of the distribution network, and the constraints include node voltage limits, upper and lower limits on photovoltaic and wind turbine output, and the power flow equation constraints, formulated as:

f_2 = min Σ_i |U_i − U_i,base|

U_min ≤ U_i ≤ U_max
−√(S_PV² − P_PV²) ≤ Q_PV ≤ √(S_PV² − P_PV²)
−√(S_WD² − P_WD²) ≤ Q_WD ≤ √(S_WD² − P_WD²)
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; U_i,base is the reference voltage magnitude of node i; P_PV, Q_PV and S_PV are the active, reactive and apparent power of the photovoltaic equipment; P_WD, Q_WD and S_WD are the active, reactive and apparent power of the wind turbine equipment; the remaining power flow symbols are as defined above.
According to the above reactive voltage optimization objective function and constraint model, the interactive training environment of the Markov process is established as follows:
1. For a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the state of the MA-SAC model, which can be expressed as s = {U_t, P_PV,t, P_WD,t}, where U_t is the node voltage matrix of the distribution network at the t-th decision stage, P_PV,t is the photovoltaic active power in the t-th scheduling period, and P_WD,t is the wind turbine active power in the t-th scheduling period.
2. In the offline training process, the MA-SAC algorithm is used to train the reactive power optimization model offline; under the entropy-regularized formulation, each agent receives during training an additional positive reward proportional to the entropy of its current policy, and the objective can be defined as

J(π_i) = E[ Σ_t γ^t ( r_i(x, a_1, …, a_N) + α H(π_i(·|s_t)) ) ]

where x represents the information of the entire environment, α is the temperature coefficient weighting the entropy term, and Q_i^μ(x, a_1, …, a_N) is the i-th centralized critic, whose inputs are the action a_i taken by each agent and the environment information x, and whose output is the Q value of the i-th agent; the entropy H is expressed as

H(π(·|s_t)) = − Σ_a π(a|s_t) ln π(a|s_t)
3. Initialize a random process B for action exploration; initialize the environment state x, and generate the optimal photovoltaic and wind turbine real-time dynamic strategy from the intraday reactive power optimization model as follows:
1) select an action a_i for each agent, execute the joint action a = (a_1, …, a_N), observe the reward r and the next state x', and store the transition tuple (x, a, r, x') in the experience replay pool D;
2) for each agent i, draw K samples (x_j, a_j, r_j, x'_j) from D and set y_j = r_j + γ Q^μ(x'_j, a'_1, …, a'_N), where r_j is the observed reward, γ is the per-step discount factor, x'_j is the next observed state, and a'_1, …, a'_N are the actions taken by agents 1 to N in that state;
3) update the critic by minimizing the loss function

L(θ_i) = (1/K) Σ_j ( y_j − Q_i^μ(x_j, a_1^j, …, a_N^j) )²

update the actor by gradient descent on the entropy-regularized policy loss

J_π(θ_i) = (1/K) Σ_j [ α ln π_i(ã_i | x_j) − Q_i^μ(x_j, a_1^j, …, ã_i, …, a_N^j) ],  ã_i ~ π_i(·|x_j)

and repeat from step 1);
4) update the parameters of each agent's target neural network: θ'_i = τ θ_i + (1 − τ) θ'_i, where τ is the update weight.
4. The reactive voltage optimization scheduling model trained in the second time scale stage schedules the reactive power outputs of the photovoltaic, wind turbine and energy storage inverters at the 1-minute level, thereby correcting the day-ahead reactive power strategy and achieving a better reactive voltage control effect; a minimal sketch of the decentralized execution is given below.
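Finally, a sketch of the decentralized execution referred to above: at run time each trained actor only needs its local measurements to produce a reactive set-point every minute, clipped to the inverter capability; the actor interface, measurement names and numbers are assumptions for illustration.

```python
import math

def minute_dispatch(actors, measurements):
    """Decentralized execution: each trained actor maps its local observation to a
    reactive power set-point, clipped to the inverter capability sqrt(S^2 - P^2)."""
    setpoints = {}
    for name, actor in actors.items():
        m = measurements[name]                      # local voltage and active output
        obs = [m["voltage_pu"], m["p_mw"]]
        q_cmd = actor(obs)                          # trained MA-SAC actor, deterministic at test time
        q_max = math.sqrt(max(m["s_mva"] ** 2 - m["p_mw"] ** 2, 0.0))
        setpoints[name] = max(-q_max, min(q_cmd, q_max))
    return setpoints

# illustrative call with a stub actor standing in for the trained network
stub_actor = lambda obs: 0.5 * (1.0 - obs[0])       # push Q up when voltage is low (toy rule)
print(minute_dispatch({"PV_8": stub_actor},
                      {"PV_8": {"voltage_pu": 0.97, "p_mw": 0.3, "s_mva": 0.4}}))
```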

Claims (7)

1. A reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning is characterized by comprising the following steps:
(1) dividing the reactive voltage control process of a distribution network connected to renewable distributed power sources into a first time scale stage and a second time scale stage; wherein every 1 hour is taken as the scheduling period of the first time scale stage, and every 1 minute is taken as the scheduling period of the second time scale stage;
(2) defining the on-load tap changer OLTC, capacitor bank CB and energy storage ES as agents, and building an interactive training environment for the Markov decision process of environment–agent interaction in the first time scale stage; in the interactive training of this process, forecast data of photovoltaic, wind turbine and load are input, and the DDQN (Double Deep Q-Network) algorithm is used for offline training of the reactive power optimization discrete action strategy; after training, the scheduling strategies of the OLTC, CB and ES agents are obtained, and the optimal reactive power control strategy produced by training is used as the input of the second time scale scheduling stage;
(3) in the second time scale stage, based on the optimal reactive power control strategy obtained in step (2), an interactive training environment based on a Markov decision process is likewise established; in the training process, the multi-agent deep reinforcement learning algorithm based on maximum entropy, MA-SAC, is used to train the offline reactive voltage optimization model;
(4) deploying the reactive voltage model trained by the MA-SAC algorithm to the distribution network for online decision-making, and in real-time scheduling, with 1 minute as the time scale, generating the reactive power outputs of the photovoltaic, wind turbine and energy storage inverter agents, so as to correct the scheduling result of the reactive power optimization in the first time scale scheduling stage and thereby mitigate voltage fluctuations.
2. The method of claim 1, wherein in the first time scale scheduling stage of step (2), the Markov process of the interaction between the environment and the agents OLTC, CB and ES is described as <S, R, A, P, γ>; where S is the system state space, the set of all states the agent can perceive; R is the reward space, the set of rewards the environment returns to the agent according to states and actions; A is the action space, the set of actions the decision-making agent can apply to the environment; P is the state transition probability; γ is the discount rate, representing the conversion factor for future returns;
in the second time scale scheduling stage of step (3), an interactive training environment based on a Markov process is likewise built, and the process is described by the tuple <N, S, a_1, a_2, …, a_N, T, γ, r_1, …, r_N>; where N is the number of agents, S is the system state, and a_1, a_2, …, a_N are the agents' action sets; T is the state transition function, T: S × a_1 × … × a_N × S → [0,1], giving the probability of the next state from the current system state and the joint action; γ is the discount factor; in addition, r_i(s_t, a_1, …, a_N, s_{t+1}) denotes the reward obtained by agent i when the joint action is executed in state s_t and the system transitions to state s_{t+1}.
3. The method according to claim 1, wherein the step (2) comprises in particular:
(2.1) under the constraints of grid voltage and reactive compensation equipment operation, adjusting the inverter outputs of the reactive compensation equipment agents so that the reactive power optimization objective function of the distribution network, defined as the total active power loss P_loss of the distribution network, is minimized; the constraints include upper and lower limits on node voltage, reactive power and action changes, and the power flow equation constraints;
(2.2) establishing the interactive training environment of the Markov decision process according to the reactive voltage optimization objective function and constraint model; and taking the optimal hour-scale agent control strategy of the trained DDQN deep reinforcement learning reactive voltage optimization scheduling model as the input of the real-time scheduling strategy of the second time scale stage.
4. The method according to claim 3, characterized in that said step (2.2) comprises in particular:
(2.2.1) in the agent training process, the policy function is adjusted according to the state of the power distribution system, and control measures are taken for given operating conditions to realize reactive power optimization; for a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the state of the DDQN model, specifically expressed as:

s = {U_i, W_i, Cw_i}

where U_i is the node voltage matrix of the distribution network at the i-th decision stage; W_i is the switching position of each regulating device in the i-th scheduling period; Cw_i is the number of actions already completed by each regulating device within i scheduling periods;
(2.2.2) constructing the action vector of the agents:

a = {T_ol, T_cb, T_es}

where T_ol, T_cb and T_es are respectively the tap position of the OLTC, the number of switched capacitor groups of the CB, and the reactive power output of the ES;
(2.2.3) adopting a DDQN algorithm to train a reactive power optimization model in an off-line manner, taking the state as the input of a neural network, calculating all action value functions by using the neural network, and directly outputting a corresponding action-value Q value for evaluating the expected value of a certain action behavior in the current given state;
(2.2.4) defining the objective function of DDQN reinforcement learning as:

Q*(s, a) = E_{s'~ε}[ r + γ max_{a'} Q*(s', a') | s, a ]

where the optimal action-value function Q*(s, a) is the maximum over all action values; r is the observed reward, γ is the per-step discount factor for the future, and s' and a' are the next state and the action that may be taken in it, respectively;
(2.2.5) initializing the capacity of the experience replay pool D to N; the Q values corresponding to the initial actions are random, and the optimal reactive power compensation strategy is generated from the reactive power optimization model; specifically:
a. initialize the sequence s_1 = {x_1} and the preprocessed sequence of the first state φ_1 = φ(s_1), where x_1 is the first state;
b. with probability ε randomly select an action a_t; otherwise select the action a_t = argmax_a Q*(φ(s_t), a; θ), where θ is the weight of the neural network and the maximization is over the available actions a;
c. execute action a_t in the distribution network, and observe the corresponding reward r_t and state s_{t+1}; let the next state s_{t+1} = (s_t, a_t, x_{t+1}) and the next preprocessed sequence φ_{t+1} = φ(s_{t+1});
d. store the transition in the experience replay pool D and randomly draw a mini-batch of samples (φ_j, a_j, r_j, φ_{j+1}) from it; set the Double DQN target

y_j = r_j + γ Q(φ_{j+1}, argmax_{a'} Q(φ_{j+1}, a'; θ_t); θ'_t)

where r_j is the observed reward, γ is the per-step discount factor, φ_{j+1} corresponds to the next observed state, θ_t are the parameters of the current (online) neural network, and θ'_t are the parameters of the target network;
e. perform a gradient descent step on

(y_j − Q(φ_j, a_j; θ))²

and repeat from step b;
and (2.2.6) taking the optimal hour-scale agent control strategy of the reactive voltage optimization scheduling model trained in the first time scale scheduling stage as the input of the real-time scheduling strategy in the second time scale scheduling stage.
5. The method according to claim 1, wherein the reactive voltage model in step (3) comprises in particular:
under the constraints of grid voltage and reactive compensation equipment operation, the reactive compensation equipment agents are adjusted so that the reactive power optimization objective function f_1 of the distribution network, defined as the total active power loss P_loss of the distribution network, is minimized; the constraints include upper and lower limits on node voltage, reactive power and action changes, and the power flow equation constraints, formulated as:

f_1 = min P_loss = min Σ_{ij} r_ij · l_ij

U_min ≤ U_i ≤ U_max
φ_min ≤ φ_i ≤ φ_max
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; φ_min and φ_max are the lower and upper limits on the number of CB actions at node i; l_ij = |I_ij|², u_j = |U_j|², and I_ij is the current of branch ij; i: i → j denotes the branches ij whose power flows from node i into node j; k: j → k denotes the branches jk whose power flows from node j into node k; P_ij and Q_ij are the active and reactive power flowing through branch ij; r_ij and x_ij are the resistance and reactance of branch ij; p_j and q_j are the active and reactive power injected at node j.
6. The method according to claim 1, wherein the step (3) of training the offline reactive voltage optimization model by using an MA-SAC algorithm specifically comprises the following steps:
under the constraints of grid voltage and reactive compensation equipment operation, the reactive power outputs of the photovoltaic, wind turbine and energy storage inverters are adjusted, and the second time scale stage corrects the scheduling result of the reactive power optimization of the first time scale stage; the objective function is defined as minimizing the node voltage deviation of the distribution network, and the constraints include node voltage limits, upper and lower limits on photovoltaic and wind turbine output, and the power flow equation constraints, formulated as:

f_2 = min Σ_i |U_i − U_i,base|

U_min ≤ U_i ≤ U_max
−√(S_PV² − P_PV²) ≤ Q_PV ≤ √(S_PV² − P_PV²)
−√(S_WD² − P_WD²) ≤ Q_WD ≤ √(S_WD² − P_WD²)
Σ_{i:i→j} (P_ij − r_ij l_ij) + p_j = Σ_{k:j→k} P_jk
Σ_{i:i→j} (Q_ij − x_ij l_ij) + q_j = Σ_{k:j→k} Q_jk
u_j = u_i − 2(r_ij P_ij + x_ij Q_ij) + (r_ij² + x_ij²) l_ij,  l_ij u_i = P_ij² + Q_ij²

where U_min and U_max are the lower and upper limits of the node-i voltage U_i; U_i,base is the reference voltage magnitude of node i; P_PV, Q_PV and S_PV are the active, reactive and apparent power of the photovoltaic equipment; P_WD, Q_WD and S_WD are the active, reactive and apparent power of the wind turbine equipment;
wherein l_ij = |I_ij|², u_j = |U_j|², and I_ij is the current of branch ij; i: i → j denotes the branches ij whose power flows from node i into node j; k: j → k denotes the branches jk whose power flows from node j into node k; P_ij and Q_ij are the active and reactive power flowing through branch ij; r_ij and x_ij are the resistance and reactance of branch ij; p_j and q_j are the active and reactive power injected at node j;
and establishing an interactive training environment of the Markov process according to the reactive voltage optimization objective function and the constraint condition model.
7. The method according to claim 1, wherein in the step (3), an MA-SAC algorithm is adopted to train an offline reactive voltage optimization model in the training process; the process specifically comprises the following steps:
(3.1) for a given action of the multi-agent system, the environment provides the voltages on all buses in the power distribution system as the current environment state s of the MA-SAC model, formulated as:

s = {U_t, P_PV,t, P_WD,t}

where U_t is the node voltage matrix of the distribution network at the t-th decision stage; P_PV,t is the photovoltaic active power in the t-th scheduling period; P_WD,t is the wind turbine active power in the t-th scheduling period;
(3.2) in the offline training process, the MA-SAC algorithm is used to train the reactive power optimization model offline; under the entropy-regularized formulation, each agent receives during training an additional positive reward proportional to the entropy of its current policy, and the objective is defined as

J(π_i) = E[ Σ_t γ^t ( r_i(x, a_1, …, a_N) + α H(π_i(·|s_t)) ) ]

where x represents the information of the entire environment, α is the temperature coefficient weighting the entropy term, and Q_i^μ(x, a_1, …, a_N) is the i-th centralized critic, whose inputs are the action a_i taken by each agent and the environment information x, and whose output is the Q value of the i-th agent; the entropy H is expressed as

H(π(·|s_t)) = − Σ_a π(a|s_t) ln π(a|s_t)
(3.3) initializing a random process B for action exploration; initializing the environment state x, and generating the optimal photovoltaic and wind turbine real-time dynamic strategy from the intraday reactive power optimization model, as follows:
a. select an action a_i for each agent, execute the joint action a = (a_1, …, a_N), observe the reward r and the next state x', and store the transition tuple (x, a, r, x') in the experience replay pool D;
b. for each agent i, draw K samples (x_j, a_j, r_j, x'_j) from D and set y_j = r_j + γ Q^μ(x'_j, a'_1, …, a'_N), where r_j is the observed reward, γ is the per-step discount factor, x'_j is the next observed state, and a'_1, …, a'_N are the actions taken by agents 1 to N in that state;
c. update the critic by minimizing the loss function

L(θ_i) = (1/K) Σ_j ( y_j − Q_i^μ(x_j, a_1^j, …, a_N^j) )²

update the actor by gradient descent on the entropy-regularized policy loss

J_π(θ_i) = (1/K) Σ_j [ α ln π_i(ã_i | x_j) − Q_i^μ(x_j, a_1^j, …, ã_i, …, a_N^j) ],  ã_i ~ π_i(·|x_j)

and repeat from step a;
d. update the parameters of each agent's target neural network: θ'_i = τ θ_i + (1 − τ) θ'_i, where τ is the update weight;
and (3.4) the reactive voltage optimization scheduling model trained in the second time scale stage schedules the reactive power outputs of the photovoltaic, wind turbine and energy storage inverters at the 1-minute level, thereby correcting the day-ahead reactive power strategy and achieving a better reactive voltage control effect.
CN202110597000.4A 2021-05-28 2021-05-28 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning Active CN113363997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597000.4A CN113363997B (en) 2021-05-28 2021-05-28 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597000.4A CN113363997B (en) 2021-05-28 2021-05-28 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113363997A CN113363997A (en) 2021-09-07
CN113363997B true CN113363997B (en) 2022-06-14

Family

ID=77528260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597000.4A Active CN113363997B (en) 2021-05-28 2021-05-28 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113363997B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110212570B (en) * 2019-05-14 2023-03-28 国网内蒙古东部电力有限公司电力科学研究院 Wind power plant equivalent model based on MMSE mining and construction method and application thereof
US20230074995A1 (en) * 2021-09-09 2023-03-09 Siemens Aktiengesellschaft System and method for controlling power distribution systems using graph-based reinforcement learning
CN113872213B (en) * 2021-09-09 2023-08-29 国电南瑞南京控制***有限公司 Autonomous optimization control method and device for power distribution network voltage
CN113807029B (en) * 2021-10-19 2022-07-29 华北电力大学(保定) Deep reinforcement learning-based double-time-scale new energy power grid voltage optimization method
CN113937829A (en) * 2021-11-16 2022-01-14 华北电力大学 Active power distribution network multi-target reactive power control method based on D3QN
CN114362188B (en) * 2022-01-07 2023-06-02 天津大学 Multi-terminal intelligent soft switch voltage control method based on deep reinforcement learning
CN114069650B (en) * 2022-01-17 2022-04-15 南方电网数字电网研究院有限公司 Power distribution network closed loop current regulation and control method and device, computer equipment and storage medium
CN114336667B (en) * 2022-01-22 2023-06-27 华北电力大学(保定) Reactive voltage intelligent optimization method for high-proportion wind-solar new energy power grid
CN114625091A (en) * 2022-03-21 2022-06-14 京东城市(北京)数字科技有限公司 Optimization control method and device, storage medium and electronic equipment
CN114697200B (en) * 2022-03-30 2023-06-30 合肥工业大学 Protection device proportion optimization method of 5G distribution network distributed protection system
CN114665478B (en) * 2022-05-23 2022-10-11 国网江西省电力有限公司电力科学研究院 Active power distribution network reconstruction method based on multi-target deep reinforcement learning
CN115065064A (en) * 2022-06-09 2022-09-16 国网江苏省电力有限公司淮安供电分公司 Feeder line-transformer area two-stage voltage optimization method based on deep reinforcement learning
CN115062871B (en) * 2022-08-11 2022-11-29 山西虚拟现实产业技术研究院有限公司 Intelligent electric meter state evaluation method based on multi-agent reinforcement learning
CN115566692B (en) * 2022-11-08 2023-04-07 南方电网数字电网研究院有限公司 Method and device for determining reactive power optimization decision, computer equipment and storage medium
CN115731072B (en) * 2022-11-22 2024-01-30 东南大学 Micro-grid space-time perception energy management method based on safety deep reinforcement learning
CN116054185B (en) * 2023-03-30 2023-06-02 武汉新能源接入装备与技术研究院有限公司 Control method of reactive power compensator

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465664A (en) * 2020-11-12 2021-03-09 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN112507614A (en) * 2020-12-01 2021-03-16 广东电网有限责任公司中山供电局 Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10985572B2 (en) * 2018-10-01 2021-04-20 Geiri Co Ltd, State Grid Jiangxi Electric Power Co, State Grid Corp of China SGCC, GEIRINA Optimal charging and discharging control for hybrid energy storage system based on reinforcement learning
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465664A (en) * 2020-11-12 2021-03-09 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN112507614A (en) * 2020-12-01 2021-03-16 广东电网有限责任公司中山供电局 Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning; Qiuling Yang; IEEE Transactions on Smart Grid; 2020-05-31; pp. 2313-2323 *
Multi-time-scale online reactive power optimization of distribution networks based on deep reinforcement learning (基于深度强化学习的配电网多时间尺度在线无功优化); 倪爽; Automation of Electric Power Systems (电力系统自动化); 2021-05-25; pp. 77-85 *
Reactive power optimization of power systems based on the electron search algorithm (基于电子搜索算法的电力系统无功优化); 黄泰相; Electronic Science and Technology (电子科技); 2019-01-31; pp. 58-71 *

Also Published As

Publication number Publication date
CN113363997A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113363997B (en) Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
Gorostiza et al. Deep reinforcement learning-based controller for SOC management of multi-electrical energy storage system
Hu et al. Multi-agent deep reinforcement learning for voltage control with coordinated active and reactive power optimization
Erlich et al. Optimal dispatch of reactive sources in wind farms
CN113363998B (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN112507614B (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN114362187B (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN111555297B (en) Unified time scale voltage control method with tri-state energy unit
Li et al. Grid-area coordinated load frequency control strategy using large-scale multi-agent deep reinforcement learning
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN113937829A (en) Active power distribution network multi-target reactive power control method based on D3QN
CN115293052A (en) Power system active power flow online optimization control method, storage medium and device
CN115313403A (en) Real-time voltage regulation and control method based on deep reinforcement learning algorithm
Yin et al. Sequential reconfiguration of unbalanced distribution network with soft open points based on deep reinforcement learning
Du et al. Fuzzy logic control optimal realization using GA for multi-area AGC systems
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
CN114841595A (en) Deep-enhancement-algorithm-based hydropower station plant real-time optimization scheduling method
CN110289643B (en) Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm
Hong et al. MADRL-Based DSO-Customer Coordinated Bi-Level Volt/VAR Optimization Method for Power Distribution Networks
Sheng et al. Application of artificial intelligence techniques in reactive power/voltage control of power system
Wang et al. Real-time excitation control-based voltage regulation using ddpg considering system dynamic performance
Kang et al. Power flow coordination optimization control method for power system with DG based on DRL
Shi et al. Deep Reinforcement Learning-based Data-Driven Active Power Dispatching for Smart City Grid

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant