CN113315131A - Intelligent power grid operation mode adjusting method and system - Google Patents


Info

Publication number
CN113315131A
CN113315131A (application CN202110541404.1A)
Authority
CN
China
Prior art keywords
power grid
bus
generator
power
active
Prior art date
Legal status
Pending
Application number
CN202110541404.1A
Other languages
Chinese (zh)
Inventor
张静
杨靖萍
刁瑞盛
尚秀敏
叶琳
杨滢
周正阳
周靖皓
吕勤
徐建平
陈良亮
周材
Current Assignee
State Grid Zhejiang Electric Power Co Ltd
Nari Technology Co Ltd
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Nari Technology Co Ltd
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd, Nari Technology Co Ltd, Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical State Grid Zhejiang Electric Power Co Ltd
Priority to CN202110541404.1A priority Critical patent/CN113315131A/en
Publication of CN113315131A publication Critical patent/CN113315131A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48Controlling the sharing of the in-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a method and a system for intelligently adjusting the operation mode of a power grid. The method comprises the following steps: acquiring current power grid data, inputting it into a predetermined power grid model that accounts for security constraints, and extracting power grid operation state data from the calculation result; inputting the power grid operation state data into a pre-trained agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal; and adjusting the power grid operation mode according to the optimal generator control strategy. Advantage: the invention uses the pre-trained agent to automatically search for feasible power grid operating conditions while accounting for uncertainty.

Description

Intelligent power grid operation mode adjusting method and system
Technical Field
The invention relates to an intelligent adjusting method and system for a power grid operation mode, and belongs to the technical field of power grid regulation and control.
Background
In recent years, energy and environmental policies and standards have greatly promoted the rapid development of green energy, and the penetration of renewable energy in the power grid continues to rise. However, due to its intermittency, variability, and randomness, connecting large amounts of such energy to the grid presents significant challenges to the safe and economic operation of the power system. In the existing approach, the future power grid operation mode is formulated through large-scale numerical simulation to find the optimal operation mode under various fault conditions. The process includes power demand forecasting, new line construction planning, maintenance and outage planning, generator set planning, and the like. Due to the high complexity, non-linearity and dimensionality of the problem, this process typically requires a significant amount of human effort to achieve the desired goal empirically by manually modifying model parameters. The power industry lacks an effective method and tool to automate this process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent power grid operation mode adjusting method and system.
In order to solve the technical problem, the invention provides an intelligent adjustment method for a power grid operation mode, which comprises the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
Further, the power grid model is as follows:

$$\sum_{n \in G_i} P_{n,i}^g - \sum_{m \in D_i} P_{m,i}^d - g_i V_i^2 = \sum_{j \in B_i} P_{ij}(y), \quad \forall i \in B \quad (1)$$

$$\sum_{n \in G_i} Q_{n,i}^g - \sum_{m \in D_i} Q_{m,i}^d + b_i V_i^2 = \sum_{j \in B_i} Q_{ij}(y), \quad \forall i \in B \quad (2)$$

$$P_i^g = \sum_{n \in G_i} P_{n,i}^g, \qquad Q_i^g = \sum_{n \in G_i} Q_{n,i}^g \quad (3)$$

$$P_i^d = \sum_{m \in D_i} P_{m,i}^d, \qquad Q_i^d = \sum_{m \in D_i} Q_{m,i}^d \quad (4)$$

wherein $P_{n,i}^g$ and $Q_{n,i}^g$ represent the active and reactive power output of generator n on bus i; $P_{ij}(y)$ and $Q_{ij}(y)$ represent the active and reactive power flowing from bus i to bus j; $V_i$ is the voltage magnitude of bus i; B is the set of buses; the superscript g denotes a generator and the superscript d denotes a grid load; $P_i^g$ and $Q_i^g$ are the active and reactive power injections of the generators on bus i; $P_i^d$ and $Q_i^d$ are the active and reactive load power on bus i; $P_{m,i}^d$ and $Q_{m,i}^d$ are the active and reactive power of load m on bus i; $G_i$ is the set of generators on bus i; $D_i$ is the set of loads on bus i; $B_i$ is the set of buses forming a branch with bus i; $g_i$ is the self-conductance of bus i; and $b_i$ is the self-susceptance of bus i;
the constraint conditions of the safety constraint are as follows:
Figure BDA00030717037900000213
Figure BDA00030717037900000214
Figure BDA00030717037900000215
Figure BDA00030717037900000216
Figure BDA00030717037900000217
Figure BDA00030717037900000218
wherein the content of the first and second substances,
Figure BDA0003071703790000031
and
Figure BDA0003071703790000032
the upper limit and the lower limit of the active power of the generator are shown,
Figure BDA0003071703790000033
and
Figure BDA0003071703790000034
representing the upper and lower reactive limits of the generator, G representing the generator set, Vi minAnd Vi maxRepresenting the upper and lower bus voltage amplitude limits,
Figure BDA0003071703790000035
is the apparent power ceiling, Ω, of the transmission lineLRepresents a set of transmission lines, ΩTRepresents a set of transformers, gijIs the mutual conductance of bus i and bus j, θiIs the phase angle of the voltage of the bus i, thetajRepresenting the phase angle of the voltage of bus j, bijIs the mutual susceptance of the busbar i and the busbar j, bij0Is a contact line capacitor susceptance, PijAnd QijRespectively representing active and reactive power, V, on line ijiAnd VjRepresenting the voltage amplitude of bus i and bus j, respectively.
Further, the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topology, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections (snapshots); inputting the historical power grid data into the power grid model to obtain the power grid operation state data under the corresponding time section and calculating the agent reward value;
taking the power grid operation state data under a given time section as input, and obtaining the agent control action, namely a generator control strategy, by using the maximum-entropy agent reinforcement learning algorithm;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data under the next time section from the calculation result;
updating the network parameters of the agent based on the power grid operation state data under the current time section, the agent reward value, the agent control action and the power grid operation state data under the next time section;
and iterating the loop calculation until the power of the transmission lines in the controlled area does not exceed the safety limits under both the base state and the fault states of the power grid, and outputting the trained agent.
Further, the power grid model is solved using the P-Q decomposition method, the Newton-Raphson method, a method that automatically switches from P-Q decomposition to YR, or a method that automatically switches from P-Q decomposition to Newton-Raphson.
Further, the grid operating state data is expressed as:
s=(P,V,G)
wherein P represents the active power of a set of lines in the study region, V represents the voltage magnitudes of the buses in the same region, and G represents the vector of generator active power outputs.
Further, the calculation of the agent reward value comprises:

$$r = r_{con} + r_{base}$$

where r represents the reward value; $r_{con}$ is the fault (contingency) reward value and $r_{base}$ is the base-state reward value, each given by an equation image in the original filing; $P_{from}$ and $P_{to}$ are the measured active power at the sending and receiving ends of a transmission line; $P_{limit}$ is the active power upper limit of the line; a and b are reward coefficients; N is the total number of lines; k indexes lines in the grid base state; and l indexes lines in the grid fault state.
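Since the explicit expressions for $r_{con}$ and $r_{base}$ appear only as equation images, the sketch below assumes a simple form consistent with the description: a per-line bonus a when the flow stays within its limit and a penalty proportional (coefficient b) to the overload otherwise. The function names and the linear penalty shape are illustrative assumptions, not the patent's exact formulas.

```python
def line_overload(p_from, p_to, p_limit):
    """Per-line overload ratio: how far the larger end-flow exceeds the limit."""
    return max(abs(p_from), abs(p_to)) / p_limit - 1.0

def contingency_reward(flows, a=1.0, b=10.0):
    """Assumed sketch of r_con for one contingency: +a for each secure line,
    -b per unit of relative overload otherwise, summed over the N-1 surviving
    lines. `flows` is a list of (p_from, p_to, p_limit) tuples."""
    r = 0.0
    for p_from, p_to, p_limit in flows:
        over = line_overload(p_from, p_to, p_limit)
        r += a if over <= 0.0 else -b * over
    return r

def base_reward(flows, a=1.0, b=10.0):
    """r_base: the same limit check on the unmodified base-state topology."""
    return contingency_reward(flows, a, b)

def total_reward(base_flows, contingency_flows_list, a=1.0, b=10.0):
    """r = r_con + r_base, with r_con accumulated over all scanned faults."""
    r_con = sum(contingency_reward(f, a, b) for f in contingency_flows_list)
    return r_con + base_reward(base_flows, a, b)
```

A secure line set yields a positive reward, while any overload drags the total down in proportion to its severity, which is what steers the agent away from limit violations.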
Further, the grid operation control targets are as follows:

$$\min \sum_{v \in C} C(v) \quad (11)$$

$$\min P_{loss} = \min \sum_{(i,j) \in \Omega_L} P_{loss}(i,j) \quad (12)$$

where C(v) is the generation cost of generator v; C is the set of generators whose operating cost is considered; $P_{loss}(i,j)$ is the active power loss on line ij; $K_{max}$ is the total number of generators; and min denotes minimization.
An intelligent regulation system for the operation mode of a power grid comprises:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods.
A computing device, comprising:
one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
The invention achieves the following beneficial effects:
according to the invention, the feasible power grid operation condition is automatically searched under the condition of considering uncertainty according to the pre-trained intelligent agent.
Drawings
FIG. 1 shows the implementation principle of the intelligent power grid operation mode adjustment method considering security constraints according to the present invention;
FIG. 2 is a flow chart of calculating a prize value according to the present invention;
FIG. 3 is an example of an automatic adjustment algorithm for a power grid operation mode based on maximum entropy reinforcement learning according to the present invention;
FIG. 4 is a process for agent training in an embodiment of the invention;
FIG. 5 illustrates a generator output adjustment process according to an embodiment of the present invention;
FIG. 6 is a comparison of reward values of agents for different exploration steps in accordance with an embodiment of the present invention;
FIG. 7 is a comparison of total control iterations of agents for different exploration steps in an embodiment of the present invention;
FIG. 8 is a comparison of the performance of an agent using constant and varying temperature coefficients in an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an intelligent adjustment method for the power grid operation mode. The method first formulates the power grid operation mode problem as an MDP (Markov Decision Process): power grid power flow information is collected to form the state space, and the control targets and constraints are modeled as the reward value used when training the reinforcement learning agent. Second, a general framework is adopted in which various types of control actions, including generator active power adjustment and load transfer, automatically adjust the transmission line power while accounting for various fault conditions.
The invention provides an intelligent adjustment method for the power grid operation mode, the overall framework of which is shown in FIG. 1: the active power output of selected generators in a specified power grid area is adjusted to meet the various security requirements of the power grid operation mode, including the base state and fault conditions.
The method comprises the following steps:
solving a power grid model according to the current power grid data, and extracting power grid state data based on a calculation result;
based on the power grid state data, a generator control strategy is obtained through a pre-trained intelligent agent; the generator control strategy refers to an active power control signal of a generator;
and adjusting the power grid operation mode based on the optimal generator control strategy.
In the present invention, the agent is required to be trained in advance, and referring to fig. 1, the method includes:
step (1): transmitting the power grid model and the power grid operation mode file to a power grid simulation environment;
step (2): solving the power grid model based on historical power grid data (including the power grid topology, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections) to obtain the current power grid operation state data;
and (3): calculating an intelligent agent reward value by adopting a calculation result of the power grid model;
and (4): sending the reward value to an experience playback pool for storage;
and (5): extracting current power grid operation state data;
and (6): sending the current power grid operation state data to an experience playback pool for storage;
and (7): obtaining an intelligent agent control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the current power grid operation state data;
and (8): sending the control action of the agent to an experience playback pool for storage;
and (9): and sending the control action of the intelligent agent to a power grid simulation environment, storing the control action into a power grid operation mode file, applying the control action to a power grid model to obtain the next power grid operation state data, updating network parameters of the intelligent agent, and performing iterative loop calculation until the power grid operation control target is met.
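Steps (1)-(9) above can be sketched as an interaction loop between a grid environment, a maximum-entropy agent, and an experience replay pool. The `env` and `agent` interfaces below are hypothetical stand-ins for the power grid simulation environment and the SAC agent, not part of the patent:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool storing (state, action, reward, next_state)."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def training_loop(env, agent, pool, episodes=10, batch_size=32):
    """Skeleton of steps (2)-(9): solve the grid model for a state, act with
    the max-entropy agent, store the transition, and update the networks."""
    for _ in range(episodes):
        s = env.reset()                      # steps (2)/(5): state from power flow
        done = False
        while not done:
            a = agent.act(s)                 # step (7): agent control action
            s_next, r, done = env.step(a)    # steps (3)/(9): apply action, reward
            pool.push(s, a, r, s_next)       # steps (4)/(6)/(8): store experience
            if len(pool) >= batch_size:
                agent.update(pool.sample(batch_size))  # step (9): update networks
            s = s_next
```

In the patent's setting, `env.step` would write the control action into the operation mode file, re-solve the power flow, and return the next grid state and the reward computed from the base-state and contingency checks.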
The maximum entropy reinforcement learning algorithm with automatic temperature coefficient calculation is adopted, and feasible power grid operation conditions are automatically searched under the condition of considering uncertainty.
Specifically, solving the power grid model includes:
the grid model is represented as:
Figure BDA0003071703790000071
Figure BDA0003071703790000072
Figure BDA0003071703790000073
Figure BDA0003071703790000074
Figure BDA0003071703790000075
Figure BDA0003071703790000076
wherein the content of the first and second substances,
Figure BDA0003071703790000077
and
Figure BDA0003071703790000078
representing the active and reactive power output, P, of the generator n on the bus iij(y) and Qij(y) represents the active and reactive power from bus i to bus j, ViRepresenting the voltage amplitude of the bus i, B representing the bus set, g representing the generator, d representing the grid load, Pi gAnd
Figure BDA0003071703790000079
is the active power injection and reactive power injection of the generator on the bus i, Pi dAnd
Figure BDA00030717037900000710
is the load active and reactive power on the bus iThe power of the electric motor is controlled by the power controller,
Figure BDA0003071703790000081
and
Figure BDA0003071703790000082
is the active and reactive power of the load m on the bus i, GiIs a set of generators on bus i, DiIs the set of loads on the bus i, BiIs a busbar set, g, forming a branch with busbar iiIs the self-conductance of the busbar i, biIs the self susceptance of the bus i.
The grid model is subject to constraints that represent the physical limits of the various electrical devices, requiring that all line flows, generator outputs and voltage magnitudes remain within their physical limits:

$$P_n^{g,\min} \le P_n^g \le P_n^{g,\max}, \quad \forall n \in G \quad (5)$$

$$Q_n^{g,\min} \le Q_n^g \le Q_n^{g,\max}, \quad \forall n \in G \quad (6)$$

$$V_i^{\min} \le V_i \le V_i^{\max}, \quad \forall i \in B \quad (7)$$

$$P_{ij}^2 + Q_{ij}^2 \le \left(S_{ij}^{\max}\right)^2, \quad \forall (i,j) \in \Omega_L \cup \Omega_T \quad (8)$$

wherein $P_n^{g,\max}$ and $P_n^{g,\min}$ are the upper and lower limits of the generator active power; $Q_n^{g,\max}$ and $Q_n^{g,\min}$ are the upper and lower limits of the generator reactive power; G is the set of generators; $V_i^{\min}$ and $V_i^{\max}$ are the lower and upper limits of the bus voltage magnitude; $S_{ij}^{\max}$ is the apparent power upper limit of the transmission line; $\Omega_L$ is the set of transmission lines; and $\Omega_T$ is the set of transformers. The active power $P_{ij}$ and reactive power $Q_{ij}$ of line ij are calculated as:

$$P_{ij} = g_{ij} V_i^2 - V_i V_j \left(g_{ij}\cos\theta_{ij} + b_{ij}\sin\theta_{ij}\right) \quad (9)$$

$$Q_{ij} = -\left(b_{ij} + b_{ij0}\right) V_i^2 + V_i V_j \left(b_{ij}\cos\theta_{ij} - g_{ij}\sin\theta_{ij}\right) \quad (10)$$

wherein $g_{ij}$ is the mutual conductance of bus i and bus j; $\theta_i$ and $\theta_j$ are the voltage phase angles of bus i and bus j, with $\theta_{ij} = \theta_i - \theta_j$; $b_{ij}$ is the mutual susceptance of bus i and bus j; and $b_{ij0}$ is the line charging (shunt) susceptance.
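The line active and reactive power expressions above can be evaluated numerically with a short sketch (all quantities in per-unit; the function name is illustrative):

```python
import math

def line_flow(v_i, v_j, theta_i, theta_j, g_ij, b_ij, b_ij0=0.0):
    """Active and reactive power flowing from bus i to bus j for a branch with
    series admittance g_ij + j*b_ij and charging susceptance b_ij0."""
    th = theta_i - theta_j
    p_ij = g_ij * v_i**2 - v_i * v_j * (g_ij * math.cos(th) + b_ij * math.sin(th))
    q_ij = (-(b_ij + b_ij0) * v_i**2
            + v_i * v_j * (b_ij * math.cos(th) - g_ij * math.sin(th)))
    return p_ij, q_ij
```

A quick sanity check: with equal voltage magnitudes and zero angle difference, no power flows; advancing the sending-end angle pushes active power from bus i toward bus j.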
A reasonable power grid operation mode can take multiple control targets into account, i.e., reducing the generation cost and/or the transmission losses as much as possible while satisfying all constraint conditions. The control target of minimizing the generation cost is given by equation (11); the control target of minimizing the grid losses is given by equation (12):

$$\min \sum_{k \in C} C(k) \quad (11)$$

$$\min P_{loss} = \min \sum_{(i,j) \in \Omega_L} P_{loss}(i,j) \quad (12)$$

where C(k) is the generation cost of generator k; C is the set of generators whose operating cost is considered; $P_{loss}(i,j)$ is the active power loss on line ij; $K_{max}$ is the total number of generators; and $P_{loss}$ is the sum of the network losses.
In the present invention, the P-Q decomposition method, the Newton-Raphson method, a method that automatically switches from P-Q decomposition to YR, or a method that automatically switches from P-Q decomposition to Newton-Raphson may be used to solve the power grid model.
In the present invention, the grid state data is represented as s = (P, V, G), where P represents the active power of a set of lines in the study region, V represents the voltage magnitudes of the buses in the same region, and G represents the vector of generator active power outputs.
The invention trains the agent with the maximum-entropy reinforcement learning algorithm (Soft Actor-Critic, SAC) to search for the optimal power grid operation mode; the trained reinforcement learning agent can provide preventive and corrective control measures to ensure grid security under various operating conditions.
Reinforcement learning is a branch of machine learning in which an agent takes actions in turn and interacts with the environment in order to maximize the accumulated return. At each step t, the agent observes a state $s_t$, performs a control action $a_t$, and receives a reward value $r_t$. The behavior of the agent is defined by a policy $\pi: S \to p(A)$, which maps states to a probability distribution over control actions. The problem is usually modeled as an MDP tuple $(S, A, p, r)$, comprising the state space S, the action space A, the transition probability $p(s_{t+1}|s_t, a_t)$ and the reward function $r(s_t, a_t)$. The expected return from one state is defined as the discounted sum of future reward values,

$$R_t = \sum_{k=0}^{\infty} \gamma^k \, r(s_{t+k}, a_{t+k})$$

where $\gamma \in [0, 1]$ is the discount factor. The ultimate goal of training the reinforcement learning agent is to find a control strategy that maximizes the reward value.
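The discounted return above can be computed recursively from the end of an episode; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of future rewards: R = sum_k gamma**k * r_k,
    accumulated backwards so each step reuses the tail's value."""
    r_total = 0.0
    for r in reversed(rewards):
        r_total = r + gamma * r_total
    return r_total
```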
In order to find a good policy, value-function-based methods such as Q-learning can be used to measure how good an action is in a particular state, or policy-based methods can be used to directly learn which control strategy should be taken in each state without evaluating how good each action is. However, the problems faced in the real world are extremely complex.
In the present invention, the maximum-entropy reinforcement learning algorithm SAC was chosen because it exhibits state-of-the-art performance in both sample efficiency and stability, owing to its unique ability to maximize both the expected reward and the entropy during training.
The objective function for the Q-value is given in equation (13), where θ and ψ parameterize the networks modeling the soft Q-value function and the control strategy, respectively, $V_{\bar\psi}$ is the state value function, and α is the temperature parameter that determines the relative importance of the entropy term and the reward value, thereby controlling the randomness of the optimal policy:

$$J_Q(\theta) = \mathbb{E}_{(s_t, a_t) \sim \mathcal{D}} \left[ \tfrac{1}{2} \left( Q_\theta(s_t, a_t) - \left( r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}} \left[ V_{\bar\psi}(s_{t+1}) \right] \right) \right)^2 \right] \quad (13)$$

The objective function of the policy is given in equation (14); in the present invention, a normal distribution is used for the policy:

$$J_\pi(\psi) = \mathbb{E}_{s_t \sim \mathcal{D}} \left[ \mathbb{E}_{a_t \sim \pi_\psi} \left[ \alpha \log \pi_\psi(a_t \mid s_t) - Q_\theta(s_t, a_t) \right] \right] \quad (14)$$

The temperature coefficient α is fixed in the preceding formulation, but training with a fixed temperature coefficient makes the performance of the agent unstable as the reward value changes, so it is preferable to use an automatic temperature coefficient that changes as the policy is updated, in order to explore more of the action space. Therefore, in the present invention, an average entropy constraint is added to the original objective function, while the entropy is allowed to vary in different states. The new objective function is modified as follows:

$$\max_{\pi} \ \mathbb{E} \left[ \sum_t r(s_t, a_t) \right] \quad \text{s.t.} \quad \mathbb{E}_{(s_t, a_t) \sim \pi} \left[ -\log \pi(a_t \mid s_t) \right] \ge \mathcal{H}_0 \quad (15)$$

where $\mathcal{H}_0$ is the desired minimum entropy value, and the loss function of the temperature coefficient is given by equation (16):

$$J(\alpha) = \mathbb{E}_{a_t \sim \pi_t} \left[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \mathcal{H}_0 \right] \quad (16)$$
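A minimal sketch of the automatic temperature update implied by equation (16): since $J(\alpha) = \alpha\,(\mathcal{H} - \mathcal{H}_0)$ for measured policy entropy $\mathcal{H}$, a gradient step decreases α when the entropy exceeds the target and increases it otherwise. This is a simplified scalar update; SAC implementations typically optimize log α with a stochastic optimizer such as Adam.

```python
def alpha_step(log_alpha, entropy, target_entropy, lr=1e-3):
    """One simplified gradient step on the temperature loss.
    d J / d alpha is proportional to (entropy - target_entropy), so alpha
    (kept positive by working in log space) shrinks when the policy is
    already more random than the target H0 and grows when it is not."""
    grad = entropy - target_entropy
    return log_alpha - lr * grad
```

Over training, this drives the policy entropy toward $\mathcal{H}_0$ without hand-tuning the temperature.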
specifically, the training reinforcement learning agent updates the active power output of the generator, including:
specifically, the reward value is a feedback of the performance of the intelligent agent in each control iteration, and a well-designed reward value can not only guide the intelligent agent to update the neural network parameters in a more effective direction, but also accelerate the whole training process. Our control objective is to minimize the change of active power in emergency, guarantee the grid safety, and prevent potential line power overload problem. The considered fault refers to a transmission line fault in the grid, which means that the controlled area must be able to maintain the ground state and the safety and reliability after an N-1 fault.
The reward value function is then defined as the sum of the fault reward and the base-state reward:

$$r = r_{con} + r_{base}$$

wherein $r_{con}$ is the fault reward value and $r_{base}$ is the base-state reward value.

The fault reward value $r_{con}$ is given by an equation image in the original filing. In it, $P_{from}$ and $P_{to}$ are the measured active power at the sending and receiving ends of a transmission line; $P_{limit}$ is the active power upper limit of the line, representing the thermal or stability limit; a and b are reward coefficients; and N is the total number of lines. Under an N-1 fault, this function measures the degree to which the power on the remaining N-1 lines in the controlled area exceeds its limits.

The base-state reward value $r_{base}$ is likewise given by an equation image; all of its variables are the same as those defined for the fault reward function. The only difference is that the base-state reward value checks for line power limit violations on the premise that the current topology is unchanged. The calculation flow of the reward value is shown in FIG. 2. First, the power grid operation mode model and data are input, and the fault set (L) is specified. Second, fault analysis is performed for each line in the fault set L, looping until all fault lines have been scanned; in each fault analysis, the power flow equations are solved, and when the power flow solution does not converge, switching the numerical solution method or removing electrical islands can be used to ensure convergence of the power flow result. Third, the reward value is calculated from the power flow results. Finally, the accumulated reward value is output.
Specifically, the grid operating state is defined as s = (P, V, G), where P represents the active power of a set of lines in the study region, V represents the voltage magnitudes of the buses in the same region, and G represents the vector of generator active power outputs. In order to keep the different types of inputs and outputs consistent when training the agent, P, V and G are normalized.
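The normalization step can be sketched as a min-max scaling of each component of s = (P, V, G); the patent does not specify the scheme, so this particular choice is an assumption of the illustration:

```python
def normalize_state(p, v, g):
    """Min-max normalize each component of s = (P, V, G) to [0, 1] so that
    line flows, bus voltages and generator outputs share a common scale."""
    def minmax(x):
        lo, hi = min(x), max(x)
        return [0.0 if hi == lo else (xi - lo) / (hi - lo) for xi in x]
    return minmax(p) + minmax(v) + minmax(g)
```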
Fig. 3 shows the implementation of the SAC power flow control algorithm of the present invention. Lines 1-3 initialize the weights of the policy network, the Q network, and the target network. Line 4 sets the replay buffer size. Lines 5-29 show the training process for each complete sample. Line 6 resets the initial state $s_t$ at the beginning of the sample. Line 9 samples the control action $a_t$ from the distribution computed by the current policy. In the power flow calculation, the most computationally expensive part is the power flow calculation under a large number of faults; to reduce the computation while retaining the key characteristics, only the N-1 fault set that causes line limit violations in the base state is calculated. Lines 12-15 show the method of calculating the fault reward value. Lines 16-19 show that only the reward value of $L_c$ is considered. Lines 20-21 superimpose the base-state reward value onto the total reward value. Line 22 collects the information of the next state $s_{t+1}$ from the environment. Line 23 sends the MDP tuple collected in the current step to the replay buffer. Lines 24-29 update the policy and Q-value networks according to equations (13), (14) and (16).
After a control strategy is successfully found, if a reduced N-1 fault set is used to save computational resources, security problems may still occur on the other faulted lines that were ignored during the computation; therefore, the step of checking all faulted lines is added in lines 30-34 to ensure system security.
The invention also provides a device for automatically adjusting the operation mode of the power grid based on maximum entropy reinforcement learning, which comprises the following components:
the power grid simulation environment module is used for solving a power grid load flow equation and updating power grid running state data in each interaction step;
the training process module is used for training the reinforcement learning agent based on the power grid state data and outputting a generator control strategy;
the use process module is used for adjusting the power grid power flow when the generator control strategy meets the safety requirements; the safety requirements cover the base state and fault conditions.
Specifically, the power grid simulation environment module comprises a power flow solver and an environment component;
the environment component is used for updating and storing the power grid operation state data; the power grid operation state data is stored in the power grid operation mode file.
Specifically, the environment component updates the generator control strategy into the power grid operation mode file.
The invention stores the power grid operation state data in BPA format.
The power flow solver solves the power grid power flow equations based on the latest power grid operation state data and supports four numerical methods: the P-Q decomposition method, the Newton-Raphson method, P-Q decomposition with automatic switching to the YR method, and P-Q decomposition with automatic switching to the Newton-Raphson method.
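As an illustration of one of these methods, a minimal Newton-Raphson power flow for a hypothetical two-bus system (a slack bus feeding one PQ load; a finite-difference Jacobian is used purely to keep the sketch short — production solvers use the analytic Jacobian) might look like:

```python
import numpy as np

# Hypothetical 2-bus system: slack bus 0 (V = 1.0 pu, theta = 0) feeds a
# PQ load bus 1 through a line with impedance z = 0.01 + 0.05j pu.
z = 0.01 + 0.05j
Y = np.array([[1/z, -1/z], [-1/z, 1/z]])    # bus admittance matrix
G, B = Y.real, Y.imag
P_spec, Q_spec = -0.8, -0.3                 # net injection at bus 1 (load)

def injection(x):
    """Calculated P and Q injection at bus 1 for state x = [theta1, V1]."""
    th1, v1 = x
    V, th = np.array([1.0, v1]), np.array([0.0, th1])
    p = V[1] * sum(V[k] * (G[1, k] * np.cos(th[1] - th[k])
                           + B[1, k] * np.sin(th[1] - th[k])) for k in range(2))
    q = V[1] * sum(V[k] * (G[1, k] * np.sin(th[1] - th[k])
                           - B[1, k] * np.cos(th[1] - th[k])) for k in range(2))
    return np.array([p, q])

x = np.array([0.0, 1.0])                    # flat start
for _ in range(20):                         # Newton-Raphson iterations
    mismatch = np.array([P_spec, Q_spec]) - injection(x)
    if np.abs(mismatch).max() < 1e-8:
        break
    J = np.zeros((2, 2))                    # finite-difference Jacobian
    for c in range(2):
        dx = np.zeros(2); dx[c] = 1e-7
        J[:, c] = (injection(x + dx) - injection(x)) / 1e-7
    x = x + np.linalg.solve(J, mismatch)

theta1, V1 = x                              # converged angle (rad) and voltage (pu)
```

For this load the receiving-end voltage sags a few percent below 1.0 pu and its angle lags the slack bus, as expected for power flowing toward bus 1.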
Specifically, the training process module comprises a state extraction module, a reward value calculation module, an intelligent agent update module and an experience replay pool:
the state extraction module is used for capturing the power grid operation state data and storing it in the experience replay pool;
the reward value calculation module calculates the reward value for the base state and the N-1 faults based on the output of the power flow solver, and stores the reward value in the experience replay pool;
the intelligent agent updating module is used for giving a control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the power grid running state; and updating the network parameters based on the reward value, the current grid operating state, the generator control strategy and the next grid operating state.
When the experience replay pool exceeds its capacity, the oldest data are deleted from the buffer.
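The capacity-bounded experience replay pool can be sketched with a deque, which drops the oldest transitions automatically (Python; the class and method names are illustrative):

```python
import random
from collections import deque

class ReplayPool:
    """Bounded experience replay pool: once capacity is reached,
    appending a new transition silently evicts the oldest one."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
    def push(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))
    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)
    def __len__(self):
        return len(self.buf)
```

`deque(maxlen=...)` handles the eviction described above without any explicit size check.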
Specifically, the use process module is used for inputting the current power grid operation state data into the trained intelligent agent to obtain an optimal generator control strategy and adjust the power grid operation mode.
The present invention accordingly also provides a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method.
The invention accordingly also provides a computing device comprising one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method.
Examples
In order to verify the effectiveness of the method, a power grid operation mode model is used to generate future operating schemes, taking the high-voltage (220 kV and above) power grid model of East China as the test object. The model includes 6500 buses, 600 generators, 6000 lines and 4300 transformers, and the grid planning model is provided in BPA format. The Jinhua partition of the Zhejiang power grid is selected for the case study; the Jinhua grid has about 200 buses and 170 transmission lines in total.
To train an effective reinforcement learning agent, the state space includes the power of the main transmission lines in the Jinhua partition, and the control space is formed by adjusting the active power outputs of all available generators near Jinhua. The control objective of the trained SAC agent is a safe operating mode under both the base state and N-1 faults. In the power grid simulation environment, an AC power flow solver supporting BPA-format data interacts continuously with the reinforcement learning agent during training. The inputs to the environment, which contain the control actions, are stored in a power flow file (BPA format). The outputs of the environment, including bus voltages, generator outputs and line powers, are stored in a text file, from which a parser extracts the key information and updates the state space, action space and reward value.
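A minimal sketch of such a parser is shown below (Python; the `BUS`/`LINE`/`GEN` record layout is a made-up illustration — the real BPA output format is not reproduced in this text):

```python
import re

def parse_env_output(text):
    """Pull 'KIND name value' records out of the solver's text output,
    grouping values by record kind (hypothetical format)."""
    record = {}
    for line in text.splitlines():
        m = re.match(r"^(BUS|LINE|GEN)\s+(\S+)\s+(-?\d+(?:\.\d+)?)$", line)
        if m:
            kind, name, value = m.groups()
            record.setdefault(kind, {})[name] = float(value)
    return record
```

Lines that do not match the expected record shape are simply skipped, so headers and diagnostics in the solver output are ignored.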
The method is verified on the operating mode of the East China power grid for January 2019, and Figs. 4 to 8 compare in detail the performance of SAC agents trained with different parameters.
Fig. 4 shows the overall performance of the SAC agent when N-1 faults are considered; as the average reward and training step curves show, the SAC agent converges successfully after 60 samples. The simulation results verify the effectiveness of the method, which can save a large amount of manual adjustment time. Fig. 5 shows the active power adjustment trajectories of the 7 generator sets used by the agent: the SAC agent finds an optimal control strategy for adjusting generator power in only three steps, and the adjustments stay reasonably close to the original values. This agrees with engineering practice, i.e. the safety problem is solved with the smallest possible change to generator outputs.
This embodiment also studies the influence of two parameters on the performance of the SAC agent: the exploration step count and the temperature coefficient. Referring to Figs. 6 and 7, the exploration step count determines how many steps the agent explores randomly during the initial stage of training. Three different exploration step counts, 10, 20 and 30, are compared over a total of 100 samples. Referring to Fig. 7, although all three oscillate at the beginning of training, they all converge at around 90 samples. This means that varying the exploration step count within a small range does not affect the agent's ability to find the optimal control strategy. With an exploration step count of 30, convergence is faster, indicating that more knowledge of the environment helps the agent reach the target state sooner.
Furthermore, the temperature coefficient affects both the performance and the training speed of the SAC agent. Fixing this parameter can significantly slow or even prevent convergence. Referring to Fig. 8, when the temperature coefficient is adjusted automatically during training, the SAC agent converges significantly faster.
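A scalar sketch of this automatic temperature adjustment is given below (Python; the `TemperatureTuner` name, plain gradient descent on log α, and the learning rate are assumptions — SAC implementations typically optimize the same loss with Adam on a framework scalar):

```python
import numpy as np

class TemperatureTuner:
    """Adjusts the SAC temperature alpha so that the policy entropy
    tracks a target entropy (commonly -dim(action space))."""
    def __init__(self, action_dim, lr=3e-4):
        self.log_alpha = 0.0                    # alpha starts at 1.0
        self.target_entropy = -float(action_dim)
        self.lr = lr

    @property
    def alpha(self):
        return float(np.exp(self.log_alpha))

    def update(self, log_probs):
        """One descent step on J(alpha) = E[-alpha * (log pi(a|s) + H_target)];
        dJ/d(log alpha) = -alpha * mean(log pi + H_target)."""
        grad = -self.alpha * (float(np.mean(log_probs)) + self.target_entropy)
        self.log_alpha -= self.lr * grad
```

When the policy entropy falls below the target (log-probabilities too high), the gradient is negative and α grows, encouraging exploration; when entropy is above the target, α shrinks.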
The invention provides an intelligent power grid operation mode adjustment method that considers safety constraints and adopts a SAC algorithm with automatic temperature coefficient adjustment. The simulation results verify the effectiveness of the method: the framework automatically searches for feasible power grid operating conditions while taking uncertainty into account.
It is to be noted that the apparatus embodiment corresponds to the method embodiment, and the implementation manners of the method embodiment are all applicable to the apparatus embodiment and can achieve the same or similar technical effects, so that the details are not described herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent power grid operation mode adjustment method, characterized by comprising the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
2. The intelligent power grid operation mode adjustment method according to claim 1, wherein the power grid model is:

P_i^g = Σ_{n∈G_i} P_{i,n}^g, i ∈ B
Q_i^g = Σ_{n∈G_i} Q_{i,n}^g, i ∈ B
P_i^d = Σ_{m∈D_i} P_{i,m}^d, i ∈ B
Q_i^d = Σ_{m∈D_i} Q_{i,m}^d, i ∈ B
P_i^g − P_i^d = g_i·V_i^2 + Σ_{j∈B_i} P_ij(y), i ∈ B
Q_i^g − Q_i^d = −b_i·V_i^2 + Σ_{j∈B_i} Q_ij(y), i ∈ B

wherein P_{i,n}^g and Q_{i,n}^g represent the active and reactive power output of generator n on bus i, P_ij(y) and Q_ij(y) represent the active and reactive power from bus i to bus j, V_i represents the voltage amplitude of bus i, B represents the bus set, the superscript g represents a generator, the superscript d represents a grid load, P_i^g and Q_i^g are the active and reactive power injections of the generators on bus i, P_i^d and Q_i^d are the load active and reactive power on bus i, P_{i,m}^d and Q_{i,m}^d are the active and reactive power of load m on bus i, G_i is the set of generators on bus i, D_i is the set of loads on bus i, B_i is the set of buses forming a branch with bus i, g_i is the self-conductance of bus i, and b_i is the self-susceptance of bus i;

the constraint conditions of the safety constraint are as follows:

P_n^{g,min} ≤ P_n^g ≤ P_n^{g,max}, n ∈ G
Q_n^{g,min} ≤ Q_n^g ≤ Q_n^{g,max}, n ∈ G
V_i^min ≤ V_i ≤ V_i^max, i ∈ B
P_ij^2 + Q_ij^2 ≤ (S_ij^max)^2, (i,j) ∈ Ω_L ∪ Ω_T
P_ij(y) = g_ij·V_i^2 − V_i·V_j·(g_ij·cos(θ_ij) + b_ij·sin(θ_ij)), (i,j) ∈ Ω_L
Q_ij(y) = −V_i^2·(b_ij0 + b_ij) − V_i·V_j·(g_ij·sin(θ_ij) − b_ij·cos(θ_ij)), (i,j) ∈ Ω_L

wherein P_n^{g,min} and P_n^{g,max} represent the lower and upper active power limits of generator n, Q_n^{g,min} and Q_n^{g,max} represent the lower and upper reactive power limits of the generator, G represents the generator set, V_i^min and V_i^max represent the lower and upper bus voltage amplitude limits, S_ij^max is the apparent power upper limit of the transmission line, Ω_L represents the set of transmission lines, Ω_T represents the set of transformers, g_ij is the mutual conductance of bus i and bus j, θ_i is the voltage phase angle of bus i, θ_j is the voltage phase angle of bus j, θ_ij = θ_i − θ_j, b_ij is the mutual susceptance of bus i and bus j, b_ij0 is the line charging susceptance, P_ij and Q_ij respectively represent the active and reactive power on line ij, and V_i and V_j represent the voltage amplitudes of bus i and bus j.
3. The method according to claim 2, wherein the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections, inputting the historical power grid data into the power grid model, obtaining the power grid operation state data under the corresponding time section, and calculating the intelligent agent reward value;
taking the power grid operation state data under a certain time section as input and adopting a maximum entropy intelligent agent reinforcement learning algorithm to obtain an intelligent agent control action, which is a generator control strategy;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data under the next time section according to the calculation result;
updating the network parameters of the intelligent agent based on the power grid operation state data under the current time section, the intelligent agent reward value, the intelligent agent control action and the power grid operation state data under the next time section;
and iterating the calculation until the power of the transmission lines in the controlled area does not exceed the safety limits under both the base state and the fault states of the power grid, and outputting the trained intelligent agent.
4. The intelligent power grid operation mode adjustment method according to claim 3, wherein the power grid model is solved by the P-Q decomposition method, the Newton-Raphson method, P-Q decomposition with automatic switching to the YR method, or P-Q decomposition with automatic switching to the Newton-Raphson method.
5. The intelligent power grid operation mode adjustment method according to claim 3, wherein the power grid operation state data is expressed as:
s=(P,V,G)
wherein, P represents a group of line active power in a research area, V represents the voltage amplitude of a bus in the same area, and G represents the vector of the generator active power output.
6. The method of claim 3, wherein calculating the intelligent agent reward value comprises:

r = r_con + r_base
r_base = −a · Σ_{k=1}^{N} max( max(|P_from,k|, |P_to,k|) / P_limit,k − 1, 0 )
r_con = −b · Σ_{l=1}^{N} max( max(|P_from,l|, |P_to,l|) / P_limit,l − 1, 0 )

wherein r represents the reward value, r_con represents the fault reward value, r_base represents the base-state reward value, P_from and P_to are the measured active power at the head and tail ends of a transmission line, P_limit is the active power upper limit of the line, a and b are respectively reward value coefficients, N is the total number of lines, k represents the line counter in the grid base state, and l represents the line counter in the grid fault state.
7. The intelligent power grid operation mode adjustment method according to claim 3, wherein the power grid operation control targets are:

min Σ_{v=1}^{K_max} C(v), v ∈ C
min Σ_{(i,j)∈Ω_L} P_loss(i,j)

wherein C(v) is the generation cost of generator v, C represents the set of generators whose operation cost is considered, P_loss(i,j) is the active power loss value on line ij, K_max is the total number of generators, and min represents taking the minimum value.
8. An intelligent power grid operation mode adjustment system, characterized by comprising:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
10. A computing device, comprising,
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-8.
CN202110541404.1A 2021-05-18 2021-05-18 Intelligent power grid operation mode adjusting method and system Pending CN113315131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110541404.1A CN113315131A (en) 2021-05-18 2021-05-18 Intelligent power grid operation mode adjusting method and system


Publications (1)

Publication Number Publication Date
CN113315131A true CN113315131A (en) 2021-08-27

Family

ID=77373499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110541404.1A Pending CN113315131A (en) 2021-05-18 2021-05-18 Intelligent power grid operation mode adjusting method and system

Country Status (1)

Country Link
CN (1) CN113315131A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523737A (en) * 2020-05-29 2020-08-11 四川大学 Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIUMIN SHANG: "Reinforcement Learning-Based Solution to Power Grid Planning and Operation Under Uncertainties", 《2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S)》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113725853A (en) * 2021-08-30 2021-11-30 浙江大学 Power grid topology control method and system based on active person in-loop reinforcement learning
CN115360772A (en) * 2022-03-23 2022-11-18 中国电力科学研究院有限公司 Power system active safety correction control method, system, equipment and storage medium
CN115360772B (en) * 2022-03-23 2023-08-15 中国电力科学研究院有限公司 Active safety correction control method, system, equipment and storage medium for power system
CN117526443A (en) * 2023-11-07 2024-02-06 北京清电科技有限公司 Novel power system-based power distribution network optimization regulation and control method and system
CN117526443B (en) * 2023-11-07 2024-04-26 北京清电科技有限公司 Power system-based power distribution network optimization regulation and control method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210827