CN113315131A - Intelligent power grid operation mode adjusting method and system - Google Patents
- Publication number
- CN113315131A (application number CN202110541404.1A)
- Authority
- CN
- China
- Prior art keywords
- power grid
- bus
- generator
- power
- active
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H02J3/00: Circuit arrangements for ac mains or ac distribution networks
- H02J3/04, H02J3/06: Circuit arrangements for connecting networks of the same frequency supplied from different sources; controlling transfer of power between connected networks; controlling sharing of load between connected networks
- H02J3/38, H02J3/46, H02J3/48: Arrangements for parallelly feeding a single network by two or more generators, converters or transformers; controlling the sharing of output; controlling the sharing of the in-phase component
- H02J2203/10: Power transmission or distribution systems management focussing at grid level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
- H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
Abstract
The invention discloses an intelligent power grid operation mode adjusting method and system. The method comprises the following steps: acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for calculation, and extracting power grid operation state data from the calculation result; inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal; and adjusting the power grid operation mode according to the optimal generator control strategy. The advantage is that the invention utilizes the pre-trained intelligent agent to automatically search for feasible power grid operating conditions while taking uncertainty into account.
Description
Technical Field
The invention relates to an intelligent adjusting method and system for a power grid operation mode, and belongs to the technical field of power grid regulation and control.
Background
In recent years, energy and environmental policies and standards have greatly promoted the rapid development of green energy, and the penetration of renewable energy in the power grid continues to rise. However, due to its intermittency, dynamics and randomness, connecting large amounts of such energy to the grid presents significant challenges to the safe and economic operation of the power system. In the existing approach, the future power grid operation mode is formulated through large-scale numerical simulation to find the optimal power grid operation mode under various fault conditions. The process includes power demand forecasting, new line construction planning, maintenance and outage planning, generator set planning, and the like. Due to the high complexity, non-linearity and dimensionality of the problem, this process typically requires a significant amount of human effort to reach the desired goal empirically by manually modifying model parameters. The power industry lacks an effective method and tool to automate this process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent power grid operation mode adjusting method and system.
In order to solve the technical problem, the invention provides an intelligent adjustment method for a power grid operation mode, which comprises the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
Further, the power grid model is as follows:

P_i^g - P_i^d = g_i·V_i^2 + Σ_{j∈B_i} P_ij(y),  i ∈ B
Q_i^g - Q_i^d = -b_i·V_i^2 + Σ_{j∈B_i} Q_ij(y),  i ∈ B
P_i^g = Σ_{n∈G_i} P_{G_n,i},  Q_i^g = Σ_{n∈G_i} Q_{G_n,i}
P_i^d = Σ_{m∈D_i} P_{D_m,i},  Q_i^d = Σ_{m∈D_i} Q_{D_m,i}

where P_{G_n,i} and Q_{G_n,i} represent the active and reactive power output of the generator n on the bus i, P_ij(y) and Q_ij(y) represent the active and reactive power from bus i to bus j, V_i represents the voltage amplitude of the bus i, B represents the bus set, the superscript g represents the generator, the superscript d represents the grid load, P_i^g and Q_i^g are the active power injection and reactive power injection of the generators on the bus i, P_i^d and Q_i^d are the load active and reactive power on bus i, P_{D_m,i} and Q_{D_m,i} are the active and reactive power of the load m on the bus i, G_i is the set of generators on bus i, D_i is the set of loads on the bus i, B_i is the set of buses forming a branch with bus i, g_i is the self-conductance of the bus i, and b_i is the self-susceptance of the bus i;

the constraint conditions of the safety constraint are as follows:

P_{G_n}^min ≤ P_{G_n} ≤ P_{G_n}^max,  n ∈ G
Q_{G_n}^min ≤ Q_{G_n} ≤ Q_{G_n}^max,  n ∈ G
V_i^min ≤ V_i ≤ V_i^max,  i ∈ B
P_ij^2 + Q_ij^2 ≤ (S_ij^max)^2,  ij ∈ Ω_L ∪ Ω_T

with the line power flows

P_ij = g_ij·V_i^2 - V_i·V_j·(g_ij·cos(θ_i - θ_j) + b_ij·sin(θ_i - θ_j))
Q_ij = -(b_ij + b_ij0)·V_i^2 + V_i·V_j·(b_ij·cos(θ_i - θ_j) - g_ij·sin(θ_i - θ_j))

where P_{G_n}^max and P_{G_n}^min are the upper and lower limits of the active power of the generator, Q_{G_n}^max and Q_{G_n}^min represent the upper and lower reactive limits of the generator, G represents the generator set, V_i^min and V_i^max represent the lower and upper limits of the bus voltage amplitude, S_ij^max is the apparent power upper limit of the transmission line, Ω_L represents the set of transmission lines, Ω_T represents the set of transformers, g_ij is the mutual conductance of bus i and bus j, θ_i is the voltage phase angle of the bus i, θ_j represents the voltage phase angle of bus j, b_ij is the mutual susceptance of the bus i and the bus j, b_ij0 is the tie-line capacitor susceptance, P_ij and Q_ij respectively represent the active and reactive power on line ij, and V_i and V_j represent the voltage amplitudes of bus i and bus j, respectively.
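As an illustrative sketch (not part of the patent text), the branch power flow expressions and the apparent-power limit above can be evaluated directly; the function names are hypothetical:

```python
import math

def line_flows(V_i, V_j, theta_i, theta_j, g_ij, b_ij, b_ij0=0.0):
    """Active/reactive power flowing from bus i to bus j (per unit),
    following the P_ij / Q_ij expressions given in the text."""
    dt = theta_i - theta_j
    P_ij = g_ij * V_i ** 2 - V_i * V_j * (g_ij * math.cos(dt) + b_ij * math.sin(dt))
    Q_ij = -(b_ij + b_ij0) * V_i ** 2 + V_i * V_j * (b_ij * math.cos(dt) - g_ij * math.sin(dt))
    return P_ij, Q_ij

def within_apparent_limit(P_ij, Q_ij, S_max):
    """Transmission-line safety constraint: P^2 + Q^2 <= (S_max)^2."""
    return P_ij ** 2 + Q_ij ** 2 <= S_max ** 2
```

With identical voltages and angles at both ends and no shunt susceptance, the computed flow is zero, as expected for a line with no driving difference.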
Further, the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections; inputting the historical power grid data into the power grid model to obtain the power grid operation state data under the corresponding time section and calculating the intelligent agent reward value;
the method comprises the steps that power grid operation state data under a certain time interval are used as input, a maximum entropy intelligent agent reinforcement learning algorithm is adopted, and intelligent agent control actions are obtained and are a generator control strategy;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data under the next time section according to the calculation result;

updating the network parameters of the intelligent agent based on the power grid operation state data under the current time section, the intelligent agent reward value, the intelligent agent control action and the power grid operation state data under the next time section;
and iterating the loop calculation until the power of the transmission lines in the controlled area does not exceed the safety limit in both the base state and the fault state of the power grid, and outputting the trained intelligent agent.
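The iterative training procedure above can be sketched as follows; `env.solve_flow`, `agent.act` and `agent.update` are illustrative assumptions standing in for the load-flow model and the maximum-entropy (SAC) policy and gradient step, not the patent's API:

```python
def train_agent(env, agent, replay_buffer, max_iters=1000):
    """Hedged sketch of the training loop: act, solve the power flow,
    store the transition, update the agent, and stop once the grid is
    safe in the base state and fault states."""
    state = env.reset()
    for _ in range(max_iters):
        action = agent.act(state)                 # generator active-power control signals
        next_state, reward, safe = env.solve_flow(action)
        replay_buffer.append((state, action, reward, next_state))
        agent.update(replay_buffer)               # update the agent's network parameters
        state = next_state
        if safe:                                  # no line power exceeds its safety limit
            return agent
    return agent
```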
Further, a P-Q decomposition method, the Newton-Raphson method, a method that automatically converts from P-Q decomposition to YR, or a method that automatically converts from P-Q decomposition to the Newton-Raphson method is adopted to solve the power grid model.
Further, the grid operating state data is expressed as:
s=(P,V,G)
wherein, P represents a group of line active power in a research area, V represents the voltage amplitude of a bus in the same area, and G represents the vector of the generator active power output.
Further, the calculation of the intelligent agent reward value comprises:

r = r_con + r_base

where r represents the reward value, r_con indicates the fault reward value, r_base represents the base state reward value, P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line, a and b are respectively reward value coefficients, N is the total number of lines, k denotes a line counter in the grid base state, and l denotes a counter in the grid fault state.
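A minimal sketch of the reward composition r = r_con + r_base. The patent's exact per-line formulas appear only as images in the original, so the per-line form and the coefficients a, b below are placeholders:

```python
def line_reward(p_from, p_to, p_limit, a=1.0, b=-1.0):
    """Illustrative per-line reward: reward a when the measured flow
    stays within the active power limit, penalty b when it exceeds it."""
    flow = max(abs(p_from), abs(p_to))
    return a if flow <= p_limit else b

def total_reward(base_lines, fault_lines):
    """r = r_con + r_base: sum the base-state and post-fault line
    rewards; each element is a (p_from, p_to, p_limit) tuple."""
    r_base = sum(line_reward(*ln) for ln in base_lines)
    r_con = sum(line_reward(*ln) for ln in fault_lines)
    return r_con + r_base
```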
Further, the power grid operation control targets are as follows:

min Σ_{v∈C} C(v)
min P_loss = Σ_{ij} P_loss(i, j)

where C(v) is the power generation cost of generator v, C represents the set of generators whose operation cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and min represents taking the minimum value.
An intelligent regulation system for the operation mode of a power grid comprises:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods.
A computing device, comprising:
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
The invention achieves the following beneficial effects:
according to the invention, the feasible power grid operation condition is automatically searched under the condition of considering uncertainty according to the pre-trained intelligent agent.
Drawings
FIG. 1 is a schematic diagram of the implementation principle of the intelligent power grid operation mode adjustment method considering security constraints according to the present invention;
FIG. 2 is a flow chart of calculating a prize value according to the present invention;
FIG. 3 is an example of an automatic adjustment algorithm for a power grid operation mode based on maximum entropy reinforcement learning according to the present invention;
FIG. 4 is a process for agent training in an embodiment of the invention;
FIG. 5 illustrates a generator output adjustment process according to an embodiment of the present invention;
FIG. 6 is a comparison of reward values of agents for different exploration steps in accordance with an embodiment of the present invention;
FIG. 7 is a comparison of total control iterations of agents for different exploration steps in an embodiment of the present invention;
FIG. 8 is a comparison of the performance of an agent using constant and varying temperature coefficients in an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an intelligent adjusting method for the power grid operation mode. Firstly, the problem of power grid operation mode formulation is described as an MDP (Markov Decision Process): the power grid flow information is collected to form the state space, and the control targets and constraints are modeled as the reward value when training the reinforcement learning agent. Secondly, a universal framework is adopted so that various types of control actions, including generator active power adjustment and load transfer, can automatically adjust the transmission line power while considering various fault working conditions.
The invention provides an intelligent adjusting method for the power grid operation mode whose overall framework is shown in figure 1: the active power output of selected generators is adjusted in a specified power grid area to meet the various safety requirements of the power grid operation mode, including the ground state and fault working conditions.
The method comprises the following steps:
solving a power grid model according to the current power grid data, and extracting power grid state data based on a calculation result;
based on the power grid state data, a generator control strategy is obtained through a pre-trained intelligent agent; the generator control strategy refers to an active power control signal of a generator;
and adjusting the power grid operation mode based on the optimal generator control strategy.
In the present invention, the agent is required to be trained in advance, and referring to fig. 1, the method includes:
step (1): transmitting the power grid model and the power grid operation mode file to a power grid simulation environment;
step (2): solving the power grid model based on historical power grid data (including the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections) to obtain the current power grid operation state data;
and (3): calculating an intelligent agent reward value by adopting a calculation result of the power grid model;
and (4): sending the reward value to an experience playback pool for storage;
and (5): extracting current power grid operation state data;
and (6): sending the current power grid operation state data to an experience playback pool for storage;
and (7): obtaining an intelligent agent control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the current power grid operation state data;
and (8): sending the control action of the agent to an experience playback pool for storage;
and (9): and sending the control action of the intelligent agent to a power grid simulation environment, storing the control action into a power grid operation mode file, applying the control action to a power grid model to obtain the next power grid operation state data, updating network parameters of the intelligent agent, and performing iterative loop calculation until the power grid operation control target is met.
The maximum entropy reinforcement learning algorithm with automatic temperature coefficient calculation is adopted, and feasible power grid operation conditions are automatically searched under the condition of considering uncertainty.
Specifically, solving the power grid model includes:
the grid model is represented as:
wherein the content of the first and second substances,andrepresenting the active and reactive power output, P, of the generator n on the bus iij(y) and Qij(y) represents the active and reactive power from bus i to bus j, ViRepresenting the voltage amplitude of the bus i, B representing the bus set, g representing the generator, d representing the grid load, Pi gAndis the active power injection and reactive power injection of the generator on the bus i, Pi dAndis the load active and reactive power on the bus iThe power of the electric motor is controlled by the power controller,andis the active and reactive power of the load m on the bus i, GiIs a set of generators on bus i, DiIs the set of loads on the bus i, BiIs a busbar set, g, forming a branch with busbar iiIs the self-conductance of the busbar i, biIs the self susceptance of the bus i.
The grid model is subject to constraints that represent the physical limits of the various electrical devices, respectively, requiring that all line currents, generator outputs and voltage magnitudes be operated within their physical limits.
Wherein the content of the first and second substances,andthe upper limit and the lower limit of the active power of the generator are shown,andrepresenting the upper and lower reactive limits of the generator, G representing the generator set, Vi minAnd Vi maxRepresenting the upper and lower bus voltage amplitude limits,is the apparent power ceiling, Ω, of the transmission lineLRepresents a set of transmission lines, ΩTRepresenting a set of transformers. Active power P of lineijAnd reactive power QijThe calculation is as follows:
wherein, gijIs the mutual conductance of bus i and bus j, θiIs the phase angle of the voltage of the bus i, the mutual conductance bijIs the mutual susceptance of the busbar i and the busbar j, bij0Is the link capacitor susceptance.
A reasonable power grid operation mode can consider multiple control targets, i.e., reducing the power generation cost and/or the transmission loss as much as possible while satisfying all the constraint conditions. The control target of minimizing the power generation cost is given by formula (11); the control target of minimizing the grid loss is given by formula (12):

min Σ_{k∈C} C(k)    (11)
min P_loss = Σ_{ij} P_loss(i, j)    (12)

where C(k) is the power generation cost of generator k, C represents the set of generators whose operation cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and P_loss is the sum of the network losses.
In the present invention, a P-Q decomposition method, the Newton-Raphson method, a method of automatically converting P-Q to YR, or a method of automatically converting P-Q to the Newton-Raphson method is adopted to solve the power grid model.
In the present invention, the grid state data is represented as s = (P, V, G), where P represents a group of line active powers in the research area, V represents the voltage amplitudes of the buses in the same area, and G represents the vector of generator active power outputs.
The invention adopts the mode of training an agent with the maximum entropy reinforcement learning algorithm (soft actor-critic, SAC) to search for the optimal operation mode of the power grid; the trained reinforcement learning agent can provide preventive and corrective control measures so as to ensure the safety of the power grid under various operating conditions.
Reinforcement learning is a branch of machine learning in which an agent takes actions in turn and interacts with the environment in order to maximize the accumulated return. At each step t, the agent observes a state s_t, performs a control action a_t, and receives a reward value r_t. The behavior of the agent is defined by the policy π: S → p(a), which maps states to a probability distribution over control actions. The problem is usually modeled as an MDP tuple (s_t, a_t, p, r), which comprises the state space of s_t, the action space of a_t, the transition probability p(s_{t+1} | s_t, a_t) and the reward function r(s_t, a_t). The expected return from one state is defined as the discounted sum of future reward values, G_t = Σ_{k≥0} γ^k·r_{t+k}, where γ is the discount coefficient in [0, 1]. The ultimate goal of training the reinforcement learning agent is to find a control strategy that maximizes the reward value.
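The discounted return described above can be computed with a simple backward recursion; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = sum_k gamma^k * r_{t+k}: accumulate rewards from the last
    step backwards so each step's return folds in the discounted future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```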
In order to find a good policy, cost function based methods such as Q-learning can be used to measure how good an action is in a particular state, or policy based methods can be used to directly find out what control strategies should be taken in different states without knowing how good the actions are. However, the problems faced in the real world are extremely complex.
In the present invention, a maximum entropy reinforcement learning algorithm (SAC) was chosen that exhibits the most advanced performance in terms of both sample efficiency and stability, because of its unique ability to maximize the desired reward and entropy during training.
The objective function for calculating the Q-value is given in equation (13), where θ and ψ represent the parameterized networks used to model the soft Q-value function and the control strategy, respectively, V_ψ is the state value function, and α is a temperature parameter that determines the relative importance of the entropy term and the reward value, thereby controlling the randomness of the optimal policy:

J_Q(θ) = E[ (1/2)·( Q_θ(s_t, a_t) - ( r(s_t, a_t) + γ·E[V_ψ(s_{t+1})] ) )^2 ]    (13)
The objective function of the strategy is given in equation (14); in the present invention, a normal distribution is used for the policy:

J_π(ψ) = E[ α·log π_ψ(a_t | s_t) - Q_θ(s_t, a_t) ]    (14)

The temperature coefficient α was fixed in the preceding calculation, but training with a fixed temperature coefficient makes the performance of the agent unstable as the reward value changes, so it is preferable to use an automatic temperature coefficient, which can also change as the policy is updated in order to explore more of the action space. Therefore, in the present invention, an average entropy constraint is added to the original objective function, while the entropy is allowed to vary in different states. The new objective function is modified as follows:

max_π E[ Σ_t r(s_t, a_t) ]  subject to  E[ -log π(a_t | s_t) ] ≥ H_0    (15)

where H_0 is the desired minimum entropy value, and the loss function of the temperature coefficient is given by equation (16):

J(α) = E[ -α·log π(a_t | s_t) - α·H_0 ]    (16)
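A sketch of the temperature-coefficient loss of equation (16), assuming a batch of sampled action log-probabilities; the function name and batch handling are illustrative:

```python
import math

def temperature_loss(log_alpha, log_probs, target_entropy):
    """J(alpha) = E[-alpha * (log pi(a|s) + H0)]: gradient descent on
    log_alpha raises alpha when the policy entropy (-log pi) falls
    below the target entropy H0, and lowers it otherwise."""
    alpha = math.exp(log_alpha)
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)
```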
specifically, the training reinforcement learning agent updates the active power output of the generator, including:
specifically, the reward value is a feedback of the performance of the intelligent agent in each control iteration, and a well-designed reward value can not only guide the intelligent agent to update the neural network parameters in a more effective direction, but also accelerate the whole training process. Our control objective is to minimize the change of active power in emergency, guarantee the grid safety, and prevent potential line power overload problem. The considered fault refers to a transmission line fault in the grid, which means that the controlled area must be able to maintain the ground state and the safety and reliability after an N-1 fault.
The reward value function is then defined as the sum of the fault reward and the base reward:
r = r_con + r_base

where r_con indicates the fault reward value and r_base represents the base state reward value.
The failure reward value is calculated as:
where P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line (representing the thermal limit or stability limit), a and b are respectively the reward value coefficients, and N is the total number of lines. This function measures the degree to which the power of the remaining N-1 lines in the controlled area exceeds the limit when an N-1 fault occurs.
The base state reward value is calculated as:
all variables in this function are the same as the variables defined in the fault reward function. The only difference is that the condition that the line power is out of limit is checked on the premise that the calculation of the basic state reward value ensures that the current topological structure is not changed. The calculation flow of the prize value is shown in fig. 2. Firstly, inputting a power grid operation mode model and data, and specifying a fault set (L); secondly, fault analysis is carried out on each line in the fault set L, and circulation is carried out until all fault lines are scanned. In each fault analysis, a power flow equation is solved, and when the power flow solution is not converged, a mode of replacing a numerical solution method or a mode of removing an electric island can be adopted to ensure the convergence of a power flow result. And thirdly, calculating the reward value according to the load flow calculation result. The accumulated prize value is finally output.
Specifically, the grid operating state is defined as s = (P, V, G), where P represents a group of line active powers in the research area, V represents the voltage amplitudes of the buses in the same area, and G represents the vector of generator active power outputs. In order to keep the different types of inputs and outputs consistent when training the intelligent agent, P, V and G are normalized before being output.
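The normalization step can be sketched as a simple per-vector max-abs scaling; this is one reasonable choice, as the patent does not specify the exact scheme:

```python
def normalize_state(P, V, G):
    """Scale each component of s = (P, V, G) to comparable magnitude
    before feeding the agent: divide each vector by its own largest
    absolute value (falling back to 1.0 for an all-zero vector)."""
    def maxabs(xs):
        m = max(abs(x) for x in xs) or 1.0
        return [x / m for x in xs]
    return maxabs(P) + maxabs(V) + maxabs(G)
```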
Fig. 3 shows the implementation process of the SAC power flow control algorithm of the present invention. Lines 1-3 initialize the weights of the policy network, the Q network and the target network. Line 4 sets the replay buffer size. Lines 5-29 show the training process for each complete sample. Line 6 resets the initial state s_t at the beginning of the sample. Line 9 extracts the control action a_t from the distribution calculated by the current policy. In the load flow calculation, the part with the largest computational burden is the load flow calculation under a large number of faults; in order to reduce the amount of computation while retaining the key characteristics, only the N-1 fault set that causes line out-of-limit in the ground state is calculated. Lines 12-15 show the method of calculating the fault reward value. Lines 16-19 show that only the reward value of L_c is considered. Lines 20-21 superimpose the base state reward value onto the total reward value. Line 22 collects the information of the next state s_{t+1} from the environment. Line 23 sends the MDP tuple collected in the current step to the replay buffer. Lines 24-29 update the policy and the Q-value based networks according to equations (13), (14) and (16).
After a control strategy is successfully found, if a reduced N-1 fault set is used to save computational resources, security problems may still occur on other faulty lines ignored during the computation; therefore, the step of checking all faulty lines in lines 30-34 is added to ensure system security.
The invention also provides a device for automatically adjusting the operation mode of the power grid based on maximum entropy reinforcement learning, which comprises the following components:
the power grid simulation environment module is used for solving a power grid load flow equation and updating power grid running state data in each interaction step;
the training process module is used for training the reinforcement learning agent based on the power grid state data and outputting a generator control strategy;
the using process module is used for adjusting the power grid flow under the condition that the generator control strategy meets the safety requirement; the safety requirements include a ground state and a fault condition.
Specifically, the power grid simulation environment module comprises a power flow solver and an environment assembly;
the environment component is used for updating and storing the power grid operation state data; and the power grid operation state data is stored in the power grid operation mode file.
Specifically, the environmental component updates the generator control strategy into the grid operating mode file.
The invention stores the power grid operation state data in BPA format.
The power flow solver is used for solving the power grid power flow equations based on the latest power grid operation state data, and supports four numerical methods: the P-Q decomposition method, the Newton-Raphson method, automatic switching from P-Q to the YR method, and automatic switching from P-Q to the Newton-Raphson method.
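As a sketch of the Newton-Raphson option named above, the following minimal solver iterates on the nodal power mismatch for a network with one slack bus and PQ buses. The bus-numbering convention and the finite-difference Jacobian are simplifications for illustration, not the invention's production solver.

```python
import numpy as np

def nr_powerflow(Y, p_spec, q_spec, tol=1e-6, max_iter=30):
    """Minimal Newton-Raphson AC power flow.

    Bus 0 is the slack bus (V = 1.0 p.u., angle 0); all remaining buses are
    PQ buses. Y is the complex bus admittance matrix; p_spec/q_spec are the
    net injections (generation minus load) at the PQ buses, in p.u. A
    numerically differenced Jacobian keeps the sketch short; real solvers
    build it analytically from the mismatch equations.
    """
    n = Y.shape[0]
    x = np.concatenate([np.zeros(n - 1), np.ones(n - 1)])  # [theta; V] at PQ buses

    def mismatch(x):
        theta = np.concatenate([[0.0], x[:n - 1]])
        v = np.concatenate([[1.0], x[n - 1:]])
        vc = v * np.exp(1j * theta)
        s = vc * np.conj(Y @ vc)                 # complex nodal injections
        return np.concatenate([s.real[1:] - p_spec, s.imag[1:] - q_spec])

    for _ in range(max_iter):
        f = mismatch(x)
        if np.max(np.abs(f)) < tol:
            break
        eps = 1e-6
        J = np.empty((len(f), len(x)))
        for k in range(len(x)):                  # finite-difference Jacobian
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (mismatch(x + dx) - f) / eps
        x = x - np.linalg.solve(J, f)            # Newton step
    theta = np.concatenate([[0.0], x[:n - 1]])
    v = np.concatenate([[1.0], x[n - 1:]])
    return v, theta

# Two-bus example: a purely reactive line (x = 0.1 p.u.) feeding a load bus.
Ybus = np.array([[-10j, 10j], [10j, -10j]])
v, theta = nr_powerflow(Ybus, np.array([-0.5]), np.array([-0.2]))
```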
Specifically, the training process module comprises a state extraction module, a reward value calculation module, an agent update module and an experience playback pool:
the state extraction module is used for capturing power grid operation state data and storing the data into an experience playback pool;
the reward value calculation module calculates a reward value according to the base state and the N-1 fault based on the output result of the load flow solver, and stores the reward value into an experience playback pool;
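The base-state and N-1 reward terms described above can be sketched as follows. The claim (r = r_con + r_base, with head/tail line powers compared against the active limit) gives only the symbols, so the linear penalty form and the coefficient names a and b are illustrative assumptions.

```python
def line_overload_penalty(p_from, p_to, p_limit, coef):
    """Overload measure for one transmission line: the larger of the head-
    and tail-end active power magnitudes is compared against the line's
    active limit, and any excess is penalized linearly (assumed form)."""
    loading = max(abs(p_from), abs(p_to))
    return -coef * max(0.0, loading - p_limit)

def reward(base_flows, fault_flows, a=1.0, b=0.5):
    """Total reward r = r_base + r_con.

    base_flows / fault_flows: lists of (P_from, P_to, P_limit) tuples for
    the base case and for the screened N-1 contingency cases.
    """
    r_base = sum(line_overload_penalty(*line, coef=a) for line in base_flows)
    r_con = sum(line_overload_penalty(*line, coef=b) for line in fault_flows)
    return r_base + r_con
```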
the intelligent agent updating module is used for giving a control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the power grid running state; and updating the network parameters based on the reward value, the current grid operating state, the generator control strategy and the next grid operating state.
When the information stored in the experience replay pool exceeds its capacity, the oldest data is deleted from the buffer.
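A minimal sketch of this fixed-capacity pool: `collections.deque` with a `maxlen` evicts the oldest transition first, which is exactly the behaviour described.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity experience replay pool with FIFO eviction: pushing a
    transition beyond capacity silently discards the oldest entry."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement; never ask for more than is stored.
        return random.sample(self.buf, min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)

pool = ReplayPool(capacity=2)
pool.push(0, 0, 1.0, 1, False)
pool.push(1, 0, 1.0, 2, False)
pool.push(2, 0, 1.0, 3, True)   # exceeds capacity: transition starting at 0 is evicted
```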
Specifically, the use process module is used for inputting the current power grid operation state data into the trained intelligent agent to obtain an optimal generator control strategy and adjust the power grid operation mode.
The present invention accordingly also provides a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method.
The invention accordingly also provides a computing device comprising one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method.
Examples
To verify the effectiveness of the method, a high-voltage (220 kV and above) power grid model of East China is taken as the test object, and a power grid operation mode model is used to generate the future operation scheme. The model includes 6500 buses, 600 generators, 6000 lines and 4300 transformers. The grid planning model is provided in BPA format. The Jinhua partition of the Zhejiang power grid is selected for the case study; the Jinhua grid contains about 200 buses and 170 transmission lines in total.
To train an effective reinforcement learning agent, the state space includes the main transmission line powers of the Jinhua partition. The control space is formed by adjusting the active power output of all available generators near Jinhua. The control objective of training the SAC agent is to achieve a safe operating mode under the base state and N-1 failures. In the power grid simulation environment, an AC power flow solver program supporting BPA-format data interacts continuously with the reinforcement learning agent during training. The inputs to the environment, including the control actions, are stored in a power flow file (BPA format). The outputs of the environment, including bus voltages, generator outputs, line powers, etc., are stored in a text file, from which a parser extracts the key information and updates the state space, action space and reward value.
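The parsing step can be sketched as below; the `LINE <from> <to> P= <MW>` output layout is an assumed simplification for illustration, since the real BPA solver's text output format differs.

```python
import re

def parse_line_power(text):
    """Toy parser for the solver's text output: extract per-line active
    power into a {(from_bus, to_bus): MW} dictionary. The line layout
    matched here is hypothetical, not the actual BPA format."""
    pattern = re.compile(r"LINE\s+(\S+)\s+(\S+)\s+P=\s*(-?\d+(?:\.\d+)?)")
    return {(a, b): float(p) for a, b, p in pattern.findall(text)}

sample_output = """
LINE JINHUA1 JINHUA2 P= 123.4
LINE JINHUA2 WUYI1  P= -56.7
"""
flows = parse_line_power(sample_output)
```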
The method is verified using the January 2019 operating mode of the East China power grid as an example, and Figs. 4 to 8 compare in detail the performance of SAC agents trained with different parameters.
Fig. 4 gives the overall performance of the SAC agent when N-1 failures are considered; as the average-reward and training-step curves show, the SAC agent converges successfully after 60 samples. The simulation results verify the effectiveness of the method, which can save a large amount of manual adjustment time. Fig. 5 shows the active power adjustment trajectories of the 7 generator sets used by the agent: the SAC agent finds the optimal generator-power control strategy in only three steps, and the adjustments stay reasonably close to the original values. This matches engineering experience, i.e. the safety problem is solved with the minimum change to generator output.
Meanwhile, this embodiment also studies the influence of different parameters on SAC agent performance. The two parameters are the exploration step and the temperature coefficient. Referring to Figs. 6 and 7, a small study was performed on different exploration steps, i.e. how many steps the agent explores randomly during the initial stage of training. Three exploration steps, 10, 20 and 30, were compared over 100 samples. Referring to Fig. 7, although all three settings oscillate at the beginning of training, they all converge at around 90 samples. This means the agent's ability to find the optimal control strategy is not affected when the exploration step varies over a small range. Convergence is faster when the exploration step is set to 30, indicating that more knowledge of the environment helps the agent reach the target state faster.
Furthermore, the temperature coefficient affects the performance and training speed of the SAC agent; fixing its value can significantly slow or compromise convergence. Referring to Fig. 8, the SAC agent converges significantly faster when the temperature coefficient is adjusted automatically during training.
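One gradient step of the standard automatic temperature adjustment can be sketched as follows: the temperature alpha is tuned so that the policy entropy tracks a target entropy (commonly the negative action dimension). The learning rate and target choice are conventional SAC defaults, not values stated in the patent.

```python
import numpy as np

def update_temperature(log_alpha, log_probs, action_dim, lr=3e-4):
    """One gradient-descent step on the SAC temperature objective
    J(alpha) = E[-alpha * (log pi(a|s) + H_target)], parameterized by
    log_alpha for positivity. log_probs are log pi(a|s) for a batch of
    actions sampled from the current policy."""
    target_entropy = -float(action_dim)          # common heuristic: -|A|
    # dJ/d(log_alpha) = -alpha * E[log pi + H_target]
    grad = -np.exp(log_alpha) * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad
```

When the current policy entropy exceeds the target, this step decreases alpha (less exploration bonus); when entropy falls below the target, it increases alpha.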
The invention provides an intelligent power grid operation mode adjustment method that accounts for safety constraints and adopts an SAC algorithm with automatic temperature coefficient adjustment. The simulation results verify the effectiveness of the method: the framework can automatically search for feasible power grid operating conditions while taking uncertainty into account.
It is to be noted that the apparatus embodiment corresponds to the method embodiment, and the implementation manners of the method embodiment are all applicable to the apparatus embodiment and can achieve the same or similar technical effects, so that the details are not described herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A power grid operation mode intelligent regulation method is characterized by comprising the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
2. The intelligent power grid operation mode adjustment method according to claim 1, wherein the power grid model is:
Σ_{n∈G_i} P_n^g − Σ_{m∈D_i} P_m^d = g_i·V_i² + Σ_{j∈B_i} P_ij(y), i∈B

Σ_{n∈G_i} Q_n^g − Σ_{m∈D_i} Q_m^d = −b_i·V_i² + Σ_{j∈B_i} Q_ij(y), i∈B

wherein P_n^g and Q_n^g represent the active and reactive power output of generator n on bus i, P_ij(y) and Q_ij(y) represent the active and reactive power from bus i to bus j, V_i represents the voltage amplitude of bus i, B represents the bus set, the superscript g denotes a generator, the superscript d denotes grid load, P_i^g and Q_i^g are the active and reactive power injections of the generators on bus i, P_i^d and Q_i^d are the load active and reactive power on bus i, P_m^d and Q_m^d are the active and reactive power of load m on bus i, G_i is the set of generators on bus i, D_i is the set of loads on bus i, B_i is the set of buses forming a branch with bus i, g_i is the self-conductance of bus i, and b_i is the self-susceptance of bus i;
the constraint conditions of the safety constraint are as follows:
P_n^g,min ≤ P_n^g ≤ P_n^g,max, n∈G

Q_n^g,min ≤ Q_n^g ≤ Q_n^g,max, n∈G

V_i^min ≤ V_i ≤ V_i^max, i∈B

P_ij(y) = g_ij·V_i² − V_i·V_j·(g_ij·cos(θ_i−θ_j) + b_ij·sin(θ_i−θ_j)), (i,j)∈Ω_L

Q_ij(y) = −V_i²·(b_ij0 + b_ij) − V_i·V_j·(g_ij·sin(θ_i−θ_j) − b_ij·cos(θ_i−θ_j)), (i,j)∈Ω_L

P_ij² + Q_ij² ≤ (S_ij^max)², (i,j)∈Ω_L∪Ω_T

wherein P_n^g,min and P_n^g,max represent the lower and upper active power limits of the generator, Q_n^g,min and Q_n^g,max represent the lower and upper reactive power limits of the generator, G represents the generator set, V_i^min and V_i^max represent the lower and upper bus voltage amplitude limits, S_ij^max is the apparent power upper limit of the transmission line, Ω_L represents the set of transmission lines, Ω_T represents the set of transformers, g_ij is the mutual conductance of bus i and bus j, θ_i is the voltage phase angle of bus i, θ_j is the voltage phase angle of bus j, b_ij is the mutual susceptance of bus i and bus j, b_ij0 is the line charging susceptance, P_ij and Q_ij respectively represent the active and reactive power on line ij, and V_i and V_j respectively represent the voltage amplitudes of bus i and bus j.
3. The method according to claim 2, wherein the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states at different time sections, inputting the historical power grid data into the power grid model, obtaining the power grid operation state data at the corresponding time sections, and calculating the intelligent agent reward value;
taking the power grid operation state data at a given time section as input, and obtaining the intelligent agent control action, namely a generator control strategy, by adopting a maximum entropy reinforcement learning algorithm;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data at the next time section from the calculation result;
updating the network parameters of the intelligent agent based on the power grid operation state data at the current time section, the intelligent agent reward value, the intelligent agent control action, and the power grid operation state data at the next time section;
and iterating the calculation in a loop until the power of the transmission lines in the controlled area does not exceed the safety limits under both the base state and the fault states of the power grid, and outputting the trained intelligent agent.
4. The intelligent power grid operation mode adjusting method according to claim 3, wherein the power grid model is solved by a P-Q decomposition method, a Newton-Raphson method, a method of automatically converting P-Q into YR, or a method of automatically converting P-Q into Newton-Raphson.
5. The intelligent regulation method of grid operation mode according to claim 3, wherein the grid operation state data is expressed as:
s=(P,V,G)
wherein, P represents a group of line active power in a research area, V represents the voltage amplitude of a bus in the same area, and G represents the vector of the generator active power output.
6. The method of claim 3, wherein calculating the agent reward value comprises:
r=rcon+rbase
wherein r represents the reward value, r_con represents the fault reward value, r_base represents the base-state reward value, P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line, a and b are respectively reward value coefficients, N is the total number of lines, k represents a line counter in the grid base state, and l represents a counter in the grid fault state.
7. The intelligent power grid operation mode adjustment method according to claim 3, wherein the power grid operation control targets are:
wherein C(v) is the generation cost of generator v, C represents the set of generators whose operating cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and min denotes taking the minimum value.
8. An intelligent power grid operation mode adjustment system, characterized by comprising:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
10. A computing device, comprising,
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110541404.1A CN113315131A (en) | 2021-05-18 | 2021-05-18 | Intelligent power grid operation mode adjusting method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113315131A true CN113315131A (en) | 2021-08-27 |
Family
ID=77373499
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110541404.1A Pending CN113315131A (en) | 2021-05-18 | 2021-05-18 | Intelligent power grid operation mode adjusting method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113315131A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113725853A (en) * | 2021-08-30 | 2021-11-30 | 浙江大学 | Power grid topology control method and system based on active person in-loop reinforcement learning |
CN115360772A (en) * | 2022-03-23 | 2022-11-18 | 中国电力科学研究院有限公司 | Power system active safety correction control method, system, equipment and storage medium |
CN117526443A (en) * | 2023-11-07 | 2024-02-06 | 北京清电科技有限公司 | Novel power system-based power distribution network optimization regulation and control method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523737A (en) * | 2020-05-29 | 2020-08-11 | 四川大学 | Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network |
CN112615379A (en) * | 2020-12-10 | 2021-04-06 | 浙江大学 | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523737A (en) * | 2020-05-29 | 2020-08-11 | 四川大学 | Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network |
CN112615379A (en) * | 2020-12-10 | 2021-04-06 | 浙江大学 | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning |
Non-Patent Citations (1)
Title |
---|
XIUMIN SHANG: "Reinforcement Learning-Based Solution to Power Grid Planning and Operation Under Uncertainties", 《2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S)》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113725853A (en) * | 2021-08-30 | 2021-11-30 | 浙江大学 | Power grid topology control method and system based on active person in-loop reinforcement learning |
CN115360772A (en) * | 2022-03-23 | 2022-11-18 | 中国电力科学研究院有限公司 | Power system active safety correction control method, system, equipment and storage medium |
CN115360772B (en) * | 2022-03-23 | 2023-08-15 | 中国电力科学研究院有限公司 | Active safety correction control method, system, equipment and storage medium for power system |
CN117526443A (en) * | 2023-11-07 | 2024-02-06 | 北京清电科技有限公司 | Novel power system-based power distribution network optimization regulation and control method and system |
CN117526443B (en) * | 2023-11-07 | 2024-04-26 | 北京清电科技有限公司 | Power system-based power distribution network optimization regulation and control method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210827 |