CN113315131A - Intelligent power grid operation mode adjusting method and system - Google Patents
- Publication number
- CN113315131A (application number CN202110541404.1A)
- Authority
- CN
- China
- Prior art keywords
- power grid
- bus
- generator
- power
- active
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H02J3/00: Circuit arrangements for ac mains or ac distribution networks
- H02J3/04, H02J3/06: Circuit arrangements for connecting networks of the same frequency supplied from different sources; controlling transfer of power between connected networks; controlling sharing of load between connected networks
- H02J3/38, H02J3/46, H02J3/48: Arrangements for parallelly feeding a single network by two or more generators, converters or transformers; controlling the sharing of output; controlling the sharing of the in-phase component
- H02J2203/10: Power transmission or distribution systems management focussing at grid level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
- H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
Abstract
The invention discloses an intelligent power grid operation mode adjusting method and system. The method comprises the following steps: acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for calculation, and extracting power grid operation state data from the calculation result; inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal; and adjusting the power grid operation mode according to the optimal generator control strategy. The advantage is that the invention utilizes the pre-trained intelligent agent to automatically search for feasible power grid operating conditions while taking uncertainty into account.
Description
Technical Field
The invention relates to an intelligent adjusting method and system for a power grid operation mode, and belongs to the technical field of power grid regulation and control.
Background
In recent years, energy and environmental policies and standards have greatly promoted the rapid development of green energy, and the penetration of renewable energy in the power grid continues to rise. However, due to its intermittency, dynamics and randomness, connecting large amounts of such energy to the grid presents significant challenges to the safe and economic operation of the power system. In the existing approach, the future power grid operation mode is formulated through large-scale numerical simulation to find the optimal power grid operation mode under various fault conditions. The process includes power demand forecasting, new line construction planning, maintenance and outage planning, generator set planning, and the like. Due to the high complexity, non-linearity and dimensionality of the problem, this process typically requires a significant amount of human effort to reach the desired goal empirically by manually modifying model parameters. The power industry lacks an effective method and tool to automate this process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent power grid operation mode adjusting method and system.
In order to solve the technical problem, the invention provides an intelligent adjustment method for a power grid operation mode, which comprises the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
Further, the power grid model is as follows:

P_i^g - P_i^d = g_i·V_i^2 + Σ_{j∈B_i} P_ij(y),  i ∈ B
Q_i^g - Q_i^d = -b_i·V_i^2 + Σ_{j∈B_i} Q_ij(y),  i ∈ B
P_i^g = Σ_{n∈G_i} P_{G_n,i},  Q_i^g = Σ_{n∈G_i} Q_{G_n,i}
P_i^d = Σ_{m∈D_i} P_{D_m,i},  Q_i^d = Σ_{m∈D_i} Q_{D_m,i}

where P_{G_n,i} and Q_{G_n,i} represent the active and reactive power output of the generator n on the bus i, P_ij(y) and Q_ij(y) represent the active and reactive power from bus i to bus j, V_i represents the voltage amplitude of the bus i, B represents the bus set, the superscript g represents the generator, the superscript d represents the grid load, P_i^g and Q_i^g are the active power injection and reactive power injection of the generators on the bus i, P_i^d and Q_i^d are the load active and reactive power on bus i, P_{D_m,i} and Q_{D_m,i} are the active and reactive power of the load m on the bus i, G_i is the set of generators on bus i, D_i is the set of loads on the bus i, B_i is the set of buses forming a branch with bus i, g_i is the self-conductance of the bus i, and b_i is the self-susceptance of the bus i;

the constraint conditions of the safety constraint are as follows:

P_{G_n}^min ≤ P_{G_n} ≤ P_{G_n}^max,  n ∈ G
Q_{G_n}^min ≤ Q_{G_n} ≤ Q_{G_n}^max,  n ∈ G
V_i^min ≤ V_i ≤ V_i^max,  i ∈ B
P_ij^2 + Q_ij^2 ≤ (S_ij^max)^2,  ij ∈ Ω_L ∪ Ω_T

with the line power flows

P_ij = g_ij·V_i^2 - V_i·V_j·(g_ij·cos(θ_i - θ_j) + b_ij·sin(θ_i - θ_j))
Q_ij = -(b_ij + b_ij0)·V_i^2 + V_i·V_j·(b_ij·cos(θ_i - θ_j) - g_ij·sin(θ_i - θ_j))

where P_{G_n}^max and P_{G_n}^min are the upper and lower limits of the active power of the generator, Q_{G_n}^max and Q_{G_n}^min represent the upper and lower reactive limits of the generator, G represents the generator set, V_i^min and V_i^max represent the lower and upper limits of the bus voltage amplitude, S_ij^max is the apparent power upper limit of the transmission line, Ω_L represents the set of transmission lines, Ω_T represents the set of transformers, g_ij is the mutual conductance of bus i and bus j, θ_i is the voltage phase angle of the bus i, θ_j represents the voltage phase angle of bus j, b_ij is the mutual susceptance of the bus i and the bus j, b_ij0 is the tie-line capacitor susceptance, P_ij and Q_ij respectively represent the active and reactive power on line ij, and V_i and V_j represent the voltage amplitudes of bus i and bus j, respectively.
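As an illustrative sketch (not part of the patent text), the branch power flow expressions and the apparent-power limit above can be evaluated directly; the function names are hypothetical:

```python
import math

def line_flows(V_i, V_j, theta_i, theta_j, g_ij, b_ij, b_ij0=0.0):
    """Active/reactive power flowing from bus i to bus j (per unit),
    following the P_ij / Q_ij expressions given in the text."""
    dt = theta_i - theta_j
    P_ij = g_ij * V_i ** 2 - V_i * V_j * (g_ij * math.cos(dt) + b_ij * math.sin(dt))
    Q_ij = -(b_ij + b_ij0) * V_i ** 2 + V_i * V_j * (b_ij * math.cos(dt) - g_ij * math.sin(dt))
    return P_ij, Q_ij

def within_apparent_limit(P_ij, Q_ij, S_max):
    """Transmission-line safety constraint: P^2 + Q^2 <= (S_max)^2."""
    return P_ij ** 2 + Q_ij ** 2 <= S_max ** 2
```

With identical voltages and angles at both ends and no shunt susceptance, the computed flow is zero, as expected for a line with no driving difference.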
Further, the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections; inputting the historical power grid data into the power grid model to obtain the power grid operation state data under the corresponding time section and calculating the intelligent agent reward value;
the method comprises the steps that power grid operation state data under a certain time interval are used as input, a maximum entropy intelligent agent reinforcement learning algorithm is adopted, and intelligent agent control actions are obtained and are a generator control strategy;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data under the next time section according to the calculation result;

updating the network parameters of the intelligent agent based on the power grid operation state data under the current time section, the intelligent agent reward value, the intelligent agent control action and the power grid operation state data under the next time section;
and iterating the loop calculation until the power of the transmission lines in the controlled area does not exceed the safety limit in both the base state and the fault state of the power grid, and outputting the trained intelligent agent.
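The iterative training procedure above can be sketched as follows; `env.solve_flow`, `agent.act` and `agent.update` are illustrative assumptions standing in for the load-flow model and the maximum-entropy (SAC) policy and gradient step, not the patent's API:

```python
def train_agent(env, agent, replay_buffer, max_iters=1000):
    """Hedged sketch of the training loop: act, solve the power flow,
    store the transition, update the agent, and stop once the grid is
    safe in the base state and fault states."""
    state = env.reset()
    for _ in range(max_iters):
        action = agent.act(state)                 # generator active-power control signals
        next_state, reward, safe = env.solve_flow(action)
        replay_buffer.append((state, action, reward, next_state))
        agent.update(replay_buffer)               # update the agent's network parameters
        state = next_state
        if safe:                                  # no line power exceeds its safety limit
            return agent
    return agent
```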
Further, a P-Q decomposition method, the Newton-Raphson method, a method that automatically converts from P-Q decomposition to YR, or a method that automatically converts from P-Q decomposition to the Newton-Raphson method is adopted to solve the power grid model.
Further, the grid operating state data is expressed as:
s=(P,V,G)
wherein, P represents a group of line active power in a research area, V represents the voltage amplitude of a bus in the same area, and G represents the vector of the generator active power output.
Further, the calculation of the intelligent agent reward value comprises:

r = r_con + r_base

where r represents the reward value, r_con indicates the fault reward value, r_base represents the base state reward value, P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line, a and b are respectively reward value coefficients, N is the total number of lines, k denotes a line counter in the grid base state, and l denotes a counter in the grid fault state.
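A minimal sketch of the reward composition r = r_con + r_base. The patent's exact per-line formulas appear only as images in the original, so the per-line form and the coefficients a, b below are placeholders:

```python
def line_reward(p_from, p_to, p_limit, a=1.0, b=-1.0):
    """Illustrative per-line reward: reward a when the measured flow
    stays within the active power limit, penalty b when it exceeds it."""
    flow = max(abs(p_from), abs(p_to))
    return a if flow <= p_limit else b

def total_reward(base_lines, fault_lines):
    """r = r_con + r_base: sum the base-state and post-fault line
    rewards; each element is a (p_from, p_to, p_limit) tuple."""
    r_base = sum(line_reward(*ln) for ln in base_lines)
    r_con = sum(line_reward(*ln) for ln in fault_lines)
    return r_con + r_base
```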
Further, the power grid operation control targets are as follows:

min Σ_{v∈C} C(v)
min P_loss = Σ_{ij} P_loss(i, j)

where C(v) is the power generation cost of generator v, C represents the set of generators whose operation cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and min represents taking the minimum value.
An intelligent regulation system for the operation mode of a power grid comprises:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods.
A computing device, comprising:
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
The invention achieves the following beneficial effects:
according to the invention, the feasible power grid operation condition is automatically searched under the condition of considering uncertainty according to the pre-trained intelligent agent.
Drawings
FIG. 1 is a schematic diagram of the implementation principle of the intelligent power grid operation mode adjustment method considering security constraints according to the present invention;
FIG. 2 is a flow chart of calculating a prize value according to the present invention;
FIG. 3 is an example of an automatic adjustment algorithm for a power grid operation mode based on maximum entropy reinforcement learning according to the present invention;
FIG. 4 is a process for agent training in an embodiment of the invention;
FIG. 5 illustrates a generator output adjustment process according to an embodiment of the present invention;
FIG. 6 is a comparison of reward values of agents for different exploration steps in accordance with an embodiment of the present invention;
FIG. 7 is a comparison of total control iterations of agents for different exploration steps in an embodiment of the present invention;
FIG. 8 is a comparison of the performance of an agent using constant and varying temperature coefficients in an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an intelligent adjusting method for the power grid operation mode. Firstly, the problem of power grid operation mode formulation is described as an MDP (Markov Decision Process): the power grid flow information is collected to form the state space, and the control targets and constraints are modeled as the reward value when training the reinforcement learning agent. Secondly, a universal framework is adopted so that various types of control actions, including generator active power adjustment and load transfer, can automatically adjust the transmission line power while considering various fault working conditions.
The invention provides an intelligent adjusting method for the power grid operation mode whose overall framework is shown in figure 1: the active power output of selected generators is adjusted in a specified power grid area to meet the various safety requirements of the power grid operation mode, including the ground state and fault working conditions.
The method comprises the following steps:
solving a power grid model according to the current power grid data, and extracting power grid state data based on a calculation result;
based on the power grid state data, a generator control strategy is obtained through a pre-trained intelligent agent; the generator control strategy refers to an active power control signal of a generator;
and adjusting the power grid operation mode based on the optimal generator control strategy.
In the present invention, the agent is required to be trained in advance, and referring to fig. 1, the method includes:
step (1): transmitting the power grid model and the power grid operation mode file to a power grid simulation environment;
step (2): solving the power grid model based on historical power grid data (including the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states under different time sections) to obtain the current power grid operation state data;
and (3): calculating an intelligent agent reward value by adopting a calculation result of the power grid model;
and (4): sending the reward value to an experience playback pool for storage;
and (5): extracting current power grid operation state data;
and (6): sending the current power grid operation state data to an experience playback pool for storage;
and (7): obtaining an intelligent agent control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the current power grid operation state data;
and (8): sending the control action of the agent to an experience playback pool for storage;
and (9): and sending the control action of the intelligent agent to a power grid simulation environment, storing the control action into a power grid operation mode file, applying the control action to a power grid model to obtain the next power grid operation state data, updating network parameters of the intelligent agent, and performing iterative loop calculation until the power grid operation control target is met.
The maximum entropy reinforcement learning algorithm with automatic temperature coefficient calculation is adopted, and feasible power grid operation conditions are automatically searched under the condition of considering uncertainty.
Specifically, solving the power grid model includes:
the grid model is represented as:
wherein the content of the first and second substances,andrepresenting the active and reactive power output, P, of the generator n on the bus iij(y) and Qij(y) represents the active and reactive power from bus i to bus j, ViRepresenting the voltage amplitude of the bus i, B representing the bus set, g representing the generator, d representing the grid load, Pi gAndis the active power injection and reactive power injection of the generator on the bus i, Pi dAndis the load active and reactive power on the bus iThe power of the electric motor is controlled by the power controller,andis the active and reactive power of the load m on the bus i, GiIs a set of generators on bus i, DiIs the set of loads on the bus i, BiIs a busbar set, g, forming a branch with busbar iiIs the self-conductance of the busbar i, biIs the self susceptance of the bus i.
The grid model is subject to constraints that represent the physical limits of the various electrical devices, respectively, requiring that all line currents, generator outputs and voltage magnitudes be operated within their physical limits.
Wherein the content of the first and second substances,andthe upper limit and the lower limit of the active power of the generator are shown,andrepresenting the upper and lower reactive limits of the generator, G representing the generator set, Vi minAnd Vi maxRepresenting the upper and lower bus voltage amplitude limits,is the apparent power ceiling, Ω, of the transmission lineLRepresents a set of transmission lines, ΩTRepresenting a set of transformers. Active power P of lineijAnd reactive power QijThe calculation is as follows:
wherein, gijIs the mutual conductance of bus i and bus j, θiIs the phase angle of the voltage of the bus i, the mutual conductance bijIs the mutual susceptance of the busbar i and the busbar j, bij0Is the link capacitor susceptance.
A reasonable power grid operation mode can consider multiple control targets, i.e., reducing the power generation cost and/or the transmission loss as much as possible while satisfying all the constraint conditions. The control target of minimizing the power generation cost is given by formula (11); the control target of minimizing the grid loss is given by formula (12):

min Σ_{k∈C} C(k)    (11)
min P_loss = Σ_{ij} P_loss(i, j)    (12)

where C(k) is the power generation cost of generator k, C represents the set of generators whose operation cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and P_loss is the sum of the network losses.
In the present invention, a P-Q decomposition method, the Newton-Raphson method, a method of automatically converting P-Q to YR, or a method of automatically converting P-Q to the Newton-Raphson method is adopted to solve the power grid model.
In the present invention, the grid state data is represented as s = (P, V, G), where P represents a group of line active powers in the research area, V represents the voltage amplitudes of the buses in the same area, and G represents the vector of generator active power outputs.
The invention adopts the mode of training an agent with the maximum entropy reinforcement learning algorithm (soft actor-critic, SAC) to search for the optimal operation mode of the power grid; the trained reinforcement learning agent can provide preventive and corrective control measures so as to ensure the safety of the power grid under various operating conditions.
Reinforcement learning is a branch of machine learning in which an agent takes actions in turn and interacts with the environment in order to maximize the accumulated return. At each step t, the agent observes a state s_t, performs a control action a_t, and receives a reward value r_t. The behavior of the agent is defined by the policy π: S → p(a), which maps states to a probability distribution over control actions. The problem is usually modeled as an MDP tuple (s_t, a_t, p, r), which comprises the state space of s_t, the action space of a_t, the transition probability p(s_{t+1} | s_t, a_t) and the reward function r(s_t, a_t). The expected return from one state is defined as the discounted sum of future reward values, G_t = Σ_{k≥0} γ^k·r_{t+k}, where γ is the discount coefficient in [0, 1]. The ultimate goal of training the reinforcement learning agent is to find a control strategy that maximizes the reward value.
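The discounted return described above can be computed with a simple backward recursion; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = sum_k gamma^k * r_{t+k}: accumulate rewards from the last
    step backwards so each step's return folds in the discounted future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```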
In order to find a good policy, cost function based methods such as Q-learning can be used to measure how good an action is in a particular state, or policy based methods can be used to directly find out what control strategies should be taken in different states without knowing how good the actions are. However, the problems faced in the real world are extremely complex.
In the present invention, a maximum entropy reinforcement learning algorithm (SAC) was chosen that exhibits the most advanced performance in terms of both sample efficiency and stability, because of its unique ability to maximize the desired reward and entropy during training.
The objective function for calculating the Q-value is given in equation (13), where θ and ψ represent the parameterized networks used to model the soft Q-value function and the control strategy, respectively, V_ψ is the state value function, and α is a temperature parameter that determines the relative importance of the entropy term and the reward value, thereby controlling the randomness of the optimal policy:

J_Q(θ) = E[ (1/2)·( Q_θ(s_t, a_t) - ( r(s_t, a_t) + γ·E[V_ψ(s_{t+1})] ) )^2 ]    (13)
The objective function of the strategy is given in equation (14); in the present invention, a normal distribution is used for the policy:

J_π(ψ) = E[ α·log π_ψ(a_t | s_t) - Q_θ(s_t, a_t) ]    (14)

The temperature coefficient α was fixed in the preceding calculation, but training with a fixed temperature coefficient makes the performance of the agent unstable as the reward value changes, so it is preferable to use an automatic temperature coefficient, which can also change as the policy is updated in order to explore more of the action space. Therefore, in the present invention, an average entropy constraint is added to the original objective function, while the entropy is allowed to vary in different states. The new objective function is modified as follows:

max_π E[ Σ_t r(s_t, a_t) ]  subject to  E[ -log π(a_t | s_t) ] ≥ H_0    (15)

where H_0 is the desired minimum entropy value, and the loss function of the temperature coefficient is given by equation (16):

J(α) = E[ -α·log π(a_t | s_t) - α·H_0 ]    (16)
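A sketch of the temperature-coefficient loss of equation (16), assuming a batch of sampled action log-probabilities; the function name and batch handling are illustrative:

```python
import math

def temperature_loss(log_alpha, log_probs, target_entropy):
    """J(alpha) = E[-alpha * (log pi(a|s) + H0)]: gradient descent on
    log_alpha raises alpha when the policy entropy (-log pi) falls
    below the target entropy H0, and lowers it otherwise."""
    alpha = math.exp(log_alpha)
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)
```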
specifically, the training reinforcement learning agent updates the active power output of the generator, including:
specifically, the reward value is a feedback of the performance of the intelligent agent in each control iteration, and a well-designed reward value can not only guide the intelligent agent to update the neural network parameters in a more effective direction, but also accelerate the whole training process. Our control objective is to minimize the change of active power in emergency, guarantee the grid safety, and prevent potential line power overload problem. The considered fault refers to a transmission line fault in the grid, which means that the controlled area must be able to maintain the ground state and the safety and reliability after an N-1 fault.
The reward value function is then defined as the sum of the fault reward and the base reward:
r = r_con + r_base

where r_con indicates the fault reward value and r_base represents the base state reward value.
The failure reward value is calculated as:
where P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line (representing the thermal limit or stability limit), a and b are respectively the reward value coefficients, and N is the total number of lines. This function measures the degree to which the power of the remaining N-1 lines in the controlled area exceeds the limit when an N-1 fault occurs.
The base state reward value is calculated as:
all variables in this function are the same as the variables defined in the fault reward function. The only difference is that the condition that the line power is out of limit is checked on the premise that the calculation of the basic state reward value ensures that the current topological structure is not changed. The calculation flow of the prize value is shown in fig. 2. Firstly, inputting a power grid operation mode model and data, and specifying a fault set (L); secondly, fault analysis is carried out on each line in the fault set L, and circulation is carried out until all fault lines are scanned. In each fault analysis, a power flow equation is solved, and when the power flow solution is not converged, a mode of replacing a numerical solution method or a mode of removing an electric island can be adopted to ensure the convergence of a power flow result. And thirdly, calculating the reward value according to the load flow calculation result. The accumulated prize value is finally output.
Specifically, the grid operating state is defined as s = (P, V, G), where P represents a group of line active powers in the research area, V represents the voltage amplitudes of the buses in the same area, and G represents the vector of generator active power outputs. In order to keep the different types of inputs and outputs consistent when training the intelligent agent, P, V and G are normalized before being output.
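The normalization step can be sketched as a simple per-vector max-abs scaling; this is one reasonable choice, as the patent does not specify the exact scheme:

```python
def normalize_state(P, V, G):
    """Scale each component of s = (P, V, G) to comparable magnitude
    before feeding the agent: divide each vector by its own largest
    absolute value (falling back to 1.0 for an all-zero vector)."""
    def maxabs(xs):
        m = max(abs(x) for x in xs) or 1.0
        return [x / m for x in xs]
    return maxabs(P) + maxabs(V) + maxabs(G)
```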
Fig. 3 shows the implementation process of the SAC power flow control algorithm of the present invention. Lines 1-3 initialize the weights of the policy network, the Q network and the target network. Line 4 sets the replay buffer size. Lines 5-29 show the training process for each complete sample. Line 6 resets the initial state s_t at the beginning of the sample. Line 9 extracts the control action a_t from the distribution calculated by the current policy. In the load flow calculation, the part with the largest computational burden is the load flow calculation under a large number of faults; in order to reduce the amount of computation while retaining the key characteristics, only the N-1 fault set that causes line out-of-limit in the ground state is calculated. Lines 12-15 show the method of calculating the fault reward value. Lines 16-19 show that only the reward value of L_c is considered. Lines 20-21 superimpose the base state reward value onto the total reward value. Line 22 collects the information of the next state s_{t+1} from the environment. Line 23 sends the MDP tuple collected in the current step to the replay buffer. Lines 24-29 update the policy and the Q-value based networks according to equations (13), (14) and (16).
After a control strategy is successfully found, if a reduced N-1 fault set is used to save computational resources, security problems may still occur on other faulty lines ignored during the computation; therefore, the step of checking all faulty lines in lines 30-34 is added to ensure system security.
The invention also provides a device for automatically adjusting the operation mode of the power grid based on maximum entropy reinforcement learning, which comprises the following components:
the power grid simulation environment module is used for solving a power grid load flow equation and updating power grid running state data in each interaction step;
the training process module is used for training the reinforcement learning agent based on the power grid state data and outputting a generator control strategy;
the using process module is used for adjusting the power grid flow under the condition that the generator control strategy meets the safety requirement; the safety requirements include a ground state and a fault condition.
Specifically, the power grid simulation environment module comprises a power flow solver and an environment assembly;
the environment component is used for updating and storing the power grid operation state data; and the power grid operation state data is stored in the power grid operation mode file.
Specifically, the environmental component updates the generator control strategy into the grid operating mode file.
The invention stores the power grid operation state data in BPA format.
The power flow solver is used for solving the power grid power flow equations based on the latest power grid operation state data, and supports four numerical methods: the P-Q decomposition method, the Newton-Raphson method, automatic switching from P-Q to the YR method, and automatic switching from P-Q to the Newton-Raphson method.
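As a sketch of the Newton-Raphson option named above, the following minimal solver iterates on the nodal power mismatch for a network with one slack bus and PQ buses. The bus-numbering convention and the finite-difference Jacobian are simplifications for illustration, not the invention's production solver.

```python
import numpy as np

def nr_powerflow(Y, p_spec, q_spec, tol=1e-6, max_iter=30):
    """Minimal Newton-Raphson AC power flow.

    Bus 0 is the slack bus (V = 1.0 p.u., angle 0); all remaining buses are
    PQ buses. Y is the complex bus admittance matrix; p_spec/q_spec are the
    net injections (generation minus load) at the PQ buses, in p.u. A
    numerically differenced Jacobian keeps the sketch short; real solvers
    build it analytically from the mismatch equations.
    """
    n = Y.shape[0]
    x = np.concatenate([np.zeros(n - 1), np.ones(n - 1)])  # [theta; V] at PQ buses

    def mismatch(x):
        theta = np.concatenate([[0.0], x[:n - 1]])
        v = np.concatenate([[1.0], x[n - 1:]])
        vc = v * np.exp(1j * theta)
        s = vc * np.conj(Y @ vc)                 # complex nodal injections
        return np.concatenate([s.real[1:] - p_spec, s.imag[1:] - q_spec])

    for _ in range(max_iter):
        f = mismatch(x)
        if np.max(np.abs(f)) < tol:
            break
        eps = 1e-6
        J = np.empty((len(f), len(x)))
        for k in range(len(x)):                  # finite-difference Jacobian
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (mismatch(x + dx) - f) / eps
        x = x - np.linalg.solve(J, f)            # Newton step
    theta = np.concatenate([[0.0], x[:n - 1]])
    v = np.concatenate([[1.0], x[n - 1:]])
    return v, theta

# Two-bus example: a purely reactive line (x = 0.1 p.u.) feeding a load bus.
Ybus = np.array([[-10j, 10j], [10j, -10j]])
v, theta = nr_powerflow(Ybus, np.array([-0.5]), np.array([-0.2]))
```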
Specifically, the training process module comprises a state extraction module, a reward value calculation module, an agent update module and an experience playback pool:
the state extraction module is used for capturing power grid operation state data and storing the data into an experience playback pool;
the reward value calculation module calculates a reward value according to the base state and the N-1 fault based on the output result of the load flow solver, and stores the reward value into an experience playback pool;
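The base-state and N-1 reward terms described above can be sketched as follows. The claim (r = r_con + r_base, with head/tail line powers compared against the active limit) gives only the symbols, so the linear penalty form and the coefficient names a and b are illustrative assumptions.

```python
def line_overload_penalty(p_from, p_to, p_limit, coef):
    """Overload measure for one transmission line: the larger of the head-
    and tail-end active power magnitudes is compared against the line's
    active limit, and any excess is penalized linearly (assumed form)."""
    loading = max(abs(p_from), abs(p_to))
    return -coef * max(0.0, loading - p_limit)

def reward(base_flows, fault_flows, a=1.0, b=0.5):
    """Total reward r = r_base + r_con.

    base_flows / fault_flows: lists of (P_from, P_to, P_limit) tuples for
    the base case and for the screened N-1 contingency cases.
    """
    r_base = sum(line_overload_penalty(*line, coef=a) for line in base_flows)
    r_con = sum(line_overload_penalty(*line, coef=b) for line in fault_flows)
    return r_base + r_con
```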
the intelligent agent updating module is used for giving a control action, namely a generator control strategy, by adopting a maximum entropy intelligent agent based on the power grid running state; and updating the network parameters based on the reward value, the current grid operating state, the generator control strategy and the next grid operating state.
When the information stored in the experience replay pool exceeds its capacity, the oldest data is deleted from the buffer.
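A minimal sketch of this fixed-capacity pool: `collections.deque` with a `maxlen` evicts the oldest transition first, which is exactly the behaviour described.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity experience replay pool with FIFO eviction: pushing a
    transition beyond capacity silently discards the oldest entry."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement; never ask for more than is stored.
        return random.sample(self.buf, min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)

pool = ReplayPool(capacity=2)
pool.push(0, 0, 1.0, 1, False)
pool.push(1, 0, 1.0, 2, False)
pool.push(2, 0, 1.0, 3, True)   # exceeds capacity: transition starting at 0 is evicted
```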
Specifically, the use process module is used for inputting the current power grid operation state data into the trained intelligent agent to obtain an optimal generator control strategy and adjust the power grid operation mode.
The present invention accordingly also provides a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method.
The invention accordingly also provides a computing device comprising one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method.
Examples
To verify the effectiveness of the method, a high-voltage (220 kV and above) power grid model of East China is taken as the test object, and a power grid operation mode model is used to generate the future operation scheme. The model includes 6500 buses, 600 generators, 6000 lines and 4300 transformers. The grid planning model is provided in BPA format. The Jinhua partition of the Zhejiang power grid is selected for the case study; the Jinhua grid contains about 200 buses and 170 transmission lines in total.
To train an effective reinforcement learning agent, the state space includes the main transmission line powers of the Jinhua partition. The control space is formed by adjusting the active power output of all available generators near Jinhua. The control objective of training the SAC agent is to achieve a safe operating mode under the base state and N-1 failures. In the power grid simulation environment, an AC power flow solver program supporting BPA-format data interacts continuously with the reinforcement learning agent during training. The inputs to the environment, including the control actions, are stored in a power flow file (BPA format). The outputs of the environment, including bus voltages, generator outputs, line powers, etc., are stored in a text file, from which a parser extracts the key information and updates the state space, action space and reward value.
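The parsing step can be sketched as below; the `LINE <from> <to> P= <MW>` output layout is an assumed simplification for illustration, since the real BPA solver's text output format differs.

```python
import re

def parse_line_power(text):
    """Toy parser for the solver's text output: extract per-line active
    power into a {(from_bus, to_bus): MW} dictionary. The line layout
    matched here is hypothetical, not the actual BPA format."""
    pattern = re.compile(r"LINE\s+(\S+)\s+(\S+)\s+P=\s*(-?\d+(?:\.\d+)?)")
    return {(a, b): float(p) for a, b, p in pattern.findall(text)}

sample_output = """
LINE JINHUA1 JINHUA2 P= 123.4
LINE JINHUA2 WUYI1  P= -56.7
"""
flows = parse_line_power(sample_output)
```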
The method is verified using the January 2019 operating mode of the East China power grid as an example, and Figs. 4 to 8 compare in detail the performance of SAC agents trained with different parameters.
Fig. 4 gives the overall performance of the SAC agent when N-1 failures are considered; as the average-reward and training-step curves show, the SAC agent converges successfully after 60 samples. The simulation results verify the effectiveness of the method, which can save a large amount of manual adjustment time. Fig. 5 shows the active power adjustment trajectories of the 7 generator sets used by the agent: the SAC agent finds the optimal generator-power control strategy in only three steps, and the adjustments stay reasonably close to the original values. This matches engineering experience, i.e. the safety problem is solved with the minimum change to generator output.
Meanwhile, this embodiment also studies the influence of different parameters on SAC agent performance. The two parameters are the exploration step and the temperature coefficient. Referring to Figs. 6 and 7, a small study was performed on different exploration steps, i.e. how many steps the agent explores randomly during the initial stage of training. Three exploration steps, 10, 20 and 30, were compared over 100 samples. Referring to Fig. 7, although all three settings oscillate at the beginning of training, they all converge at around 90 samples. This means the agent's ability to find the optimal control strategy is not affected when the exploration step varies over a small range. Convergence is faster when the exploration step is set to 30, indicating that more knowledge of the environment helps the agent reach the target state faster.
Furthermore, the temperature coefficient affects the performance and training speed of the SAC agent; fixing its value can significantly slow or compromise convergence. Referring to Fig. 8, the SAC agent converges significantly faster when the temperature coefficient is adjusted automatically during training.
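One gradient step of the standard automatic temperature adjustment can be sketched as follows: the temperature alpha is tuned so that the policy entropy tracks a target entropy (commonly the negative action dimension). The learning rate and target choice are conventional SAC defaults, not values stated in the patent.

```python
import numpy as np

def update_temperature(log_alpha, log_probs, action_dim, lr=3e-4):
    """One gradient-descent step on the SAC temperature objective
    J(alpha) = E[-alpha * (log pi(a|s) + H_target)], parameterized by
    log_alpha for positivity. log_probs are log pi(a|s) for a batch of
    actions sampled from the current policy."""
    target_entropy = -float(action_dim)          # common heuristic: -|A|
    # dJ/d(log_alpha) = -alpha * E[log pi + H_target]
    grad = -np.exp(log_alpha) * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad
```

When the current policy entropy exceeds the target, this step decreases alpha (less exploration bonus); when entropy falls below the target, it increases alpha.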
The invention provides an intelligent power grid operation mode adjustment method that accounts for safety constraints and adopts an SAC algorithm with automatic temperature coefficient adjustment. The simulation results verify the effectiveness of the method: the framework can automatically search for feasible power grid operating conditions while taking uncertainty into account.
It is to be noted that the apparatus embodiment corresponds to the method embodiment, and the implementation manners of the method embodiment are all applicable to the apparatus embodiment and can achieve the same or similar technical effects, so that the details are not described herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A power grid operation mode intelligent regulation method is characterized by comprising the following steps:
acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
inputting power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and adjusting the power grid operation mode according to the optimal generator control strategy.
2. The intelligent power grid operation mode adjustment method according to claim 1, wherein the power grid model is:
Σ_{n∈G_i} P_n^g − Σ_{m∈D_i} P_m^d = g_i·V_i² + Σ_{j∈B_i} P_ij(y), i∈B

Σ_{n∈G_i} Q_n^g − Σ_{m∈D_i} Q_m^d = −b_i·V_i² + Σ_{j∈B_i} Q_ij(y), i∈B

wherein P_n^g and Q_n^g represent the active and reactive power output of generator n on bus i, P_ij(y) and Q_ij(y) represent the active and reactive power from bus i to bus j, V_i represents the voltage amplitude of bus i, B represents the bus set, the superscript g denotes a generator, the superscript d denotes grid load, P_i^g and Q_i^g are the active and reactive power injections of the generators on bus i, P_i^d and Q_i^d are the load active and reactive power on bus i, P_m^d and Q_m^d are the active and reactive power of load m on bus i, G_i is the set of generators on bus i, D_i is the set of loads on bus i, B_i is the set of buses forming a branch with bus i, g_i is the self-conductance of bus i, and b_i is the self-susceptance of bus i;
the constraint conditions of the safety constraint are as follows:
P_n^g,min ≤ P_n^g ≤ P_n^g,max, n∈G

Q_n^g,min ≤ Q_n^g ≤ Q_n^g,max, n∈G

V_i^min ≤ V_i ≤ V_i^max, i∈B

P_ij(y) = g_ij·V_i² − V_i·V_j·(g_ij·cos(θ_i−θ_j) + b_ij·sin(θ_i−θ_j)), (i,j)∈Ω_L

Q_ij(y) = −V_i²·(b_ij0 + b_ij) − V_i·V_j·(g_ij·sin(θ_i−θ_j) − b_ij·cos(θ_i−θ_j)), (i,j)∈Ω_L

P_ij² + Q_ij² ≤ (S_ij^max)², (i,j)∈Ω_L∪Ω_T

wherein P_n^g,min and P_n^g,max represent the lower and upper active power limits of the generator, Q_n^g,min and Q_n^g,max represent the lower and upper reactive power limits of the generator, G represents the generator set, V_i^min and V_i^max represent the lower and upper bus voltage amplitude limits, S_ij^max is the apparent power upper limit of the transmission line, Ω_L represents the set of transmission lines, Ω_T represents the set of transformers, g_ij is the mutual conductance of bus i and bus j, θ_i is the voltage phase angle of bus i, θ_j is the voltage phase angle of bus j, b_ij is the mutual susceptance of bus i and bus j, b_ij0 is the line charging susceptance, P_ij and Q_ij respectively represent the active and reactive power on line ij, and V_i and V_j respectively represent the voltage amplitudes of bus i and bus j.
3. The method according to claim 2, wherein the training process of the agent comprises:
obtaining historical power grid data, wherein the historical power grid data comprises the power grid topological structure, bus information, load information, generator output information, transformer information and power grid control equipment states at different time sections, inputting the historical power grid data into the power grid model, obtaining the power grid operation state data at the corresponding time sections, and calculating the intelligent agent reward value;
taking the power grid operation state data at a given time section as input, and obtaining the intelligent agent control action, namely a generator control strategy, by adopting a maximum entropy reinforcement learning algorithm;
inputting the obtained generator control strategy into the power grid model for calculation, and extracting the power grid operation state data at the next time section from the calculation result;
updating the network parameters of the intelligent agent based on the power grid operation state data at the current time section, the intelligent agent reward value, the intelligent agent control action, and the power grid operation state data at the next time section;
and iterating the calculation in a loop until the power of the transmission lines in the controlled area does not exceed the safety limits under both the base state and the fault states of the power grid, and outputting the trained intelligent agent.
4. The intelligent power grid operation mode adjusting method according to claim 3, wherein the power grid model is solved by a P-Q decomposition method, a Newton-Raphson method, a method of automatically converting P-Q into YR, or a method of automatically converting P-Q into Newton-Raphson.
5. The intelligent regulation method of grid operation mode according to claim 3, wherein the grid operation state data is expressed as:
s=(P,V,G)
wherein, P represents a group of line active power in a research area, V represents the voltage amplitude of a bus in the same area, and G represents the vector of the generator active power output.
6. The method of claim 3, wherein calculating the agent reward value comprises:
r=rcon+rbase
wherein r represents the reward value, r_con represents the fault reward value, r_base represents the base-state reward value, P_from and P_to are the measured active power at the head and tail ends of the transmission line, P_limit is the active power upper limit of the line, a and b are respectively reward value coefficients, N is the total number of lines, k represents a line counter in the grid base state, and l represents a counter in the grid fault state.
7. The intelligent power grid operation mode adjustment method according to claim 3, wherein the power grid operation control targets are:
wherein C(v) is the generation cost of generator v, C represents the set of generators whose operating cost is considered, P_loss(i, j) is the active power loss on line ij, K_max is the total number of generators, and min denotes taking the minimum value.
8. An intelligent power grid operation mode adjustment system, characterized by comprising:
the acquisition module is used for acquiring current power grid data, inputting the current power grid data into a predetermined power grid model considering safety constraints for power grid load flow calculation, and extracting power grid operation state data according to a calculation result, wherein the current power grid data comprises a power grid topological structure, bus information, load information, generator output information, transformer information and a power grid control equipment state;
the processing module is used for inputting the power grid operation state data into a pre-trained intelligent agent to obtain an optimal generator control strategy, wherein the generator control strategy is a generator active power control signal;
and the adjusting module is used for adjusting the power grid operation mode according to the optimal generator control strategy.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
10. A computing device, comprising,
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110541404.1A CN113315131A (en) | 2021-05-18 | 2021-05-18 | Intelligent power grid operation mode adjusting method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113315131A true CN113315131A (en) | 2021-08-27 |
Family
ID=77373499
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110541404.1A Pending CN113315131A (en) | 2021-05-18 | 2021-05-18 | Intelligent power grid operation mode adjusting method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113315131A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113725853A (en) * | 2021-08-30 | 2021-11-30 | 浙江大学 | Power grid topology control method and system based on active person in-loop reinforcement learning |
CN115360772A (en) * | 2022-03-23 | 2022-11-18 | 中国电力科学研究院有限公司 | Power system active safety correction control method, system, equipment and storage medium |
CN117526443A (en) * | 2023-11-07 | 2024-02-06 | 北京清电科技有限公司 | Novel power system-based power distribution network optimization regulation and control method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523737A (en) * | 2020-05-29 | 2020-08-11 | 四川大学 | Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network |
CN112615379A (en) * | 2020-12-10 | 2021-04-06 | 浙江大学 | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523737A (en) * | 2020-05-29 | 2020-08-11 | 四川大学 | Automatic optimization-approaching adjusting method for operation mode of electric power system driven by deep Q network |
CN112615379A (en) * | 2020-12-10 | 2021-04-06 | 浙江大学 | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning |
Non-Patent Citations (1)
Title |
---|
XIUMIN SHANG: "Reinforcement Learning-Based Solution to Power Grid Planning and Operation Under Uncertainties", 《2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S)》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113725853A (en) * | 2021-08-30 | 2021-11-30 | 浙江大学 | Power grid topology control method and system based on active person in-loop reinforcement learning |
CN115360772A (en) * | 2022-03-23 | 2022-11-18 | 中国电力科学研究院有限公司 | Power system active safety correction control method, system, equipment and storage medium |
CN115360772B (en) * | 2022-03-23 | 2023-08-15 | 中国电力科学研究院有限公司 | Active safety correction control method, system, equipment and storage medium for power system |
CN117526443A (en) * | 2023-11-07 | 2024-02-06 | 北京清电科技有限公司 | Novel power system-based power distribution network optimization regulation and control method and system |
CN117526443B (en) * | 2023-11-07 | 2024-04-26 | 北京清电科技有限公司 | Power system-based power distribution network optimization regulation and control method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210827 |