CN115345380A

CN115345380A - New energy consumption electric power scheduling method based on artificial intelligence

Info

Publication number: CN115345380A
Application number: CN202211062806.4A
Authority: CN
Inventors: 郭骏; 郭磊; 张勇; 宁剑; 郭万舒; 李敏; 王艺博; 陈茂源; 胡满; 喻乐; 訾鹏; 刘健
Original assignee: North China Grid Co Ltd
Current assignee: North China Grid Co Ltd
Priority date: 2022-09-01
Filing date: 2022-09-01
Publication date: 2022-11-15

Abstract

The invention discloses a new energy consumption power scheduling method based on artificial intelligence, which comprises the following steps: the method comprises the steps of constructing power grid active optimal power flow control into an active optimal scheduling on-line model of a power system, and training the active optimal scheduling on-line model based on a PPO (polyphenylene oxide) algorithm of a deep reinforcement learning framework; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions and rewards; and performing online decision according to the real-time power grid operation data, and updating and optimizing the real-time power grid operation data with the aim of maximizing the reward of the intelligent agent to obtain the minimum power generation cost. According to the invention, by designing the interactive training framework of 'body-action-reward', an active power optimization scheduling online model of the power system is obtained, the optimal output control of the generator can be made in real time, and the output cost of the generator of the system is reduced under the condition of meeting the operation constraint of the power system.

Description

New energy consumption power scheduling method based on artificial intelligence

Technical Field

The invention relates to the technical field of power dispatching, in particular to a new energy consumption power dispatching method based on artificial intelligence.

Background

In recent years, with the rapid development of the electric power industry in China, the access proportion of renewable energy sources such as wind power and photovoltaic is continuously improved, and the proportion of the new energy sources in the total generated energy of an electric power system is also increasing. The fluctuation of new energy can bring great challenges to the safe and reliable operation of the power system, which has higher requirements on the real-time active power optimization scheduling of the power system. The acceptance capacity of the power system to the fluctuating new energy can be improved through a reasonable scheduling means, and the safe, reliable and economic operation of the power system is ensured.

The economic dispatching of the power system aims to adjust the active output of each generator and minimize the output cost of the generator under the condition of fully meeting the safe operation constraint of a power grid. The active optimal scheduling of a modern power system often comprises a plurality of different variables and a plurality of constraints, and is a typical nonlinear and high-dimensional problem. However, the traditional scheduling model is slow in solving speed to a certain extent, and along with the increase of the scale of the power system and the penetration of new energy, the traditional solving method model has certain errors, and the control requirement under the existing novel power system cannot be met. In the active power scheduling optimization research in the conventional power system, the commonly used calculation methods can be divided into three categories: mathematical methods, planning algorithms, heuristic algorithms. The methods have the problems of low calculation speed, easy falling into local optimization, dependence on models and prediction data and the like. With the increase of the scale of the power distribution network, the increase of the number of power electronic devices and the penetration of new energy, the complexity of solving the active optimization scheduling problem by the traditional method is greatly improved, and the method is not suitable for the active optimization scheduling of online control. Specifically, the conventional method for solving the active power optimization scheduling has time urgency in achieving convergence, and particularly when the system scale is larger and the new energy power generation ratio is gradually increased, meanwhile, the conventional idea finds the optimal solution through a model according to the state of the current time section, but cannot solve the optimal control under the continuous time section.

In recent years, artificial intelligence and data drive related technologies are promoted, so that an optimization method based on artificial intelligence is widely applied to a power system. The power system active power optimization scheduling problem can be modeled into a given power load value, and a sequential decision problem of the optimal generator power combination is searched. The deep reinforcement learning algorithm combines the excellent characterization capability of deep learning and the excellent decision-making capability of reinforcement learning, and shows good capability in solving continuous states and action spaces, so that a method for solving economic dispatch by adopting deep reinforcement learning is urgently needed to be developed in the industry.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides a new energy consumption power scheduling method based on artificial intelligence, which can promote new energy consumption.

The purpose of the invention is realized by the following technical scheme:

a new energy consumption power scheduling method based on artificial intelligence comprises the following steps:

s1, constructing the power grid active optimal power flow control into an active optimal scheduling on-line model of a power system,

s2, based on a PPO algorithm of a deep reinforcement learning framework, an intelligent agent of the active optimization scheduling online model of the power system gradually improves own actions through interaction with the environment to obtain maximum rewards so as to train the active optimization scheduling online model; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions and rewards;

and S3, carrying out online decision on the active optimal scheduling online model according to the real-time power grid operation data, and carrying out updating optimization aiming at the maximization of the reward of the intelligent agent to obtain the minimized power generation cost.

Preferably, the PPO algorithm includes one Critic _ network and two Actor networks, which are Old _ activator and New _ activator, respectively.

Preferably, in an epicode, the agent first uses the existing active optimization scheduling policy Pi to interact with the environment to obtain data of a Batch, and after obtaining a complete Batch, the Actor _ network and the critical _ network start learning the complete Batch data.

Preferably, the starting of learning the complete Batch data by the Actor _ network and the critical _ network includes: the Critic network calculates the state value through an active optimization scheduling online model neural network, the Actor network iteratively updates parameters of a strategy function by using the state value so as to select an action and obtain a feedback and a new state, the Critic network updates the neural network parameters by using the feedback and the new state and helps the Actor network to calculate more accurate state/action value by using the new network parameter.

Preferably, each time an agent interacts with the environment, the agent saves the acquired state, action, and reward as a tuple in the experience pool.

Preferably, when the strategy function is updated, the step size of strategy update is limited by means of KL divergence.

Preferably, the relative weight of each action is obtained by means of importance sampling,

and converting the expected value of f (x) for the distribution p into the expected value relative to another distribution q, thereby realizing the reutilization of the data.

Preferably, the new energy consumption power scheduling method based on artificial intelligence specifically includes the following steps:

s11, inputting state information in an initialized and constructed environment into an Actor _ newetwork to obtain a mean value mu and a variance sigma representing motion distribution, constructing a normal distribution and then sampling motions;

s12, inputting the sampled actions into the environment to obtain the reward and the state of the next step, and then storing the reward and the state of the next step in an experience pool ((S) _t ,a _t ,r _t ,s _t+1 ) Then for the next state s _t+1 Step S11 is executed until a complete Batch data is obtained, and step S13 is executed;

s13, inputting the state into a critical _ network to obtain the state value, calculating the reward, obtaining the value of all the states, and calculating the advantage estimation function

S14, updating the parameter of the critical _ network according to the calculated dominance function and loss back propagation obtained after root mean square;

s15, inputting S and a of the experience pool into an Actor _ new and an Actor _ old respectively to obtain normal distributions N1 and N2 and P1 and P2, calculating a proportion P2/P1 of inportant sampling, and measuring and ensuring that the action distribution difference is smaller than M and M is larger than 0 by calculating the proportion P2/P1 and adopting a KL divergence mode;

s16, updating the parameters of the Actor _ network according to the calculated dominance function and the loss obtained after the root mean square, and calculating more accurate state/action value, wherein the calculation formula is as follows:

wherein pi (a) _t |s _t ) Is the probability of taking action a in the current state.

Preferably, the power grid active optimal power flow control is constructed into an active optimal scheduling online model of the power system based on a Markov decision process.

Preferably, the deep reinforcement learning framework of the active optimization scheduling online model further includes: state transitions and discount factors.

Compared with the prior art, the invention has the following advantages:

according to the real-time power grid state, the invention adopts an artificial intelligence method to carry out optimization control on the active power of the power system, carries out reasonable economic dispatching on the power system, and minimizes the operation cost of the power system under the condition of meeting the basic constraint of the power system, and specifically comprises the following steps:

(1) By designing a body-action-reward interactive training framework, an active power optimization scheduling online model of the power system is obtained, particularly when a large-scale high-proportion new energy power system is faced, optimal output control of a generator can be made in real time, and output cost of the generator of the system is reduced under the condition that operation constraint of the power system is met.

(2) The problem that the Policy Gradient algorithm is sensitive to the step length is solved, small-batch updating is achieved in multiple training steps, the experience pool is introduced, the data utilization rate is improved, and the method and the device are suitable for scenes of continuous action spaces.

(3) According to the method, under a deep reinforcement learning framework, an environment modeling is carried out on the optimal load flow calculation problem of the power system under different load levels according to a Markov decision process. Meanwhile, a training scheme of the optimal power flow calculation automatic adjustment model is integrally designed and a final model is obtained through training. Simulation experiments prove that the method can automatically provide a power grid optimal power flow calculation adjustment scheme under different load levels, can ensure that the active power output of the balance machine in the system is within a rated range, and can also maintain the output cost of the generator at a lower level.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

fig. 1 is a schematic flow diagram of a new energy consumption power scheduling method based on artificial intelligence according to the present invention.

FIG. 2 is an environment initialization diagram of the present invention.

Fig. 3 is a PPO algorithm flowchart of the present invention.

Detailed Description

The invention is further illustrated by the following figures and examples.

Referring to fig. 1 to 3, a new energy consumption power scheduling method based on artificial intelligence includes:

the method comprises the following steps of S1, constructing the power grid active optimal power flow control into an active optimal scheduling on-line model of the power system, and constructing the power grid active optimal power flow control into the active optimal scheduling on-line model of the power system based on a Markov decision process in the embodiment.

S2, based on a PPO algorithm of a deep reinforcement learning framework, an agent of the power system active optimization scheduling online model gradually improves own actions through interaction with the environment to obtain maximum rewards: the total operation cost of a power grid with high-proportion new energy blended is minimum so as to train an active optimal scheduling online model; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions, rewards, state transitions and discount factors; an agent is an entity that interacts with the grid environment.

The PPO algorithm comprises a Critic _ network and two Actor networks, wherein the two Actor networks are respectively an Old _ operator and a New _ operator. In an Episode, an Agent first interacts with the environment using the existing active optimization scheduling policy Pi to obtain Batch data, during which the Actor and Critic networks are not optimized. After a complete Batch data is obtained, the Actor _ network and the Critic _ network start to learn the complete Batch data.

Specifically, the starting of learning the complete Batch data by the Actor _ network and the critical _ network includes: the Critic network calculates the state value through an active optimization scheduling online model neural network, the Actor network iteratively updates parameters of a strategy function by using the state value to further select actions and obtain feedback and new states, the Critic network updates neural network parameters by using the feedback and the new states, and the Actor network is helped to calculate more accurate state/action value by using the new network parameters. The problem that the basic Policy-based Policy Gradient algorithm is sensitive to the step length is solved during training, and small-batch updating can be realized in multiple training steps. Compared with the traditional method, the method has certain improvement in solving speed and solving precision.

In this embodiment, in each epoch, the agent interacts with the environment, and stores the obtained state, action, and reward as a tuple in the experience pool, and starts training the active optimization scheduling online model when the tuple in the experience pool satisfies a certain number.

In this embodiment, when the policy function is updated, the step size of policy update is limited by using KL divergence. When a strategy function (self neural network) is updated, in order to prevent the distribution of the two strategy functions from being too different, the step length of strategy updating is limited by using a KL divergence mode.

In the embodiment, the historical data is fully utilized, the true weight of each action is reflected, the relative weight of each action is obtained by an importance sampling mode,

for a variable x subject to a probability p distribution, the expectation of a function f (x) of x is estimated, and since the distribution of p is unknown, data reuse can be achieved by sampling from a known distribution q, converting the expectation of f (x) for the distribution p distribution to an expectation relative to another distribution q.

And S3, carrying out online decision making on the active optimal scheduling online model according to the real-time power grid operation data, and carrying out updating optimization aiming at the maximization of the reward of the intelligent agent to obtain the minimized power generation cost. And (3) performing online decision, namely calculating the active control strategy of the power grid in real time according to the real-time operation data of the power grid.

The method for distributing the active power of the power system reasonably can minimize the operation cost of the power system. And training the intelligent neural network based on a deep reinforcement learning framework, and realizing real-time control on active optimal scheduling of the power system.

FIG. 2 is an environment initialization diagram of the present invention. As shown in fig. 2, in the simulation process, the environment is initialized, and pandapower is used to construct the environment and read data, wherein the occupancy of new energy and other data necessary for calculating the power flow are considered in the data. The intelligent agent aims to minimize the unit operation cost, maximize the reward, express the cost in a quadratic function mode and impose certain constraint conditions for ensuring the safe operation of the power grid, and meanwhile, consider the uncertain condition of random disconnection in the power grid.

In the face of the volatility of high-proportion new energy and the uncertainty of the environment, the optimal operation condition with the minimum power generation cost is searched by using a deep reinforcement learning method. The proposed DRL-based approach is a better approach to the scheduling problem. Fig. 3 is a PPO algorithm flowchart of the present invention. As shown in fig. 3, inputting a status to the Actor-New network results in two values representing a normal distribution of actions: mu and sigma. And sampling the action, interacting with the environment to obtain a reward and a next state, and cycling the process and then storing the process. And inputting the state obtained in the last step after the circulation and all the stored states into the critic network computing value, and performing back propagation to update the network parameters. The new energy consumption power scheduling method based on artificial intelligence specifically comprises the following steps:

and S11, inputting the environment information S into an Actor _ new network to obtain a mean value mu and a variance sigma representing the motion distribution, constructing a normal distribution, and then sampling the motion, thereby realizing the purpose of solving the continuous motion problem by using the network.

S12, inputting the sampled actions into the environment to obtain the reward and the next ActionThe state of one step is then stored in an experience pool ((s) _t ,a _t ,r _t ,s _t+1 ) Then to the state s of the next step _t+1 Step S11 is executed until a complete Batch data is obtained, and step S13 is executed;

s13, inputting the state into Critic _ network to obtain the state value, calculating the reward, obtaining the value calculation advantage estimation function of all the states

S14, updating the parameter of the critical _ network according to the calculated dominance function and loss back propagation obtained after root mean square is carried out;

s15, inputting S and a of the experience pool into an Actor _ new and an Actor _ old respectively to obtain normal distributions N1 and N2 and P1 and P2, calculating a proportion P2/P1 of inportant sampling, and measuring and ensuring that the action distribution difference cannot be too large (smaller than M and M larger than 0) by calculating the proportion P2/P1 and adopting a KL divergence mode; step S15 implements significance sampling and uses KL divergence to measure distribution. When KL [ pi ] _old |π _θ ]>β _high KL _target Increasing β discourages large scale updates of parameter θ.

S16, updating parameters of the Actor _ network obtained after root mean square is carried out according to the calculated advantage function, and calculating more accurate state/action value, wherein the calculation formula is as follows:

The above-mentioned embodiments are preferred embodiments of the present invention, and the present invention is not limited thereto, and any other modifications or equivalent substitutions that do not depart from the technical spirit of the present invention are included in the scope of the present invention.

Claims

1. A new energy consumption electric power scheduling method based on artificial intelligence is characterized by comprising the following steps:

s2, based on a PPO algorithm of a deep reinforcement learning frame, gradually improving own actions of an intelligent agent of the active optimal scheduling online model of the power system through interaction with the environment to obtain the maximum reward, so as to train the active optimal scheduling online model; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions and rewards;

and S3, carrying out online decision making on the active optimal scheduling online model according to the real-time power grid operation data, and carrying out updating optimization aiming at the maximization of the reward of the intelligent agent to obtain the minimized power generation cost.

2. The New energy consumption power scheduling method based on artificial intelligence as claimed in claim 1, wherein the PPO algorithm comprises a Critic _ network and two Actor networks, the two Actor networks being Old _ Actor and New _ Actor respectively.

3. The new energy consumption power scheduling method based on artificial intelligence according to claim 2, wherein in an Episode, the agent first uses the existing active optimization scheduling policy Pi to interact with the environment to obtain data of a Batch, and after obtaining a complete Batch, the Actor _ network and the critical _ network start learning the complete Batch data.

4. The artificial intelligence based new energy consumption power scheduling method of claim 3, wherein the starting of learning the complete Batch data by the Actor _ network and the critical _ network comprises: the Critic network calculates a state value through an active optimization scheduling online model neural network, the Actor network iteratively updates parameters of the neural network by using the state value to further select an action and obtain a feedback and a new state, the Critic network updates the neural network parameters by using the feedback and the new state, and the Actor network is helped to calculate a more accurate state/action value by using the new network parameters.

5. The method of claim 4, wherein each Episode event, agent interacts with the environment and stores the obtained state, action, and reward as a tuple in an experience pool.

6. The new energy consumption power scheduling method based on artificial intelligence according to claim 4, wherein when the policy function is updated, the step size of the policy update is limited by using KL divergence.

7. The new energy consumption power scheduling method based on artificial intelligence as claimed in claim 4, wherein the relative weight of each action is obtained by sampling the importance,

and converting the expected value of the distribution f (x) for the distribution p into an expected value relative to another distribution q, so as to realize the reutilization of the data.

8. The new energy consumption power scheduling method based on artificial intelligence according to claim 1, wherein the new energy consumption power scheduling method based on artificial intelligence specifically comprises the steps of:

s12, inputting the sampled Action into the environment to obtain the reward and the state of the next step, and then storing in an experience pool ((S) _t ,a _t ,r _t ,s _t+1 ) Then for the next state s _t+1 Executing the step S11 until a complete Batch data is obtained, and executing the step S13;

s13, inputting the state into Critic _ networkk obtaining state value, calculating reward, obtaining value of all states, calculating advantage estimation function

s15, inputting S and a of the experience pool into an Actor _ new and an Actor _ old respectively to obtain normal distributions N1 and N2 and P1 and P2, calculating a proportion P2/P1 of the inportant sampling, and measuring and ensuring that an action distribution difference is smaller than M and M is larger than 0 by calculating the proportion P2/P1 and adopting a KL divergence mode;

9. The new energy consumption power dispatching method based on artificial intelligence of claim 1, wherein the power grid active optimal power flow control is constructed as an active optimal dispatching on-line model of the power system based on Markov decision process.

10. The artificial intelligence based new energy consumption power scheduling method according to claim 1, wherein the deep reinforcement learning framework of the active optimization scheduling online model further comprises: state transitions and discount factors.