CN115345380A - New energy consumption electric power scheduling method based on artificial intelligence - Google Patents


Publication number
CN115345380A
CN115345380A (application CN202211062806.4A)
Authority
CN
China
Prior art keywords
power
network
active
actor
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211062806.4A
Other languages
Chinese (zh)
Inventor
郭骏
郭磊
张勇
宁剑
郭万舒
李敏
王艺博
陈茂源
胡满
喻乐
訾鹏
刘健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Grid Co Ltd
Original Assignee
North China Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Grid Co Ltd filed Critical North China Grid Co Ltd
Priority to CN202211062806.4A priority Critical patent/CN115345380A/en
Publication of CN115345380A publication Critical patent/CN115345380A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/007Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
    • H02J3/0075Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/48Controlling the sharing of the in-phase component
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Power Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a new energy consumption power scheduling method based on artificial intelligence, which comprises the following steps: constructing power grid active optimal power flow control as an active optimal scheduling online model of a power system, and training the model with a PPO (Proximal Policy Optimization) algorithm under a deep reinforcement learning framework; the deep reinforcement learning framework of the active optimal scheduling online model comprises states, actions and rewards; online decisions are made according to real-time power grid operation data, and the model is updated and optimized with the aim of maximizing the agent's reward so as to minimize the power generation cost. By designing an "agent-action-reward" interactive training framework, the invention obtains an active power optimal scheduling online model of the power system that can produce optimal generator output control in real time and reduce the generator output cost of the system while satisfying the operation constraints of the power system.

Description

New energy consumption power scheduling method based on artificial intelligence
Technical Field
The invention relates to the technical field of power dispatching, in particular to a new energy consumption power dispatching method based on artificial intelligence.
Background
In recent years, with the rapid development of the electric power industry in China, the share of renewable energy sources such as wind power and photovoltaics has risen continuously, and new energy accounts for an ever-larger proportion of total generation. The fluctuation of new energy poses great challenges to the safe and reliable operation of the power system and places higher requirements on real-time active power optimal scheduling. Reasonable scheduling can improve the system's capacity to accommodate fluctuating new energy and ensure safe, reliable and economic operation of the power system.
Economic dispatch of a power system aims to adjust the active output of each generator so as to minimize generation cost while fully satisfying the safe-operation constraints of the grid. Active optimal scheduling of a modern power system involves many variables and constraints and is a typical nonlinear, high-dimensional problem. Traditional scheduling models solve slowly, and as the system grows and new energy penetrates further, they accumulate errors and cannot meet the control requirements of the new type of power system. The calculation methods commonly used for active power scheduling optimization fall into three categories: mathematical methods, planning algorithms and heuristic algorithms. These methods suffer from low calculation speed, a tendency to fall into local optima, and dependence on models and prediction data. With the growth of the distribution network, the increase in power electronic devices and the penetration of new energy, the complexity of solving the active optimal scheduling problem by traditional methods rises sharply, making them unsuitable for online control. In particular, traditional methods struggle to converge within the time available, especially as the system scale and the share of new energy generation grow; moreover, the traditional approach finds an optimal solution from the state of the current time section only, and cannot obtain optimal control over continuous time sections.
In recent years, the rise of artificial intelligence and data-driven technologies has led to wide application of AI-based optimization methods in power systems. The active power optimal scheduling problem can be modeled as a sequential decision problem: given the power load values, find the optimal combination of generator outputs. Deep reinforcement learning combines the strong representation capability of deep learning with the strong decision-making capability of reinforcement learning and performs well in continuous state and action spaces, so the industry urgently needs a method that solves economic dispatch with deep reinforcement learning.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a new energy consumption power scheduling method based on artificial intelligence, which can promote new energy consumption.
The purpose of the invention is realized by the following technical scheme:
a new energy consumption power scheduling method based on artificial intelligence comprises the following steps:
S1, constructing the power grid active optimal power flow control into an active optimal scheduling online model of the power system;
S2, based on a PPO algorithm under a deep reinforcement learning framework, the agent of the power system active optimal scheduling online model gradually improves its actions through interaction with the environment to obtain the maximum reward, so as to train the active optimal scheduling online model; the deep reinforcement learning framework of the active optimal scheduling online model comprises states, actions and rewards;
S3, making online decisions with the active optimal scheduling online model according to real-time power grid operation data, and performing update optimization with the aim of maximizing the agent's reward so as to minimize the power generation cost.
Preferably, the PPO algorithm includes one Critic_network and two Actor networks, namely Old_Actor and New_Actor.
Preferably, in an Episode, the agent first interacts with the environment using the existing active optimal scheduling policy π to collect the data of a Batch; after a complete Batch is obtained, the Actor_network and the Critic_network start learning from the complete Batch data.
Preferably, the learning of the complete Batch data by the Actor_network and the Critic_network includes: the Critic network calculates the state value through the neural network of the active optimal scheduling online model; the Actor network iteratively updates the parameters of the policy function using the state value so as to select an action and obtain feedback and a new state; the Critic network then updates its neural network parameters using the feedback and the new state, and with the new parameters helps the Actor network calculate a more accurate state/action value.
Preferably, each time an agent interacts with the environment, the agent saves the acquired state, action, and reward as a tuple in the experience pool.
Preferably, when the strategy function is updated, the step size of strategy update is limited by means of KL divergence.
Preferably, the relative weight of each action is obtained by means of importance sampling,
E_{x∼p}[f(x)] = E_{x∼q}[f(x) · p(x)/q(x)],
which converts the expected value of f(x) under the distribution p into an expected value relative to another distribution q, thereby allowing the data to be reused.
Preferably, the new energy consumption power scheduling method based on artificial intelligence specifically includes the following steps:
S11, inputting the state information of the initialized environment into the Actor_new network to obtain a mean value mu and a variance sigma representing the action distribution, constructing a normal distribution and then sampling an action;
S12, inputting the sampled action into the environment to obtain the reward and the next state, storing the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool, and then executing step S11 for the next state s_{t+1} until a complete Batch of data is obtained, whereupon step S13 is executed;
S13, inputting the states into the Critic_network to obtain the state values, calculating the rewards, and from the values of all the states calculating the advantage estimation function
Â_t = Σ_{k≥0} γ^k · r_{t+k} − V(s_t);
S14, updating the parameter of the critical _ network according to the calculated dominance function and loss back propagation obtained after root mean square;
S15, inputting s and a from the experience pool into Actor_new and Actor_old respectively to obtain normal distributions N1 and N2 and probabilities P1 and P2, calculating the importance-sampling ratio P2/P1, and using KL divergence to measure and ensure that the difference between the action distributions is smaller than M, with M larger than 0;
S16, updating the parameters of the Actor_network by back-propagating the loss obtained from the calculated advantage function, and calculating a more accurate state/action value, the objective being
L(θ) = E_t[ (π_θ(a_t|s_t) / π_old(a_t|s_t)) · Â_t − β · KL(π_old ‖ π_θ) ],
wherein π(a_t|s_t) is the probability of taking action a_t in the current state s_t.
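The advantage computation in step S13 can be sketched as follows, assuming the common PPO choice of discounted returns minus the critic's value estimates; all function names and the reward/value numbers are illustrative, not taken from the patent.

```python
# Hypothetical sketch of the advantage estimate in step S13:
# discounted return R_t minus the critic's state-value estimate V(s_t).

def discounted_returns(rewards, bootstrap_value, gamma=0.9):
    """Compute R_t = r_t + gamma * R_{t+1}, bootstrapped from the final state."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(rewards, values, bootstrap_value, gamma=0.9):
    """Advantage A_t = R_t - V(s_t) for each step of the collected Batch."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [R - v for R, v in zip(returns, values)]

rewards = [1.0, 0.0, 2.0]   # per-step rewards from the environment (illustrative)
values = [1.5, 1.2, 1.8]    # critic estimates V(s_t) (illustrative)
adv = advantages(rewards, values, bootstrap_value=0.0)
```

A positive entry in `adv` means the sampled action did better than the critic expected, so the actor update increases its probability.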
Preferably, the power grid active optimal power flow control is constructed into an active optimal scheduling online model of the power system based on a Markov decision process.
Preferably, the deep reinforcement learning framework of the active optimization scheduling online model further includes: state transitions and discount factors.
Compared with the prior art, the invention has the following advantages:
according to the real-time power grid state, the invention adopts an artificial intelligence method to carry out optimization control on the active power of the power system, carries out reasonable economic dispatching on the power system, and minimizes the operation cost of the power system under the condition of meeting the basic constraint of the power system, and specifically comprises the following steps:
(1) By designing an "agent-action-reward" interactive training framework, an active power optimal scheduling online model of the power system is obtained; in particular, when facing a large-scale power system with a high proportion of new energy, optimal generator output control can be produced in real time, reducing the generator output cost of the system while satisfying the operation constraints of the power system.
(2) The method resolves the sensitivity of the Policy Gradient algorithm to the step size, achieves small-batch updates over multiple training steps, introduces an experience pool to improve data utilization, and is suitable for scenarios with continuous action spaces.
(3) Under a deep reinforcement learning framework, the method models the optimal power flow calculation problem of the power system under different load levels as an environment according to a Markov decision process. A training scheme for the automatic power-flow-adjustment model is designed as a whole and the final model is obtained through training. Simulation experiments prove that the method can automatically provide a power grid optimal power flow adjustment scheme under different load levels, ensures that the active output of the balancing machine in the system stays within its rated range, and keeps the generator output cost at a low level.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic flow diagram of a new energy consumption power scheduling method based on artificial intelligence according to the present invention.
FIG. 2 is an environment initialization diagram of the present invention.
Fig. 3 is a PPO algorithm flowchart of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
Referring to fig. 1 to 3, a new energy consumption power scheduling method based on artificial intelligence includes:
the method comprises the following steps of S1, constructing the power grid active optimal power flow control into an active optimal scheduling on-line model of the power system, and constructing the power grid active optimal power flow control into the active optimal scheduling on-line model of the power system based on a Markov decision process in the embodiment.
S2, based on a PPO algorithm under a deep reinforcement learning framework, the agent of the power system active optimal scheduling online model gradually improves its actions through interaction with the environment to obtain the maximum reward, i.e. the minimum total operation cost of a power grid with a high proportion of new energy, so as to train the active optimal scheduling online model; the deep reinforcement learning framework of the model comprises states, actions, rewards, state transitions and discount factors; the agent is the entity that interacts with the grid environment.
The PPO algorithm comprises one Critic_network and two Actor networks, namely Old_Actor and New_Actor. In an Episode, the agent first interacts with the environment using the existing active optimal scheduling policy π to collect a Batch of data, during which the Actor and Critic networks are not optimized. After a complete Batch is obtained, the Actor_network and the Critic_network start learning from the complete Batch data.
Specifically, the learning of the complete Batch data by the Actor_network and the Critic_network includes: the Critic network calculates the state value through the neural network of the active optimal scheduling online model; the Actor network iteratively updates the parameters of the policy function using the state value so as to select actions and obtain feedback and new states; the Critic network updates its neural network parameters using the feedback and the new states, and with the new parameters helps the Actor network calculate more accurate state/action values. This resolves the sensitivity of the basic policy-based Policy Gradient algorithm to the step size during training and allows small-batch updates over multiple training steps. Compared with traditional methods, both solving speed and solving precision are improved to a certain extent.
In this embodiment, in each Episode the agent interacts with the environment and stores the obtained state, action and reward as a tuple in the experience pool; training of the active optimal scheduling online model starts once the experience pool holds a sufficient number of tuples.
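The experience pool described here can be sketched as a simple buffer of (state, action, reward, next-state) tuples that signals when enough transitions have accumulated to form a training Batch; the class name and the batch-size threshold are illustrative assumptions, not values from the patent.

```python
# Illustrative experience pool: stores (s_t, a_t, r_t, s_{t+1}) tuples and
# reports when a complete Batch has been collected for training.
class ExperiencePool:
    def __init__(self, batch_size=32):
        self.batch_size = batch_size
        self.buffer = []

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def ready(self):
        # Training starts once a complete Batch has been collected.
        return len(self.buffer) >= self.batch_size

    def drain(self):
        # Hand the whole Batch to the Actor/Critic update, then clear it,
        # since PPO discards on-policy data after each update cycle.
        batch, self.buffer = self.buffer, []
        return batch

pool = ExperiencePool(batch_size=2)
pool.store([0.1], 0.5, 1.0, [0.2])
assert not pool.ready()
pool.store([0.2], 0.3, 0.0, [0.3])
assert pool.ready()
batch = pool.drain()
```

Draining rather than sampling reflects PPO's on-policy nature: each Batch is used for one update cycle and then discarded.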
In this embodiment, when the policy function (the agent's own neural network) is updated, KL divergence is used to limit the step size of the policy update so as to prevent the distributions of the two policy functions from differing too much.
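One common way such a KL-based limit is realized is an adaptive penalty coefficient, as in the adaptive-KL variant of PPO; the thresholds and multipliers below are illustrative assumptions, not values from the patent.

```python
# Adaptive KL penalty coefficient: if the measured divergence between the old
# and new policies exceeds beta_high * KL_target, raise beta to damp updates;
# if it falls below beta_low * KL_target, lower beta to allow larger steps.
def adapt_beta(beta, measured_kl, kl_target, beta_high=1.5, beta_low=0.5, factor=2.0):
    if measured_kl > beta_high * kl_target:
        return beta * factor
    if measured_kl < beta_low * kl_target:
        return beta / factor
    return beta

beta = 1.0
beta = adapt_beta(beta, measured_kl=0.05, kl_target=0.01)  # divergence too large
```

Because beta multiplies the KL term in the actor loss, raising it makes large policy shifts expensive on the next update.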
In this embodiment, historical data is fully utilized and the true weight of each action is reflected: the relative weight of each action is obtained by importance sampling,
E_{x∼p}[f(x)] = E_{x∼q}[f(x) · p(x)/q(x)].
For a variable x subject to the probability distribution p, the expectation of a function f(x) of x is to be estimated; since p cannot be sampled directly, sampling from a known distribution q and converting the expectation of f(x) under p into an expectation relative to q allows the data to be reused.
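As a concrete numeric sketch of this importance-sampling identity, the expectation of f(x) under p can be estimated from samples drawn under q by weighting each sample with the density ratio p(x)/q(x); the distributions and sample count here are illustrative.

```python
import math
import random

random.seed(0)

# Estimate E_{x~p}[f(x)] for p = N(1, 1) using samples from q = N(0, 2),
# weighting each sample by the density ratio p(x)/q(x).
def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def f(x):
    return x * x  # E_{x~p}[x^2] = mu^2 + sigma^2 = 2 for p = N(1, 1)

samples = [random.gauss(0.0, 2.0) for _ in range(200_000)]
weights = [normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0) for x in samples]
estimate = sum(w * f(x) for w, x in zip(weights, samples)) / len(samples)
# estimate is close to the true value 2.0
```

In PPO the same trick lets transitions collected under the old policy be reused when evaluating the new one, with the ratio π_new/π_old playing the role of p/q.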
S3, making online decisions with the active optimal scheduling online model according to real-time power grid operation data, and performing update optimization with the aim of maximizing the agent's reward so as to minimize the power generation cost. Making an online decision means calculating the grid's active control strategy in real time from the real-time operation data of the power grid.
By reasonably distributing the active power of the power system, the method minimizes the operation cost of the power system. The agent's neural network is trained under a deep reinforcement learning framework to achieve real-time control of the active optimal scheduling of the power system.
FIG. 2 is an environment initialization diagram of the present invention. As shown in fig. 2, during simulation the environment is initialized: pandapower is used to construct the environment and read the data, which include the share of new energy and the other data necessary for calculating the power flow. The agent aims to minimize the unit operation cost and maximize the reward; the cost is expressed as a quadratic function, certain constraint conditions are imposed to ensure safe operation of the power grid, and the uncertainty of random line disconnection in the grid is also considered.
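A highly simplified stand-in for such an environment is sketched below. The patent builds its environment with pandapower; here a toy power-balance model with a quadratic generator cost and a constraint penalty is assumed instead (a real implementation would run an AC power flow), and every class name and number is illustrative.

```python
# Toy dispatch environment: reward = -(quadratic generation cost) minus a
# penalty when generator output leaves its rated range or fails to balance
# the residual load. A real version would run a power flow (e.g. pandapower).
class ToyDispatchEnv:
    def __init__(self, load=100.0, renewable=30.0):
        self.load = load            # total active load (MW)
        self.renewable = renewable  # forecast new-energy output (MW)
        self.p_min, self.p_max = 10.0, 120.0
        self.a, self.b, self.c = 0.01, 2.0, 5.0  # quadratic cost coefficients

    def state(self):
        return (self.load, self.renewable)

    def step(self, p_gen):
        # Conventional generation must cover the load not met by new energy.
        imbalance = abs(self.load - self.renewable - p_gen)
        cost = self.a * p_gen ** 2 + self.b * p_gen + self.c
        penalty = 10.0 * imbalance
        if not (self.p_min <= p_gen <= self.p_max):
            penalty += 1000.0       # safe-operation constraint violated
        reward = -(cost + penalty)  # maximizing reward minimizes cost
        return self.state(), reward

env = ToyDispatchEnv()
_, r_balanced = env.step(70.0)  # exactly covers load - renewable
_, r_short = env.step(50.0)     # 20 MW imbalance incurs a penalty
```

Maximizing new-energy accommodation appears here implicitly: the cheapest feasible action is the one that lets the renewable output displace conventional generation.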
In the face of the volatility of a high proportion of new energy and the uncertainty of the environment, deep reinforcement learning is used to search for the optimal operating condition with the minimum power generation cost; the proposed DRL-based approach is well suited to the scheduling problem. Fig. 3 is the PPO algorithm flowchart of the present invention. As shown in fig. 3, inputting a state into the Actor_new network produces the two values of a normal distribution over actions: mu and sigma. An action is sampled and fed to the environment to obtain a reward and the next state; this process is cycled and the results are stored. After the cycle, the last state and all stored states are input into the Critic network to compute values, and back-propagation updates the network parameters. The new energy consumption power scheduling method based on artificial intelligence specifically comprises the following steps:
S11, inputting the environment state s into the Actor_new network to obtain a mean value mu and a variance sigma representing the action distribution, constructing a normal distribution and then sampling an action, thereby enabling the network to handle continuous-action problems.
S12, inputting the sampled action into the environment to obtain the reward and the next state, storing the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool, and then executing step S11 for the next state s_{t+1} until a complete Batch of data is obtained, whereupon step S13 is executed;
S13, inputting the states into the Critic_network to obtain the state values, calculating the rewards, and from the values of all the states calculating the advantage estimation function
Â_t = Σ_{k≥0} γ^k · r_{t+k} − V(s_t);
S14, updating the parameters of the Critic_network by back-propagating the loss obtained after taking the root mean square of the calculated advantage function;
S15, inputting s and a from the experience pool into Actor_new and Actor_old respectively to obtain normal distributions N1 and N2 and probabilities P1 and P2, calculating the importance-sampling ratio P2/P1, and using KL divergence to measure and ensure that the difference between the action distributions is not too large (smaller than M, with M larger than 0). Step S15 implements importance sampling and uses KL divergence to measure the distributions: when KL(π_old ‖ π_θ) > β_high · KL_target, β is increased, which discourages large-scale updates of the parameter θ.
S16, updating the parameters of the Actor_network by back-propagating the loss obtained from the calculated advantage function, and calculating a more accurate state/action value, the objective being
L(θ) = E_t[ (π_θ(a_t|s_t) / π_old(a_t|s_t)) · Â_t − β · KL(π_old ‖ π_θ) ],
wherein π(a_t|s_t) is the probability of taking action a_t in the current state s_t.
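Putting steps S15 and S16 together, the surrogate objective for a one-dimensional Gaussian policy can be sketched numerically as follows; the (mu, sigma) pairs, the advantage value and the penalty coefficient are all illustrative, and a real implementation would use automatic differentiation rather than evaluating the objective by hand.

```python
import math

# Log-density and KL divergence for 1-D normal distributions, used to form
# the objective ratio * advantage - beta * KL for a single (s, a) sample.
def log_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def kl_normal(mu_old, sigma_old, mu_new, sigma_new):
    # Closed-form KL(N_old || N_new) for 1-D Gaussians.
    return (math.log(sigma_new / sigma_old)
            + (sigma_old ** 2 + (mu_old - mu_new) ** 2) / (2 * sigma_new ** 2) - 0.5)

def surrogate(action, advantage, old, new, beta=1.0):
    # ratio = pi_new(a|s) / pi_old(a|s), via exp of the log-density difference.
    ratio = math.exp(log_pdf(action, *new) - log_pdf(action, *old))
    return ratio * advantage - beta * kl_normal(*old, *new)

old = (0.0, 1.0)  # (mu, sigma) produced by Actor_old
new = (0.1, 1.0)  # (mu, sigma) produced by Actor_new
obj = surrogate(action=0.2, advantage=1.5, old=old, new=new)
```

Gradient ascent on this objective pushes the new policy toward actions with positive advantage while the KL term holds it close to the old policy.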
The above-mentioned embodiments are preferred embodiments of the present invention, and the present invention is not limited thereto, and any other modifications or equivalent substitutions that do not depart from the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A new energy consumption electric power scheduling method based on artificial intelligence is characterized by comprising the following steps:
S1, constructing the power grid active optimal power flow control into an active optimal scheduling online model of the power system;
S2, based on a PPO algorithm under a deep reinforcement learning framework, gradually improving the actions of an agent of the power system active optimal scheduling online model through interaction with the environment to obtain the maximum reward, so as to train the active optimal scheduling online model; the deep reinforcement learning framework of the active optimal scheduling online model comprises states, actions and rewards;
S3, making online decisions with the active optimal scheduling online model according to real-time power grid operation data, and performing update optimization with the aim of maximizing the agent's reward so as to minimize the power generation cost.
2. The new energy consumption power scheduling method based on artificial intelligence as claimed in claim 1, wherein the PPO algorithm comprises one Critic_network and two Actor networks, namely Old_Actor and New_Actor.
3. The new energy consumption power scheduling method based on artificial intelligence according to claim 2, wherein in an Episode, the agent first interacts with the environment using the existing active optimal scheduling policy π to collect the data of a Batch, and after a complete Batch is obtained, the Actor_network and the Critic_network start learning from the complete Batch data.
4. The artificial intelligence based new energy consumption power scheduling method of claim 3, wherein the learning of the complete Batch data by the Actor_network and the Critic_network comprises: the Critic network calculates a state value through the neural network of the active optimal scheduling online model; the Actor network iteratively updates the parameters of the neural network using the state value so as to select an action and obtain feedback and a new state; the Critic network updates its neural network parameters using the feedback and the new state, and with the new parameters helps the Actor network calculate a more accurate state/action value.
5. The method of claim 4, wherein in each Episode, the agent interacts with the environment and stores the obtained state, action and reward as a tuple in an experience pool.
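The experience pool of this claim is commonly implemented as a bounded buffer of transition tuples. A minimal sketch follows; the class name, capacity, and random sampling method are assumptions for illustration.

```python
import random
from collections import deque

# Illustrative experience pool holding (state, action, reward, next_state)
# tuples; capacity and uniform sampling are assumptions.
class ExperiencePool:
    def __init__(self, capacity=10000):
        # deque with maxlen discards the oldest tuple once full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random minibatch for a learning step
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Storing transitions as tuples lets a complete Batch be replayed to both networks after the Episode's interaction phase ends.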
6. The new energy consumption power scheduling method based on artificial intelligence according to claim 4, wherein when the policy function is updated, the step size of the policy update is limited by using KL divergence.
7. The new energy consumption power scheduling method based on artificial intelligence as claimed in claim 4, wherein the relative weight of each action is obtained by importance sampling,

$$E_{x \sim p}[f(x)] = E_{x \sim q}\left[f(x)\,\frac{p(x)}{q(x)}\right]$$

converting the expectation of the function f(x) under the distribution p into an expectation relative to another distribution q, so as to realize the reuse of the data.
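The importance-sampling identity above can be verified numerically: samples drawn only from q, reweighted by p/q, estimate an expectation under p. The specific distributions below (p = N(0,1), q = N(1,4)) and the test function f(x) = x² are illustrative choices.

```python
import numpy as np

# Numeric check of E_{x~p}[f(x)] = E_{x~q}[f(x) p(x)/q(x)], the identity
# this claim uses to reuse data sampled under the old policy.
rng = np.random.default_rng(42)

def p_pdf(x):   # target distribution p: N(0, 1)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x):   # behavior distribution q: N(1, 2^2)
    return np.exp(-(x - 1)**2 / 8) / np.sqrt(8 * np.pi)

f = lambda x: x**2

x_q = rng.normal(1.0, 2.0, size=200_000)         # samples from q only
est = np.mean(f(x_q) * p_pdf(x_q) / q_pdf(x_q))  # reweight by p/q
# True value: E_{x~N(0,1)}[x^2] = 1
```

In PPO the same reweighting lets one Batch collected under the old policy be reused for several gradient steps on the new policy, with p/q playing the role of the ratio of action probabilities.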
8. The new energy consumption power scheduling method based on artificial intelligence according to claim 1, wherein the new energy consumption power scheduling method based on artificial intelligence specifically comprises the steps of:
S11, inputting the state information of the initialized environment into the Actor_new network to obtain a mean μ and a variance σ representing the action distribution, constructing a normal distribution and then sampling an action;
S12, inputting the sampled action into the environment to obtain the reward and the next state, storing the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool, and then executing step S11 for the next state s_{t+1} until a complete Batch of data is obtained, after which step S13 is executed;
S13, inputting the states into the Critic_network to obtain the state values, calculating the rewards and obtaining the values of all states, and calculating the advantage estimation function

$$\hat{A}_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
S14, updating the parameters of the Critic_network by back-propagating the loss obtained as the root mean square of the calculated advantage function;
S15, inputting s and a from the experience pool into Actor_new and Actor_old respectively to obtain the normal distributions N1 and N2 and the probabilities P1 and P2, calculating the importance sampling ratio P2/P1, and using the KL divergence to measure and ensure that the difference between the action distributions is smaller than M, where M > 0;
S16, updating the parameters of the Actor_network according to the calculated advantage function and the loss obtained after the root mean square, wherein the calculation formula is as follows:

$$J(\theta) = \hat{E}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t\right]$$

wherein π(a_t|s_t) is the probability of taking action a_t in the current state s_t.
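The computations in steps S13–S16 can be sketched numerically. In this minimal NumPy sketch the Gaussian policy parameters, batch size, threshold M, and the one-step TD form of the advantage are illustrative assumptions, and the parameter updates themselves are stubbed out.

```python
import numpy as np

# Sketch of steps S13-S16: advantages, an RMS critic loss, the importance
# ratio pi_new/pi_old, and a KL check against a threshold M.
rng = np.random.default_rng(0)
gamma, M = 0.99, 0.01

rewards = rng.uniform(-1, 0, size=8)
values = rng.uniform(-5, 0, size=9)        # V(s_0..s_8) from the Critic

# S13: advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t)
adv = rewards + gamma * values[1:] - values[:-1]

# S14: critic loss as the root mean square of the advantage (TD error)
critic_loss = float(np.sqrt(np.mean(adv**2)))

# S15: importance ratio P2/P1 and KL divergence between the two Gaussians
mu_old, sigma_old = 0.0, 1.0               # Actor_old distribution N1
mu_new, sigma_new = 0.05, 1.0              # Actor_new distribution N2
a = rng.normal(mu_old, sigma_old, size=8)  # actions from the pool

def log_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

ratio = np.exp(log_pdf(a, mu_new, sigma_new) - log_pdf(a, mu_old, sigma_old))
# closed-form KL(N1 || N2) for univariate Gaussians
kl = (np.log(sigma_new / sigma_old)
      + (sigma_old**2 + (mu_old - mu_new)**2) / (2 * sigma_new**2) - 0.5)

# S16: surrogate objective E[ratio * A]; the update is accepted only if
# the action-distribution difference stays below M (kl < M)
actor_objective = float(np.mean(ratio * adv))
```

Keeping the KL divergence below M plays the role of PPO's trust-region constraint: it prevents a single Batch of reused data from pushing Actor_new too far from the policy that generated it.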
9. The new energy consumption power scheduling method based on artificial intelligence of claim 1, wherein the active optimal power flow control of the power grid is constructed as the active optimal scheduling online model of the power system based on a Markov decision process.
10. The artificial intelligence based new energy consumption power scheduling method according to claim 1, wherein the deep reinforcement learning framework of the active optimization scheduling online model further comprises: state transitions and discount factors.
CN202211062806.4A 2022-09-01 2022-09-01 New energy consumption electric power scheduling method based on artificial intelligence Pending CN115345380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211062806.4A CN115345380A (en) 2022-09-01 2022-09-01 New energy consumption electric power scheduling method based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN115345380A true CN115345380A (en) 2022-11-15

Family

ID=83955053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211062806.4A Pending CN115345380A (en) 2022-09-01 2022-09-01 New energy consumption electric power scheduling method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115345380A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738874A (en) * 2023-05-12 2023-09-12 珠江水利委员会珠江水利科学研究院 Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning
CN116738874B (en) * 2023-05-12 2024-01-23 珠江水利委员会珠江水利科学研究院 Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning
CN117335414A (en) * 2023-11-24 2024-01-02 杭州鸿晟电力设计咨询有限公司 Method, device, equipment and medium for deciding alternating current optimal power flow of power system
CN117335414B (en) * 2023-11-24 2024-02-27 杭州鸿晟电力设计咨询有限公司 Method, device, equipment and medium for deciding alternating current optimal power flow of power system

Similar Documents

Publication Publication Date Title
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN114725936B (en) Power distribution network optimization method based on multi-agent deep reinforcement learning
CN115345380A (en) New energy consumption electric power scheduling method based on artificial intelligence
Song et al. Energy capture efficiency enhancement of wind turbines via stochastic model predictive yaw control based on intelligent scenarios generation
CN110854932B (en) Multi-time scale optimization scheduling method and system for AC/DC power distribution network
CN112507614B (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN113363998A (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN115293052A (en) Power system active power flow online optimization control method, storage medium and device
CN116760047A (en) Power distribution network voltage reactive power control method and system based on safety reinforcement learning algorithm
CN116468159A (en) Reactive power optimization method based on dual-delay depth deterministic strategy gradient
CN114566971A (en) Real-time optimal power flow calculation method based on near-end strategy optimization algorithm
CN115795992A (en) Park energy Internet online scheduling method based on virtual deduction of operation situation
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
US20230344242A1 (en) Method for automatic adjustment of power grid operation mode base on reinforcement learning
CN111799820A (en) Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system
CN115912367A (en) Intelligent generation method for operation mode of power system based on deep reinforcement learning
CN116865270A (en) Optimal scheduling method and system for flexible interconnection power distribution network containing embedded direct current
CN115360768A (en) Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN116454927A (en) Power grid two-stage online scheduling method, system and equipment based on shared energy storage
CN114048576A (en) Intelligent control method for energy storage system for stabilizing power grid transmission section tide
Tongyu et al. Based on deep reinforcement learning algorithm, energy storage optimization and loss reduction strategy for distribution network with high proportion of distributed generation
WO2024060344A1 (en) Data-physics fusion-driven adaptive voltage control system for flexible power distribution system
CN117117989A (en) Deep reinforcement learning solving method for unit combination
CN117394446A (en) Multi-stage robust unit combination method and device based on sequential evolution of batch scenes
CN117674160A (en) Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination