CN115345380A - New energy consumption electric power scheduling method based on artificial intelligence - Google Patents
Info
- Publication number
- CN115345380A (application number CN202211062806.4A)
- Authority
- CN
- China
- Prior art keywords
- power
- network
- active
- actor
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/007—Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
- H02J3/0075—Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/48—Controlling the sharing of the in-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a new energy consumption power scheduling method based on artificial intelligence, which comprises the following steps: constructing the power grid active optimal power flow control into an active optimal scheduling online model of the power system, and training the active optimal scheduling online model with a PPO (Proximal Policy Optimization) algorithm under a deep reinforcement learning framework; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions and rewards; online decisions are then made according to real-time power grid operation data, updating and optimizing with the aim of maximizing the agent's reward so as to minimize the power generation cost. By designing an interactive 'agent-action-reward' training framework, the invention obtains an active power optimization scheduling online model of the power system that can determine optimal generator output control in real time and reduce the generator output cost of the system while satisfying the operation constraints of the power system.
Description
Technical Field
The invention relates to the technical field of power dispatching, in particular to a new energy consumption power dispatching method based on artificial intelligence.
Background
In recent years, with the rapid development of the electric power industry in China, the access proportion of renewable energy sources such as wind power and photovoltaic has risen continuously, and the share of new energy in the total generated energy of the power system keeps increasing. The fluctuation of new energy poses great challenges to the safe and reliable operation of the power system, placing higher requirements on real-time active power optimization scheduling. Reasonable scheduling means can improve the power system's capacity to absorb fluctuating new energy and ensure its safe, reliable and economic operation.
The economic dispatch of a power system aims to adjust the active output of each generator so as to minimize generator output cost while fully satisfying the safe operation constraints of the power grid. Active optimal scheduling of a modern power system usually involves many different variables and numerous constraints, and is a typical nonlinear, high-dimensional problem. The traditional scheduling model, however, is slow to solve, and as the power system grows in scale and new energy penetration increases, the traditional solution model incurs errors and can no longer meet the control requirements of the new-type power system. In conventional research on active power scheduling optimization, the commonly used calculation methods fall into three categories: mathematical methods, programming algorithms, and heuristic algorithms. These methods suffer from slow calculation, a tendency to fall into local optima, and dependence on models and forecast data. As the distribution network grows, the number of power electronic devices increases and new energy penetration rises, the complexity of solving the active optimization scheduling problem by traditional methods increases greatly, making them unsuitable for online active optimization scheduling. Specifically, conventional methods for active power optimization scheduling struggle to converge within the required time, especially as the system scale and the share of new energy generation grow; moreover, the conventional approach finds the optimal solution for the state of the current time section through a model, but cannot solve for optimal control over continuous time sections.
In recent years, the advance of artificial intelligence and data-driven technologies has led AI-based optimization methods to be widely applied in power systems. The active power optimization scheduling problem can be modeled as a sequential decision problem: given the power load values, find the optimal combination of generator outputs. Deep reinforcement learning combines the excellent representation capability of deep learning with the excellent decision-making capability of reinforcement learning, and performs well in continuous state and action spaces; the industry therefore urgently needs a method that solves economic dispatch with deep reinforcement learning.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a new energy consumption power scheduling method based on artificial intelligence, which can promote new energy consumption.
The purpose of the invention is realized by the following technical scheme:
a new energy consumption power scheduling method based on artificial intelligence comprises the following steps:
s1, constructing the power grid active optimal power flow control into an active optimal scheduling on-line model of a power system,
s2, based on a PPO algorithm of a deep reinforcement learning framework, an intelligent agent of the active optimization scheduling online model of the power system gradually improves own actions through interaction with the environment to obtain maximum rewards so as to train the active optimization scheduling online model; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions and rewards;
and S3, carrying out online decision on the active optimal scheduling online model according to the real-time power grid operation data, and carrying out updating optimization aiming at the maximization of the reward of the intelligent agent to obtain the minimized power generation cost.
Preferably, the PPO algorithm includes one Critic_network and two Actor networks, namely Old_Actor and New_Actor.
Preferably, in an Episode, the agent first interacts with the environment using the existing active optimization scheduling policy Pi to obtain one Batch of data; after a complete Batch is obtained, the Actor_network and the Critic_network start learning from the complete Batch data.
Preferably, the learning of the complete Batch data by the Actor_network and the Critic_network includes: the Critic network calculates the state value through the neural network of the active optimization scheduling online model; the Actor network iteratively updates the parameters of the policy function using the state value so as to select an action and obtain feedback and a new state; the Critic network then updates its neural network parameters using the feedback and the new state, and the new network parameters help the Actor network calculate a more accurate state/action value.
Preferably, each time an agent interacts with the environment, the agent saves the acquired state, action, and reward as a tuple in the experience pool.
Preferably, when the strategy function is updated, the step size of strategy update is limited by means of KL divergence.
Preferably, the relative weight of each action is obtained by importance sampling, converting the expected value of f(x) under the distribution p into an expected value under another distribution q, thereby enabling reuse of the data.
Preferably, the new energy consumption power scheduling method based on artificial intelligence specifically includes the following steps:
S11, inputting the state information from the initialized and constructed environment into the Actor_New network to obtain a mean mu and a variance sigma characterizing the action distribution, constructing a normal distribution, and then sampling actions;
S12, inputting the sampled actions into the environment to obtain the reward and the state of the next step, storing the tuple (s_t, a_t, r_t, s_t+1) in the experience pool, and then executing step S11 for the next state s_t+1 until a complete Batch of data is obtained, whereupon step S13 is executed;
S13, inputting the states into the Critic_network to obtain the state values, calculating the rewards, obtaining the values of all the states, and calculating the advantage estimation function;
S14, updating the parameter of the critical _ network according to the calculated dominance function and loss back propagation obtained after root mean square;
S15, inputting s and a from the experience pool into Actor_New and Actor_Old respectively to obtain normal distributions N1 and N2 and probabilities P1 and P2, calculating the importance-sampling ratio P2/P1, and using KL divergence to measure and ensure that the difference between the action distributions is smaller than M, with M greater than 0;
S16, updating the parameters of the Actor_network according to the calculated advantage function and the loss obtained after taking the root mean square, and calculating a more accurate state/action value based on π(a_t|s_t), the probability of taking action a_t in the current state.
Preferably, the power grid active optimal power flow control is constructed into an active optimal scheduling online model of the power system based on a Markov decision process.
Preferably, the deep reinforcement learning framework of the active optimization scheduling online model further includes: state transitions and discount factors.
Compared with the prior art, the invention has the following advantages:
according to the real-time power grid state, the invention adopts an artificial intelligence method to carry out optimization control on the active power of the power system, carries out reasonable economic dispatching on the power system, and minimizes the operation cost of the power system under the condition of meeting the basic constraint of the power system, and specifically comprises the following steps:
(1) By designing an interactive 'agent-action-reward' training framework, an active power optimization scheduling online model of the power system is obtained; in particular, when facing a large-scale power system with a high proportion of new energy, optimal generator output control can be determined in real time, and the generator output cost of the system is reduced while the operation constraints of the power system are satisfied.
(2) The sensitivity of the Policy Gradient algorithm to the step size is overcome, mini-batch updates are achieved over multiple training steps, and the introduction of the experience pool improves data utilization, making the method suitable for continuous action spaces.
(3) Under a deep reinforcement learning framework, the method models the optimal power flow calculation problem of the power system at different load levels as an environment according to a Markov decision process. A training scheme for the optimal-power-flow automatic adjustment model is designed as a whole, and the final model is obtained through training. Simulation experiments show that the method can automatically provide a grid optimal power flow adjustment scheme at different load levels, keep the active output of the balancing machine in the system within its rated range, and maintain the generator output cost at a low level.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic flow diagram of a new energy consumption power scheduling method based on artificial intelligence according to the present invention.
FIG. 2 is an environment initialization diagram of the present invention.
Fig. 3 is a PPO algorithm flowchart of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
Referring to fig. 1 to 3, a new energy consumption power scheduling method based on artificial intelligence includes:
the method comprises the following steps of S1, constructing the power grid active optimal power flow control into an active optimal scheduling on-line model of the power system, and constructing the power grid active optimal power flow control into the active optimal scheduling on-line model of the power system based on a Markov decision process in the embodiment.
S2, based on a PPO algorithm of a deep reinforcement learning framework, an agent of the power system active optimization scheduling online model gradually improves own actions through interaction with the environment to obtain maximum rewards: the total operation cost of a power grid with high-proportion new energy blended is minimum so as to train an active optimal scheduling online model; the deep reinforcement learning framework of the active optimization scheduling online model comprises states, actions, rewards, state transitions and discount factors; an agent is an entity that interacts with the grid environment.
The PPO algorithm comprises one Critic_network and two Actor networks, namely Old_Actor and New_Actor. In an Episode, the agent first interacts with the environment using the existing active optimization scheduling policy Pi to obtain a Batch of data; during this phase the Actor and Critic networks are not optimized. After a complete Batch of data is obtained, the Actor_network and the Critic_network start learning from it.
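As a rough illustration of this one-Critic/two-Actor arrangement, the sketch below builds a tiny Critic_network, a New_Actor whose output parameterizes a normal action distribution, and an Old_Actor kept as a frozen copy. The plain-numpy MLP, the layer sizes and all dimensions are illustrative assumptions, not the architecture claimed in the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Randomly initialised weights for a tiny fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh hidden activations and a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

state_dim, act_dim = 4, 2
critic = mlp([state_dim, 32, 1])                  # Critic_network: state -> value
new_actor = mlp([state_dim, 32, 2 * act_dim])     # New_Actor: state -> (mu, log sigma)
old_actor = [(W.copy(), b.copy()) for W, b in new_actor]  # Old_Actor: frozen copy

s = rng.normal(size=state_dim)
out = forward(new_actor, s)
mu, sigma = out[:act_dim], np.exp(out[act_dim:])  # parameters of the action distribution
value = forward(critic, s)[0]                     # scalar state value
```

In such a scheme only New_Actor would be optimized while learning a Batch; Old_Actor is re-synchronised to it afterwards.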
Specifically, the learning of the complete Batch data by the Actor_network and the Critic_network includes: the Critic network calculates the state value through the neural network of the active optimization scheduling online model; the Actor network iteratively updates the parameters of the policy function using the state value, selects actions, and obtains feedback and new states; the Critic network then updates its neural network parameters using the feedback and the new states, and the new parameters help the Actor network calculate more accurate state/action values. During training this overcomes the sensitivity of the basic policy-based Policy Gradient algorithm to the step size, and mini-batch updates can be achieved over multiple training steps. Compared with traditional methods, the method improves both solution speed and solution accuracy.
In this embodiment, in each Episode the agent interacts with the environment and stores the obtained state, action, and reward as a tuple in the experience pool; training of the active optimization scheduling online model starts when the number of tuples in the experience pool reaches a threshold.
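A minimal sketch of such an experience pool, storing one (state, action, reward, next-state) tuple per interaction and signalling when a full batch is available; the class name, batch size and toy interactions are hypothetical:

```python
from collections import deque

class ExperiencePool:
    """Stores (s, a, r, s_next) tuples; training can start once a full
    batch has been collected."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = deque()

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def ready(self):
        """True when enough tuples have accumulated to train on."""
        return len(self.buffer) >= self.batch_size

    def drain(self):
        """Hand the collected batch to the learner and empty the pool."""
        batch = list(self.buffer)
        self.buffer.clear()
        return batch

pool = ExperiencePool(batch_size=3)
for t in range(3):                 # three toy interactions with the environment
    pool.push([float(t)], 0.1 * t, 1.0, [float(t + 1)])
```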
In this embodiment, when the policy function (its neural network) is updated, the step size of the policy update is limited using KL divergence, so as to prevent the distributions of the two policy functions from differing too much.
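For two univariate normal policies the KL divergence has a closed form, so such a step-size check can be written directly; the bound M below is an assumed illustrative value, not one given in the patent.

```python
import math

def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2) - 0.5)

M = 0.05                                   # assumed bound on the policy shift
kl = kl_normal(0.0, 1.0, 0.1, 1.0)         # old policy vs. slightly moved new policy
step_ok = kl < M                           # accept the update only if the shift is small
```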
In this embodiment, historical data are fully utilized and the true weight of each action is reflected: the relative weight of each action is obtained by importance sampling. For a variable x subject to a probability distribution p, the expectation of a function f(x) of x is to be estimated; since p cannot be sampled from directly, sampling from a known distribution q and converting the expectation of f(x) under p into an expectation under q, i.e. E_p[f(x)] = E_q[f(x)·p(x)/q(x)], achieves data reuse.
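This identity can be checked numerically: below, the expectation of f(x) = x² under p = N(0.5, 1) is estimated using only samples from q = N(0, 1), reweighted by p(x)/q(x). The particular distributions and sample count are illustrative, not the patent's.

```python
import math
import random

random.seed(0)

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

f = lambda x: x * x
n = 200_000
# Sample only from q = N(0, 1), but estimate E_p[f] for p = N(0.5, 1).
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
est = sum(f(x) * normal_pdf(x, 0.5, 1.0) / normal_pdf(x, 0.0, 1.0)
          for x in samples) / n
# True value: E_p[x^2] = mu^2 + sigma^2 = 0.5**2 + 1 = 1.25
```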
And S3, carrying out online decision making on the active optimal scheduling online model according to the real-time power grid operation data, and carrying out updating optimization aiming at the maximization of the reward of the intelligent agent to obtain the minimized power generation cost. And (3) performing online decision, namely calculating the active control strategy of the power grid in real time according to the real-time operation data of the power grid.
The method for distributing the active power of the power system reasonably can minimize the operation cost of the power system. And training the intelligent neural network based on a deep reinforcement learning framework, and realizing real-time control on active optimal scheduling of the power system.
FIG. 2 is an environment initialization diagram of the present invention. As shown in fig. 2, in the simulation process the environment is initialized, and pandapower is used to construct the environment and read data; the data take into account the share of new energy and the other quantities necessary for calculating the power flow. The agent aims to minimize unit operation cost and maximize the reward; the cost is expressed as a quadratic function, certain constraint conditions are imposed to ensure safe operation of the power grid, and the uncertainty of random line disconnections in the grid is also considered.
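The reward structure described here (quadratic generation cost plus constraint penalties) can be sketched without the pandapower environment itself; all coefficients, limits and the penalty weight below are illustrative assumptions, not values from the patent.

```python
def generation_cost(p_mw, coeffs):
    """Quadratic unit cost c(P) = a*P^2 + b*P + c, as in the description."""
    a, b, c = coeffs
    return a * p_mw**2 + b * p_mw + c

def reward(p_mws, coeff_list, p_limits, penalty=1000.0):
    """Negative total cost, with a penalty whenever a unit leaves its
    active-power limits (a stand-in for the grid safety constraints)."""
    total = sum(generation_cost(p, c) for p, c in zip(p_mws, coeff_list))
    total += sum(penalty for p, (lo, hi) in zip(p_mws, p_limits)
                 if not lo <= p <= hi)
    return -total

coeffs = [(0.01, 20.0, 100.0), (0.02, 15.0, 80.0)]   # hypothetical cost curves
limits = [(10.0, 100.0), (10.0, 80.0)]               # hypothetical P limits (MW)
r_ok = reward([50.0, 40.0], coeffs, limits)
r_bad = reward([150.0, 40.0], coeffs, limits)        # first unit over its limit
```

With a reward of this shape, maximizing the agent's reward is equivalent to minimizing generation cost subject to the limits.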
Facing the volatility of a high proportion of new energy and the uncertainty of the environment, deep reinforcement learning is used to search for the operating condition with minimum power generation cost; the proposed DRL-based approach handles the scheduling problem well. Fig. 3 is a PPO algorithm flowchart of the present invention. As shown in fig. 3, inputting a state into the Actor-New network yields the two values that parameterize a normal distribution of actions: mu and sigma. An action is sampled and applied to the environment to obtain a reward and the next state; this cycle is repeated and each transition is stored. After the loop, the final state and all stored states are input into the Critic network to compute values, and back-propagation updates the network parameters. The new energy consumption power scheduling method based on artificial intelligence specifically comprises the following steps:
S11, inputting the environment state s into the Actor_New network to obtain a mean mu and a variance sigma characterizing the action distribution, constructing a normal distribution and then sampling an action, thereby using the network to solve a continuous action problem.
S12, inputting the sampled action into the environment to obtain the reward and the state of the next step, storing the tuple (s_t, a_t, r_t, s_t+1) in the experience pool, and then executing step S11 for the next state s_t+1 until a complete Batch of data is obtained, whereupon step S13 is executed;
S13, inputting the states into the Critic_network to obtain the state values, calculating the rewards, obtaining the values of all the states, and calculating the advantage estimation function;
S14, updating the parameters of the Critic_network by back-propagating the loss obtained from the calculated advantage function after taking the root mean square;
S15, inputting s and a from the experience pool into Actor_New and Actor_Old respectively to obtain normal distributions N1 and N2 and probabilities P1 and P2, calculating the importance-sampling ratio P2/P1, and using KL divergence to measure and ensure that the difference between the action distributions is not too large (smaller than M, with M greater than 0). Step S15 implements importance sampling and uses KL divergence to measure the distribution shift: when KL[π_old || π_θ] > β_high · KL_target, β is increased to discourage large-scale updates of the parameter θ.
S16, updating the parameters of the Actor_network according to the calculated advantage function and the loss obtained after taking the root mean square, and calculating a more accurate state/action value based on π(a_t|s_t), the probability of taking action a_t in the current state.
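The steps above can be strung together in miniature as below, using a toy one-dimensional environment in place of the power grid; the policy parameters, the mean-baseline "Critic", the toy reward/transition and the KL bound are all illustrative assumptions rather than the patent's concrete model.

```python
import numpy as np

rng = np.random.default_rng(1)
GAMMA, M = 0.9, 0.1          # discount factor and an assumed KL bound from step S15

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# S11-S12: roll out one Batch with the current (old) policy N(mu_old, sigma_old).
mu_old, sigma_old = 0.0, 1.0
states, actions, rewards = [], [], []
s = 0.0
for t in range(8):
    a = rng.normal(mu_old, sigma_old)     # sample an action from the normal policy
    states.append(s); actions.append(a)
    rewards.append(-abs(s - a))           # toy reward, not the grid cost model
    s = s + 0.1 * a                       # toy state transition

# S13: discounted returns, with the batch mean as a crude value baseline,
# give an advantage estimate A_t = G_t - V(s_t).
returns = np.zeros(len(rewards))
g = 0.0
for t in reversed(range(len(rewards))):
    g = rewards[t] + GAMMA * g
    returns[t] = g
advantages = returns - returns.mean()

# S15: importance ratio P2/P1 between the new and old action distributions,
# plus a closed-form KL check that the policy has not moved too far.
mu_new, sigma_new = 0.05, 1.0             # pretend Actor_New shifted slightly
p1 = normal_pdf(np.array(actions), mu_old, sigma_old)
p2 = normal_pdf(np.array(actions), mu_new, sigma_new)
ratio = p2 / p1
kl = (np.log(sigma_new / sigma_old)
      + (sigma_old**2 + (mu_old - mu_new)**2) / (2 * sigma_new**2) - 0.5)

# S16: surrogate actor objective, the advantage-weighted importance ratio.
surrogate = float(np.mean(ratio * advantages))
```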
The above-mentioned embodiments are preferred embodiments of the present invention, and the present invention is not limited thereto, and any other modifications or equivalent substitutions that do not depart from the technical spirit of the present invention are included in the scope of the present invention.
Claims (10)
1. A new energy consumption electric power scheduling method based on artificial intelligence is characterized by comprising the following steps:
S1, constructing the active optimal power flow control of the power grid as an active optimal scheduling online model of the power system;
S2, based on the PPO algorithm of a deep reinforcement learning framework, the agent of the active optimal scheduling online model gradually improves its actions through interaction with the environment to obtain the maximum reward, so as to train the active optimal scheduling online model; the deep reinforcement learning framework of the active optimal scheduling online model comprises states, actions and rewards;
S3, making online decisions with the active optimal scheduling online model according to real-time power grid operation data, and performing update optimization aiming at maximizing the agent's reward, so as to obtain the minimized power generation cost.
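Steps S1–S3 amount to an agent repeatedly interacting with a grid environment and collecting (state, action, reward, next state) transitions. A toy sketch follows; `GridEnv`, its dynamics, and the placeholder policy are entirely hypothetical stand-ins for the patent's dispatch model:

```python
class GridEnv:
    """Hypothetical stand-in for the dispatch environment (not the patent's model).

    state  : deviation of active power from the optimal dispatch setpoint
    action : correction applied by the scheduling agent
    reward : negative cost proxy, so maximizing reward minimizes generation cost
    """
    def __init__(self):
        self.state = 5.0

    def step(self, action):
        self.state -= action           # apply the dispatch correction
        reward = -abs(self.state)      # cost grows with the remaining deviation
        return self.state, reward

env = GridEnv()
experience_pool = []
state = env.state
for _ in range(4):                     # collect one small Batch of transitions
    action = 0.5 * state               # placeholder policy pi(s); a trained
                                       # Actor network would supply this
    next_state, reward = env.step(action)
    experience_pool.append((state, action, reward, next_state))
    state = next_state
```

Each loop iteration halves the deviation here, so the stored rewards improve over the Batch, which is the signal the PPO update of step S2 learns from.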
2. The new energy consumption power scheduling method based on artificial intelligence as claimed in claim 1, wherein the PPO algorithm comprises a Critic_network and two Actor networks, the two Actor networks being Old_Actor and New_Actor respectively.
3. The new energy consumption power scheduling method based on artificial intelligence according to claim 2, wherein in an Episode the agent first uses the existing active optimization scheduling policy π to interact with the environment to obtain a Batch of data, and after a complete Batch is obtained, the Actor_network and the Critic_network start learning from the complete Batch of data.
4. The artificial intelligence based new energy consumption power scheduling method of claim 3, wherein the learning of the complete Batch of data by the Actor_network and the Critic_network comprises: the Critic_network calculates a state value through the neural network of the active optimization scheduling online model; the Actor_network iteratively updates its neural network parameters by using the state value, selects an action accordingly, and obtains feedback and a new state; the Critic_network then updates its neural network parameters by using the feedback and the new state, and the new network parameters help the Actor_network calculate a more accurate state/action value.
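The Critic/Actor interplay in claim 4 rests on the critic's state value feeding an advantage estimate. A minimal one-step (temporal-difference) advantage is sketched below; the discount factor 0.99 is an assumed value, as the patent does not specify one:

```python
def td_advantage(reward, v_s, v_s_next, gamma=0.99):
    """One-step advantage estimate: A = r + gamma * V(s') - V(s).

    v_s and v_s_next are the Critic_network's value estimates for the
    current and next state; gamma is an assumed discount factor.
    """
    return reward + gamma * v_s_next - v_s

# Positive advantage: the action did better than the critic expected.
adv = td_advantage(reward=1.0, v_s=2.0, v_s_next=2.5)
```

A positive advantage increases the probability the Actor assigns to that action; a negative one decreases it, which is how the critic's feedback steers the actor.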
5. The method of claim 4, wherein each Episode event, agent interacts with the environment and stores the obtained state, action, and reward as a tuple in an experience pool.
6. The new energy consumption power scheduling method based on artificial intelligence according to claim 4, wherein when the policy function is updated, the step size of the policy update is limited by using KL divergence.
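Claim 6's KL-based limit on the policy update step size is typically realized, in the PPO-penalty variant, by adapting the penalty coefficient β, as step S15 also hints: β grows when KL exceeds β_high · KL_target and shrinks when KL falls well below it. A sketch with illustrative thresholds (the patent only names β_high and KL_target; the factor 2 and the bounds are assumptions):

```python
def adapt_beta(beta, kl, kl_target, factor=2.0, high=1.5, low=1.0 / 1.5):
    """Adaptive KL penalty coefficient (PPO-penalty style; thresholds illustrative)."""
    if kl > high * kl_target:
        beta *= factor    # policy moved too far: penalize large updates harder
    elif kl < low * kl_target:
        beta /= factor    # policy barely moved: relax the penalty
    return beta
```

Run after every update epoch, this keeps the measured KL divergence hovering near KL_target, bounding the effective step size of the policy update.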
7. The new energy consumption power scheduling method based on artificial intelligence as claimed in claim 4, wherein importance sampling is used to obtain the relative weight of each action, converting the expectation of the distribution f(x) under the distribution p into an expectation relative to another distribution q, so as to realize reuse of the data.
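The conversion in claim 7 is standard importance sampling: E_p[f(x)] = E_q[f(x) · p(x)/q(x)], so samples drawn under the old policy q can still estimate expectations under the new policy p. A numeric check with two unit-variance Gaussians (the means and sample count are chosen purely for illustration):

```python
import math
import random

random.seed(0)  # fixed seed so the Monte Carlo estimate is reproducible

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Target distribution p = N(1, 1), sampling distribution q = N(0, 1), f(x) = x.
mu_p, mu_q, sigma = 1.0, 0.0, 1.0
samples = [random.gauss(mu_q, sigma) for _ in range(200_000)]
weights = [normal_pdf(x, mu_p, sigma) / normal_pdf(x, mu_q, sigma) for x in samples]
estimate = sum(w * x for w, x in zip(weights, samples)) / len(samples)
# estimate approximates E_p[x] = 1.0 even though every sample came from q
```

This reweighting is exactly what the ratio P2/P1 of step S15 does per action, letting a Batch collected under Actor_old be reused to update Actor_new.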
8. The new energy consumption power scheduling method based on artificial intelligence according to claim 1, wherein the new energy consumption power scheduling method based on artificial intelligence specifically comprises the steps of:
S11, inputting the state information of the initialized environment into the Actor_new network to obtain a mean value μ and a variance σ representing the action distribution, constructing a normal distribution and then sampling an action;
S12, inputting the sampled action into the environment to obtain the reward and the next state, storing (s_t, a_t, r_t, s_{t+1}) in the experience pool, and then executing step S11 for the next state s_{t+1} until a complete Batch of data is obtained, whereupon step S13 is executed;
S13, inputting the states into the Critic_network to obtain state values, calculating the rewards, obtaining the values of all states, and calculating the advantage estimation function;
S14, back-propagating the loss obtained by taking the root mean square of the calculated advantage function to update the parameters of the Critic_network;
S15, inputting s and a from the experience pool into Actor_new and Actor_old respectively to obtain normal distributions N1 and N2 and probabilities P1 and P2, calculating the importance-sampling ratio P2/P1, and using KL divergence to measure and ensure that the action distribution difference is smaller than M, with M larger than 0;
S16, back-propagating the loss obtained by taking the root mean square of the calculated advantage function to update the parameters of the Actor_network, and calculating a more accurate state/action value, where the calculation formula is J(θ) = E_t[(π_θ(a_t|s_t)/π_old(a_t|s_t)) · A_t − β · KL[π_old ‖ π_θ]], and π(a_t|s_t) is the probability of taking action a_t in the current state s_t.
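Steps S11–S16 hinge on the advantage estimate computed in step S13. One simple choice, sketched below, is discounted returns minus the Critic's value estimates; this particular estimator and the discount factor 0.9 are assumptions, since the patent leaves the estimator unspecified:

```python
def discounted_returns(rewards, gamma=0.9):
    """Accumulate R_t = r_t + gamma * R_{t+1} backwards over one Batch."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.9):
    """Per-step advantage A_t = R_t - V(s_t), with V from the Critic network."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

# Three-step Batch with made-up rewards and critic value estimates.
adv = advantages(rewards=[1.0, 0.0, 1.0], values=[1.5, 0.5, 0.5])
```

The root mean square of these per-step advantages then serves as the loss that steps S14 and S16 back-propagate through the Critic_network and Actor_network.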
9. The new energy consumption power dispatching method based on artificial intelligence of claim 1, wherein the active optimal power flow control of the power grid is constructed as an active optimal dispatching online model of the power system based on a Markov decision process.
10. The artificial intelligence based new energy consumption power scheduling method according to claim 1, wherein the deep reinforcement learning framework of the active optimization scheduling online model further comprises: state transitions and discount factors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211062806.4A CN115345380A (en) | 2022-09-01 | 2022-09-01 | New energy consumption electric power scheduling method based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211062806.4A CN115345380A (en) | 2022-09-01 | 2022-09-01 | New energy consumption electric power scheduling method based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115345380A true CN115345380A (en) | 2022-11-15 |
Family
ID=83955053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211062806.4A Pending CN115345380A (en) | 2022-09-01 | 2022-09-01 | New energy consumption electric power scheduling method based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115345380A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738874A (en) * | 2023-05-12 | 2023-09-12 | 珠江水利委员会珠江水利科学研究院 | Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning |
CN116738874B (en) * | 2023-05-12 | 2024-01-23 | 珠江水利委员会珠江水利科学研究院 | Gate pump group joint optimization scheduling method based on Multi-Agent PPO reinforcement learning |
CN117335414A (en) * | 2023-11-24 | 2024-01-02 | 杭州鸿晟电力设计咨询有限公司 | Method, device, equipment and medium for deciding alternating current optimal power flow of power system |
CN117335414B (en) * | 2023-11-24 | 2024-02-27 | 杭州鸿晟电力设计咨询有限公司 | Method, device, equipment and medium for deciding alternating current optimal power flow of power system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112615379B (en) | Power grid multi-section power control method based on distributed multi-agent reinforcement learning | |
CN114725936B (en) | Power distribution network optimization method based on multi-agent deep reinforcement learning | |
CN115345380A (en) | New energy consumption electric power scheduling method based on artificial intelligence | |
Song et al. | Energy capture efficiency enhancement of wind turbines via stochastic model predictive yaw control based on intelligent scenarios generation | |
CN110854932B (en) | Multi-time scale optimization scheduling method and system for AC/DC power distribution network | |
CN112507614B (en) | Comprehensive optimization method for power grid in distributed power supply high-permeability area | |
CN113363998A (en) | Power distribution network voltage control method based on multi-agent deep reinforcement learning | |
CN115293052A (en) | Power system active power flow online optimization control method, storage medium and device | |
CN116760047A (en) | Power distribution network voltage reactive power control method and system based on safety reinforcement learning algorithm | |
CN116468159A (en) | Reactive power optimization method based on dual-delay depth deterministic strategy gradient | |
CN114566971A (en) | Real-time optimal power flow calculation method based on near-end strategy optimization algorithm | |
CN115795992A (en) | Park energy Internet online scheduling method based on virtual deduction of operation situation | |
CN117833263A (en) | New energy power grid voltage control method and system based on DDPG | |
US20230344242A1 (en) | Method for automatic adjustment of power grid operation mode base on reinforcement learning | |
CN111799820A (en) | Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system | |
CN115912367A (en) | Intelligent generation method for operation mode of power system based on deep reinforcement learning | |
CN116865270A (en) | Optimal scheduling method and system for flexible interconnection power distribution network containing embedded direct current | |
CN115360768A (en) | Power scheduling method and device based on muzero and deep reinforcement learning and storage medium | |
CN116454927A (en) | Power grid two-stage online scheduling method, system and equipment based on shared energy storage | |
CN114048576A (en) | Intelligent control method for energy storage system for stabilizing power grid transmission section tide | |
Tongyu et al. | Based on deep reinforcement learning algorithm, energy storage optimization and loss reduction strategy for distribution network with high proportion of distributed generation | |
WO2024060344A1 (en) | Data-physics fusion-driven adaptive voltage control system for flexible power distribution system | |
CN117117989A (en) | Deep reinforcement learning solving method for unit combination | |
CN117394446A (en) | Multi-stage robust unit combination method and device based on sequential evolution of batch scenes | |
CN117674160A (en) | Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |