CN115310760A - Gas system dynamic scheduling method based on improved proximal policy optimization - Google Patents
- Publication number
- CN115310760A (application number CN202210781220.7A)
- Authority
- CN
- China
- Prior art keywords
- gas
- pipe network
- scheduling
- model
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06Q10/06312 - Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
- G06N20/00 - Machine learning
- G06N3/04 - Neural networks; architecture, e.g. interconnection topology
- G06N3/08 - Neural networks; learning methods
- G06Q50/06 - Energy or water supply
Abstract
The invention discloses a gas system dynamic scheduling method based on improved proximal policy optimization (PPO), comprising the following steps: (1) determine the production-plan interval and the gas-producing and gas-consuming devices according to the scheduling optimization process of the gas system, and establish a gas pipe-network model; (2) determine the initial state of the gas pipe-network model from the initial values of gas output and pipe-network pressure, and update it iteratively from there; (3) construct, from the optimization objective function, a reward function that evaluates how good an action is in the current state; (4) the reinforcement learning agent model obtains an action a from the state and updates the model until one scheduling run is completed, recording the states, actions and rewards of the run and updating the model's network parameters; (5) after iterative training is finished, apply the model to a test set for testing; (6) use the trained model for scheduling optimization of the gas system. With this method, the load capacity of the gas pipe network can be predicted more accurately and the pressure balance of the network is effectively improved.
Description
Technical Field
The invention relates to the fields of gas-system balancing and artificial-intelligence applications, and in particular to a gas system dynamic scheduling method based on improved proximal policy optimization.
Background
The gas system is an important component of an oil refinery's energy system and one of the refinery's most important fuel sources. The gas pipe network is the main equipment carrying gas transportation, but its pressure is bounded above and below: if gas production far exceeds consumption, the pressure breaks through the network's upper limit and creates a safety hazard; if production falls far short of consumption, the pressure drops below the lower limit and mechanical failure is likely.
At present, few scheduling-optimization algorithms are actually deployed on gas systems; most scheduling still relies on manual experience or traditional methods. Whether scheduling is done manually or with traditional, mainly heuristic, methods, the quality of the resulting solutions is mediocre, and the effectiveness of rules set by manual experience varies widely from one decision maker to another. The strategies a heuristic algorithm can find in limited time are only slightly better than manual operation and often carry a degree of randomness, so stable results cannot be guaranteed.
Chinese patent publication No. CN101794119A discloses a balancing and optimal-scheduling method for a gas system based on prediction data. The method acquires the data needed to trigger the gas system from a scheduling system; predicts, from those data, the gas generation of each production device and the energy demand of the heating-furnace boilers over a future preset period; judges from the predictions whether production and demand will be balanced over that period and, when they are not, optimizes the scheduling strategy and scheme accordingly; and displays the optimized strategy and scheme through a client so that dispatchers can apply it. However, the method depends on fairly accurate historical data to predict production and demand: if historical data are insufficient, or current conditions differ greatly from the historical situation, prediction accuracy is hard to guarantee. The prediction model also requires production-plan scheduling data to be given in advance, which makes accurate prediction difficult when the production plan changes dynamically. In addition, the mixed-integer linear programming algorithm the method uses needs repeated iterations for each schedule, so its running time is long and real-time scheduling is hard to achieve.
Compared with traditional methods, using deep reinforcement learning for scheduling optimization is a completely data-driven approach with the following advantages:
(1) Generalization: traditional methods mostly start from scratch on each new problem and iterate toward a better solution, whereas deep reinforcement learning gives the algorithm the ability to learn; having analyzed and solved some problems, it can solve a newly given problem effectively.
(2) Flexibility: deep reinforcement learning can reduce the time complexity to linear and, combined with mature parallel acceleration, can be applied to large-scale problems.
(3) Universality: a trained model can be applied to problems of different scales and with different parameters, without designing a new training setup for each problem.
However, neither academia nor industry has yet studied or applied deep reinforcement learning to the scheduling optimization of refinery gas systems.
Disclosure of Invention
The invention provides a gas system dynamic scheduling method based on improved proximal policy optimization for dynamically scheduling a gas system; it can better predict the load capacity of the gas pipe network and effectively improve the pressure balance of the network.
A gas system dynamic scheduling method based on improved proximal policy optimization comprises the following steps:
(1) Determine the production-plan interval and the gas-producing and gas-consuming devices according to the scheduling optimization process of the gas system, and establish a gas pipe-network model;
(2) Determine the initial state of the gas pipe-network model from the initial values of gas output and pipe-network pressure, and update it iteratively from there;
(3) Construct, from the optimization objective function, a reward function that evaluates how good an action is in the current state. The reward is expressed by the income generated by the gas-consuming devices and the pressure-balance level of the gas pipe network, with the following formula:
In the formula, x_ik denotes the state of the i-th device in the k-th time period; p_ik the highest gain the i-th device can obtain in the k-th time period; c_ik the maximum consumption the i-th device can reach in the k-th time period; W_k the gas pipe-network pressure in the k-th time period; W_normal the pipe-network pressure in the fully balanced state; α_k the penalty factor for pipe-network pressure imbalance in the k-th time period; and n the number of devices;
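The formula itself is rendered as an image in the source and is not reproduced here. A minimal sketch of a per-period reward of the shape the definitions suggest, i.e. consumption income minus a pressure-imbalance penalty; this is an assumed form for illustration, not the patent's exact expression:

```python
def reward(x, p, alpha_k, w_k, w_normal):
    """Reward for one time period k (assumed form): income from the
    gas-consuming devices minus a penalty proportional to the deviation
    of the pipe-network pressure W_k from the balanced pressure W_normal.

    x        -- device states x_ik, each in [0, 1]
    p        -- per-device maximum gains p_ik
    alpha_k  -- penalty factor for pressure imbalance in period k
    """
    income = sum(x_i * p_i for x_i, p_i in zip(x, p))
    penalty = alpha_k * abs(w_k - w_normal)
    return income - penalty
```

Note that the cleaner the pressure tracking (W_k close to W_normal), the smaller the penalty term, matching the stated goal of maximizing income while keeping the network balanced.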
(4) Build a reinforcement learning agent model; obtain an action a from the model's state and update the agent until one scheduling run is completed; record the states, actions and rewards of the run, update the agent's network parameters, and raise the reward through iterative training;
(5) After iterative training is finished, apply the reinforcement learning agent model to a test set and visualize how the pipe-network pressure changes, to confirm the safety and reliability of the model;
(6) Save the reinforcement learning agent model and use the trained model directly for scheduling optimization of the gas system.
Further, in step (1), the gas-consuming devices in the gas system are divided into two classes. The first class consumes gas as a switching quantity: a device either consumes all of the gas supplied to it, or none at all. The second class has a valve that regulates gas consumption, which varies continuously between 0 and c_ik;
Assuming the first class contains m devices, their action range is x_ik ∈ {0, 1}, i = 1, 2, ..., m; k = 1, 2, ..., N. The second class contains n - m devices, with action range x_jk ∈ [0, 1], j = m + 1, m + 2, ..., n; k = 1, 2, ..., N.
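The mixed discrete/continuous action range above can be sketched as follows; the function name and the uniform sampling are illustrative, not taken from the patent:

```python
import random

def sample_action(m, n):
    """Sample one joint action for the mixed device set:
    devices 1..m are on/off (x in {0, 1}); devices m+1..n have a
    regulating valve (x anywhere in [0, 1])."""
    switch_part = [random.choice([0, 1]) for _ in range(m)]
    valve_part = [random.random() for _ in range(n - m)]
    return switch_part + valve_part

a = sample_action(2, 5)  # two on/off devices, three valve devices
```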
In step (2), the state of the gas pipe-network model consists of the state of each device and the actual pipe-network pressure at the current time. Supplying the reinforcement learning agent model with the current pipe-network pressure gives it the ability to predict and control the pressure so as to maintain balance while increasing income.
In step (4), one scheduling run of the reinforcement learning network specifically comprises the following steps:
(4-1) First initialize the policy network parameters θ_0; θ_k denotes the parameters obtained from the previous round of training. In each iteration, θ_k is updated by interacting with the environment to obtain a set of state-action pairs; β is adjusted dynamically according to the KL divergence, and the advantage function is estimated with the proximal policy optimization formula;
(4-2) The critic network learns to estimate the value of the current policy and, parameterized according to the current policy, computes the future discounted reward;
(4-3) The actor network learns a stochastic policy π parameterized by θ_π, so as to take, with maximum probability, the action that maximizes the sum of future returns. The policy is therefore parameterized by θ_π and generates a probability distribution over the set of available actions at time t, formulated as:
where R denotes the reward obtained by taking action a in state s at time t, and E denotes the mathematical expectation;
(4-4) Update the parameters by computing the temporal-difference error (TD-error): δ_t = R_t + γV(s_{t+1}) - V(s_t);
(4-5) Use the Tanh function for activation, expressed as: tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x});
(4-6) Optimize the accumulated loss with the Adam algorithm, iteratively updating the neural-network weights on the training data; Adam maintains an independent adaptive learning rate for each parameter.
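Step (4-1)'s dynamic adjustment of β by KL divergence corresponds to the adaptive-KL penalty variant of PPO. A sketch of the usual adaptation rule; the 1.5x tolerance band and the doubling/halving factor are common defaults assumed here, not values taken from the patent:

```python
def adapt_beta(beta, kl, kl_target, factor=2.0):
    """Adaptive-KL PPO heuristic: grow the penalty coefficient beta
    when the new policy drifts too far from the old one (KL above
    target), shrink it when updates are too conservative."""
    if kl > 1.5 * kl_target:
        beta *= factor       # policy moved too far: penalize harder
    elif kl < kl_target / 1.5:
        beta /= factor       # policy barely moved: relax the penalty
    return beta
```

After each batch of updates, the measured KL divergence between the new and old policies is fed through this rule before the next iteration.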
In step (4-1), the proximal policy optimization formula is expressed as:
J_PPO^{θ_k}(θ) = J^{θ_k}(θ) - β · KL(θ, θ_k)
where J_PPO^{θ_k}(θ) is the optimized objective function, β is the penalty factor, and KL(θ, θ_k) measures the similarity between θ and θ_k.
wherein s is t Is the state at time t, R t Is from s t Conversion to s t+1 T represents the total number of scheduled time instants, Y is a discount factor, where 0 < Y ≦ 1, e represents the mathematical expectation of a future discount reward.
In step (4-3), during training an action a_t is sampled from the set of available actions A_t according to the probability output of the policy network, so that action selection retains a degree of randomness to encourage exploration; during testing, the highest-probability action is selected instead.
Preferably, in step (5), the trained reinforcement learning agent model is verified on a pre-generated test set: the total income is computed, the curve of pipe-network pressure over the test run is plotted, and the model's control of pipe-network pressure balance is verified.
Compared with the prior art, the invention has the following beneficial effects:
1. The gas system dynamic scheduling method based on improved proximal policy optimization is adapted to the different gas-consuming devices found in practice, so that the algorithm can solve the mixed problem of 0/1 consumption devices and non-0/1 consumption devices.
2. For the practical problem of unbalanced pipe-network pressure, the method takes maximizing consumption income and minimizing pipe-network fluctuation as its objectives and trains the model with improved proximal policy optimization; the trained model yields a scheduling optimization strategy efficiently and can guide actual gas scheduling to a certain extent.
3. The pressure-change curve shows that the pipe-network pressure stays within its upper and lower limits, so the method effectively balances the pipe-network pressure and improves the safety of the scheduling process.
4. The method has short solving time and good solution quality; for production scenarios demanding real-time scheduling, a schedule can be obtained directly from the trained network model.
5. Being based on deep reinforcement learning, the method can train the network on a small-scale problem and transfer it directly to a large-scale scheduling problem, solving large-scale optimal scheduling effectively with good performance and improving the adaptability of the scheduling optimization strategy.
Drawings
FIG. 1 is a topology diagram of the gas-system pipe network in an embodiment of the present invention;
FIG. 2 is a structural diagram of the reinforcement learning agent model constructed in an embodiment of the present invention;
FIG. 3 is a training curve of the improved proximal policy optimization algorithm in an embodiment of the present invention;
FIG. 4 is a pressure-change curve of the improved proximal policy optimization algorithm in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments, which are intended to aid understanding of the invention without limiting it in any way.
Taking the gas system of an oil-refining enterprise as an example, the establishment of a dynamic scheduling model for the gas system and a simulation study are described in detail below.
As shown in fig. 1, a gas system dynamic scheduling method based on improved proximal policy optimization mainly includes:
step 1, determining a production plan interval and devices for producing and consuming gas, and establishing a gas pipe network model.
In this embodiment, the initial pipe-network pressure is 100 kPa, with upper and lower limits of 110 kPa and 90 kPa respectively. The production-plan interval is [0, T] with T = 30; the gas production in each time period is y_k, k = 1, 2, ..., N, and the gas consumption is c_k, k = 1, 2, ..., N. There are n = 5 gas-consuming devices in total, of which the first class has m = 2 and the second class has n - m = 3.
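The embodiment's parameters can be collected in a single configuration table; the key names are illustrative, not from the patent:

```python
# Embodiment parameters from the description above.
env_config = {
    "W_init_kpa": 100.0,   # initial pipe-network pressure
    "W_upper_kpa": 110.0,  # upper pressure limit
    "W_lower_kpa": 90.0,   # lower pressure limit
    "T": 30,               # production-plan interval [0, T]
    "n_devices": 5,        # gas-consuming devices in total
    "m_switch": 2,         # first-class (on/off) devices
}
# Second-class (valve-regulated) devices: n - m.
env_config["n_valve"] = env_config["n_devices"] - env_config["m_switch"]
```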
And 2, determining the initial state of the gas pipe network model according to the initial values of the gas output and the gas pipe network pressure, and iteratively updating on the basis.
In this embodiment, the initial state is the concatenation of the state of each device at the current time and the actual pipe-network pressure. In the initial state the production devices are about to produce gas according to the production plan for the 1st period and the consuming devices are not yet running: s_1 = [tank_1, tank_2, ..., tank_n, W_1], where tank_1 = tank_2 = ... = tank_n = 0 and W_1 = W_normal + input_1, with input_1 the gas injected by the production devices in the first period.
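A sketch of the initial-state construction described above; the names follow the text, and the numeric value of input_1 is illustrative:

```python
def initial_state(n, w_normal, input_1):
    """Build s_1 = [tank_1, ..., tank_n, W_1]: all consuming devices
    idle (tank_i = 0), pressure raised above the balanced level by the
    first period's gas input."""
    tanks = [0.0] * n
    w_1 = w_normal + input_1
    return tanks + [w_1]

s1 = initial_state(5, 100.0, 3.0)  # n = 5 devices, illustrative input
```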
Step 3, construct, from the optimization objective function, a reward function that evaluates how good an action is in the current state; the reward is expressed by the income generated by the gas-consuming devices and the pressure-balance level of the gas pipe network, with the same formula as in step (3) above.
Step 4, the reinforcement learning agent model obtains action a from the state and updates the model until one scheduling run is completed, recording the states, actions and rewards of the run and updating the network parameters; the reward is improved over a number of iterations.
In this embodiment, the structure of the reinforcement learning agent model (an actor-critic network) is shown in fig. 2, with the following parameters: hidden layers: 3; neurons per hidden layer: 128; actor-network learning rate: 5e-5; critic-network learning rate: 1e-3; iterations: 2000. Actions and state updates are obtained through the three fully connected layers with Tanh activation, and the total reward is improved.
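A minimal pure-Python sketch of a fully connected forward pass with Tanh activation, as in the actor-critic network described above; the layer sizes here are tiny for illustration, whereas the embodiment uses three hidden layers of 128 neurons:

```python
import math

def tanh(x):
    """Tanh activation: (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def mlp_forward(state, weights, biases):
    """Forward pass through fully connected layers, applying Tanh
    after each layer.  weights[i] is a list of rows (one per output
    neuron of layer i); biases[i] matches that layer's outputs."""
    h = state
    for W, b in zip(weights, biases):
        h = [tanh(sum(w * x for w, x in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    return h
```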
Step 5, after the iterations are completed, the model is applied to a test set and the change in pipe-network pressure is visualized.
As shown in fig. 3, 2000 training iterations were run in this embodiment. The training curve rises rapidly and converges in a short time, showing that the proposed scheduling method effectively realizes dynamic scheduling of the gas pipe-network system: training is efficient, the trained reinforcement learning agent model is stable, and high income is obtained while the pipe-network pressure is kept essentially balanced, demonstrating good reliability and practicability.
As shown in fig. 4, the pressure curve on the test set stays within the upper and lower limits, showing that the improved proximal policy optimization algorithm effectively balances the pipe-network pressure and improves the safety of the scheduling process.
In addition, the algorithm's average income over 30 test sets reaches 751; the gas generated by the production devices is fully utilized while pipe-network pressure balance is maintained, positive income is obtained, and the effectiveness of the scheduling is fully verified.
The embodiments described above illustrate the technical solution and advantages of the present invention. It should be understood that they are only specific embodiments and do not limit the invention; any modification, addition or equivalent made within the scope of the principles of the invention falls within the scope of protection of the invention.
Claims (8)
1. A gas system dynamic scheduling method based on improved proximal policy optimization, characterized by comprising the following steps:
(1) Determine the production-plan interval and the gas-producing and gas-consuming devices according to the scheduling optimization process of the gas system, and establish a gas pipe-network model;
(2) Determine the initial state of the gas pipe-network model from the initial values of gas output and pipe-network pressure, and update it iteratively from there;
(3) Construct, from the optimization objective function, a reward function that evaluates how good an action is in the current state, expressed by the income generated by the gas-consuming devices and the pressure-balance level of the gas pipe network, with the following formula:
In the formula, x_ik denotes the state of the i-th device in the k-th time period; p_ik the highest gain the i-th device can obtain in the k-th time period; c_ik the maximum consumption the i-th device can reach in the k-th time period; W_k the gas pipe-network pressure in the k-th time period; W_normal the pipe-network pressure in the fully balanced state; α_k the penalty factor for pipe-network pressure imbalance in the k-th time period; and n the number of devices;
(4) Build a reinforcement learning agent model; obtain an action a from the model's state and update the agent until one scheduling run is completed; record the states, actions and rewards of the run, update the agent's network parameters, and raise the reward through iterative training;
(5) After iterative training is finished, apply the reinforcement learning agent model to a test set and visualize the change in pipe-network pressure, to ensure the safety and reliability of the model;
(6) Save the reinforcement learning agent model and use the trained model directly for scheduling optimization of the gas system.
2. The gas system dynamic scheduling method based on improved proximal policy optimization of claim 1, characterized in that in step (1) the gas-consuming devices in the gas system are divided into two classes: the first class consumes gas as a switching quantity, i.e. either all supplied gas is consumed or none at all; the second class has a valve regulating the gas consumption, which varies continuously between 0 and c_ik;
Assuming the first class contains m devices, their action range is x_ik ∈ {0, 1}, i = 1, 2, ..., m; k = 1, 2, ..., N; the second class contains n - m devices, with action range x_jk ∈ [0, 1], j = m + 1, m + 2, ..., n; k = 1, 2, ..., N.
3. The gas system dynamic scheduling method based on improved proximal policy optimization of claim 1, characterized in that in step (2) the state of the gas pipe-network model consists of the state of each device at the current time and the actual pipe-network pressure, and the reinforcement learning agent model is supplied with the current pipe-network pressure so that it can predict and control the pressure to maintain balance while increasing income.
4. The gas system dynamic scheduling method based on improved near-end strategy optimization of claim 1, wherein in the step (4), the implementation of one-time scheduling by the reinforcement learning neural network specifically comprises the following steps:
(4-1) first, a network parameter θ of a policy is initialized 0 ,θ k For the parameters obtained from the previous training, theta for each iteration k Updating and interacting with the environment to obtain a group of state-action pairs, dynamically adjusting beta according to KL divergence, and estimating an advantage function by using a near-end strategy optimization formula
(4-2) critic Web learning to estimate the value of the current strategyAnd parameterized according to current strategyTo calculate a future discount reward
(4-3) actor network learning by theta π Parameterizing the resulting random strategy pi in order to take the action with the maximum probability of maximizing the future return sum; thus, the strategy is represented by θ π Parameterize and generate a probability distribution of the set of available actions at time t, formulated as:
where R represents the reward function evaluated by taking action a at state s and time t, and E represents the mathematical expectation;
(4-4) updating the parameter by calculating the timing difference ERROR TD-ERROR, and the formula is:
(4-5) the Tanh function is adopted for activation, expressed as follows:
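The Tanh expression image is omitted in this rendering; its standard definition, tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}), can be written out directly:

```python
import math

def tanh(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x), with range (-1, 1)."""
    ex, emx = math.exp(x), math.exp(-x)
    return (ex - emx) / (ex + emx)
```

Its zero-centered, bounded output is why it is a common activation choice for policy and value networks.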
and (4-6) the accumulated loss is optimized with the Adam algorithm, which iteratively updates the neural network weights based on the training data and designs an independent adaptive learning rate for each parameter.
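Adam's per-parameter adaptive learning rates come from first- and second-moment estimates of the gradient; a minimal single-parameter sketch of the standard Adam update (not patent-specific code) minimizing f(θ) = θ²:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, scaled step."""
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

The second-moment term v is what gives each parameter its own effective step size, which is the "independent adaptive learning rate" referred to above.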
5. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 4, wherein in step (4-1), the near-end strategy optimization formula is expressed as:
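The formula image is absent from this rendering; the adaptive KL-penalty form of proximal policy optimization (the standard expression, assumed here to correspond to the patent's improved variant) is:

```latex
L^{\mathrm{KL}}(\theta) \;=\; \mathbb{E}_t\!\left[
\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_k}(a_t \mid s_t)}\,\hat{A}_t
\;-\; \beta\, \mathrm{KL}\!\left[\pi_{\theta_k}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\right]
\right]
```

where \(\hat{A}_t\) is the estimated advantage function and β is the coefficient that is dynamically adjusted according to the KL divergence.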
6. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 4, wherein in step (4-2), the future discounted reward is expressed as:
wherein s_t is the state at time t, R_t is the reward obtained in the transition from s_t to s_{t+1}, T represents the total number of scheduled time instants, γ is the discount coefficient with 0 < γ ≤ 1, and E represents the mathematical expectation of the future discounted reward.
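With these symbols, the standard discounted return (the sum of γ^k · R_{t+k} over the remaining scheduling steps, consistent with 0 < γ ≤ 1 above) can be computed as:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^k * R_{t+k} over the remaining steps of the episode."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```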
7. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 4, wherein in step (4-3), during training, the probability output of the policy network is used to sample action a_t from the available action set A_t, so that the selected action has a degree of randomness that encourages exploration; during testing, the action with the highest probability is selected instead.
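The train/test distinction in claim 7 can be sketched as follows, with a toy probability vector standing in for the policy network's output:

```python
import random

def select_action(probs, training):
    """During training, sample from the policy's distribution (exploration);
    during testing, pick the highest-probability action (exploitation)."""
    if training:
        return random.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])

greedy = select_action([0.1, 0.7, 0.2], training=False)  # always index 1
```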
8. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 1, wherein in step (5), the trained reinforcement learning agent model is verified on a pre-generated test set: the total profit is calculated, the change curve of the pipe network pressure during the test is plotted, and the model's ability to keep the pipe network pressure balanced is verified.
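The verification step can be sketched as a loop over a pre-generated test set, assuming a hypothetical environment interface (the `reset`/`step` names and the `"pressure"` key are illustrative assumptions, not from the patent):

```python
def evaluate(env, policy, episodes):
    """Run the trained agent greedily on the test set, accumulating profit
    and recording the pipe-network pressure at every step for plotting."""
    total_profit, pressures = 0.0, []
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)                 # greedy action at test time
            state, reward, done = env.step(action)
            total_profit += reward
            pressures.append(state["pressure"])    # for the pressure curve
    return total_profit, pressures
```

The recorded `pressures` list is what would be plotted as the pressure change curve, and `total_profit` is the scalar used to compare scheduling policies.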
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210781220.7A CN115310760A (en) | 2022-07-04 | 2022-07-04 | Gas system dynamic scheduling method based on improved near-end strategy optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210781220.7A CN115310760A (en) | 2022-07-04 | 2022-07-04 | Gas system dynamic scheduling method based on improved near-end strategy optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115310760A true CN115310760A (en) | 2022-11-08 |
Family
ID=83856660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210781220.7A Pending CN115310760A (en) | 2022-07-04 | 2022-07-04 | Gas system dynamic scheduling method based on improved near-end strategy optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115310760A (en) |
- 2022-07-04 CN CN202210781220.7A patent/CN115310760A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112186743B (en) | Dynamic power system economic dispatching method based on deep reinforcement learning | |
CN112465664B (en) | AVC intelligent control method based on artificial neural network and deep reinforcement learning | |
CN111523737B (en) | Automatic optimization-seeking adjustment method for operation mode of deep Q network-driven power system | |
CN106920008A (en) | A kind of wind power forecasting method based on Modified particle swarm optimization BP neural network | |
CN107316099A (en) | Ammunition Storage Reliability Forecasting Methodology based on particle group optimizing BP neural network | |
WO2023070293A1 (en) | Long-term scheduling method for industrial byproduct gas system | |
CN113869795B (en) | Long-term scheduling method for industrial byproduct gas system | |
CN104636985A (en) | Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network | |
CN114216256B (en) | Ventilation system air volume control method of off-line pre-training-on-line learning | |
CN111062170A (en) | Transformer top layer oil temperature prediction method | |
CN114909706B (en) | Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control | |
CN104834975A (en) | Power network load factor prediction method based on intelligent algorithm optimization combination | |
CN107194460A (en) | The quantum telepotation recurrent neural network method of Financial Time Series Forecasting | |
CN116048028A (en) | Technological parameter optimization method based on reinforcement learning | |
CN106200379A (en) | A kind of distributed dynamic matrix majorization method of Nonself-regulating plant | |
CN109408896B (en) | Multi-element intelligent real-time monitoring method for anaerobic sewage treatment gas production | |
CN102663493A (en) | Delaying nerve network used for time sequence prediction | |
CN114566971A (en) | Real-time optimal power flow calculation method based on near-end strategy optimization algorithm | |
CN109932909A (en) | The big system of fired power generating unit desulphurization system couples Multi-variables optimum design match control method | |
Wei et al. | A combination forecasting method of grey neural network based on genetic algorithm | |
CN105389614A (en) | Implementation method for neural network self-updating process | |
CN115310760A (en) | Gas system dynamic scheduling method based on improved near-end strategy optimization | |
CN111799820A (en) | Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system | |
CN116362635A (en) | Regional power grid source-load collaborative scheduling learning optimization method based on master-slave gaming | |
CN110826763B (en) | Middle-long term contract electric quantity decomposition method based on guided learning strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||