CN115310760A - Gas system dynamic scheduling method based on improved proximal policy optimization

Gas system dynamic scheduling method based on improved proximal policy optimization

Info

Publication number
CN115310760A
CN115310760A
Authority
CN
China
Prior art keywords
gas
pipe network
scheduling
model
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210781220.7A
Other languages
Chinese (zh)
Inventor
Xie Lei (谢磊)
Chang Haiying (常海颖)
Chen Qiming (陈启明)
Su Hongye (苏宏业)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210781220.7A
Publication of CN115310760A
Legal status: Pending

Classifications

    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06N20/00 Machine learning
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; learning methods
    • G06Q50/06 Energy or water supply


Abstract

The invention discloses a gas system dynamic scheduling method based on improved proximal policy optimization, comprising the following steps: (1) determine the production plan interval and the gas-producing and gas-consuming devices according to the scheduling optimization process of the gas system, and establish a gas pipe network model; (2) determine the initial state of the gas pipe network model from the initial values of the gas output and the pipe network pressure, and iteratively update it on this basis; (3) construct, from the optimization objective function, a reward function that evaluates the quality of an action in the current state; (4) the reinforcement learning agent model obtains an action a from the state and updates the model until one scheduling run is completed; the states, actions and rewards of the scheduling process are recorded and the network parameters of the model are updated; (5) after iterative training is finished, apply the model to a test set for testing; (6) use the trained model to perform scheduling optimization of the gas system. With this method, the load capacity of the gas pipe network can be better predicted and the pressure balance level of the gas pipe network is effectively improved.

Description

Gas system dynamic scheduling method based on improved proximal policy optimization
Technical Field
The invention relates to the field of gas system balancing and artificial intelligence applications, in particular to a gas system dynamic scheduling method based on improved proximal policy optimization.
Background
The gas system is an important component of an oil refinery's energy system and one of the refinery's most important fuel sources. The gas pipe network is the main equipment carrying gas transportation, but the network pressure is bounded above and below: if gas production far exceeds consumption, the gas breaks through the upper pressure limit of the pipe network, easily creating safety hazards; if production falls far short of consumption, the pressure drops below the lower limit of the pipe network, easily causing mechanical failure.
At present, few scheduling optimization algorithms are actually applied to gas systems; most practice still relies on manual experience or traditional methods. Whether scheduling is done manually or with traditional methods based mainly on heuristic algorithms, the quality of the solutions is mostly mediocre, and the actual effect of scheduling rules determined by manual experience varies greatly from one decision maker to another. The scheduling strategies a heuristic algorithm can obtain in limited time are only slightly better than manual operation and often carry a certain randomness, so stable results cannot be obtained.
Chinese patent publication No. CN101794119A discloses a gas system balance and scheduling optimization method based on prediction data, which includes: acquiring the data required for triggering the gas system from a scheduling system; predicting, from these data, the gas generation of each production device and the energy demand of the heating-furnace boilers in a future preset time period to obtain prediction data; judging from the prediction data whether the production and demand of the gas system are balanced in that period and, when they are not, optimizing the scheduling strategy and scheduling scheme of the gas system for that period; and displaying the optimized scheduling strategy and scheme to scheduling personnel through a client for optimal scheduling. However, this method relies on fairly accurate historical data to predict gas production and demand; if historical data are insufficient, or current production and demand differ greatly from the historical situation, prediction accuracy is hard to guarantee. The prediction model also requires production plan scheduling data to be given in advance, which makes accurate prediction difficult when the production plan changes dynamically. In addition, the mixed-integer linear programming algorithm used by the scheduling system requires repeated iterations, suffers from long running times, and can hardly achieve real-time scheduling.
Compared with traditional methods, scheduling optimization with deep reinforcement learning is a brand-new data-driven solution approach. It has the following advantages:
(1) Generalization ability: traditional methods mostly start from scratch for each new problem and reach relatively good solutions by iteration, whereas deep reinforcement learning gives the algorithm learning capability; having analyzed and solved some problems, it can solve a new problem effectively when one is given.
(2) Flexibility: deep reinforcement learning can reduce the time complexity to linear and, combined with mature parallel acceleration capability, can be applied to large-scale problems.
(3) Universality: a trained model can be applied to problems of different scales and with different parameters, without designing a new training study for each problem.
However, at present neither academia nor industry has studied or applied deep reinforcement learning algorithms to the scheduling optimization of gas systems in oil refining enterprises.
Disclosure of Invention
The invention provides a gas system dynamic scheduling method based on improved proximal policy optimization, which dynamically schedules the gas system, better predicts the load capacity of the gas pipe network, and effectively improves the pressure balance level of the pipe network.
A gas system dynamic scheduling method based on improved proximal policy optimization comprises the following steps:
(1) Determining a production plan interval, a device for producing gas and a device for consuming gas according to the scheduling optimization process of the gas system, and establishing a gas pipe network model;
(2) Determining the initial state of a gas pipe network model according to the initial values of the gas output and the gas pipe network pressure, and performing iterative updating on the basis;
(3) Constructing, from the optimization objective function, a reward function that evaluates the quality of an action in the current state, the reward function being expressed by the income generated by the gas-consuming devices and the pressure balance degree of the gas pipe network, with the formula:
$$R_k = \sum_{i=1}^{n} x_{ik}\,p_{ik} - \alpha_k \left| W_k - W_{normal} \right|$$
where $x_{ik}$ denotes the state of the $i$-th device in the $k$-th time period; $p_{ik}$ denotes the highest gain the $i$-th device can obtain in the $k$-th time period; $c_{ik}$ denotes the maximum consumption the $i$-th device can reach in the $k$-th time period; $W_k$ denotes the gas pipe network pressure in the $k$-th time period; $W_{normal}$ denotes the pipe network pressure in a fully balanced state; $\alpha_k$ denotes the penalty coefficient for pipe network pressure imbalance in the $k$-th time period; and $n$ denotes the number of devices;
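As a concrete illustration of this reward, the following minimal sketch computes the income minus a pressure-imbalance penalty. The absolute-value penalty form and all numbers are assumptions for illustration, since the patent publishes the exact formula only as an image:

```python
def reward(x, p, alpha_k, W_k, W_normal):
    """Reward for one time period: income from consuming devices minus a
    penalty on pipe-network pressure imbalance (penalty form assumed)."""
    income = sum(x_i * p_i for x_i, p_i in zip(x, p))
    pressure_penalty = alpha_k * abs(W_k - W_normal)
    return income - pressure_penalty

# Example: 3 devices, two fully on, pressure 2 kPa above the balanced level
r = reward(x=[1.0, 1.0, 0.0], p=[10.0, 5.0, 8.0],
           alpha_k=0.5, W_k=102.0, W_normal=100.0)
print(r)  # 14.0  (income 15.0 minus penalty 1.0)
```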
(4) Building a reinforcement learning agent model, which obtains an action a from its state and is updated until one scheduling run is completed; recording the states, actions and rewards of the scheduling process, updating the network parameters of the agent model, and improving the reward through iterative training;
(5) After iterative training is finished, applying the agent model to a test set and visualizing the change of the pipe network pressure, so as to ensure the safety and reliability of the model;
(6) Saving the agent model and directly using the trained model for scheduling optimization of the gas system.
Further, in step (1), the gas-consuming devices in the gas system are divided into two types. The first type consumes gas as a switching quantity: the device either consumes all the gas supplied to it, or consumes none. The second type regulates its gas consumption with a valve, so that the consumption varies continuously between 0 and $c_{ik}$.
Assuming there are $m$ devices of the first type, their action range is $x_{ik} \in \{0,1\}$, $i = 1,2,\dots,m$; $k = 1,2,\dots,N$. There are $n-m$ devices of the second type, with action range $x_{jk} \in [0,1]$, $j = m+1, m+2, \dots, n$; $k = 1,2,\dots,N$.
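The mixed action space described above (on/off devices plus continuously valved devices) can be sketched as follows; the clamping and rounding behaviour is an illustrative assumption, not taken from the patent:

```python
def clip_actions(raw_actions, m):
    """Map raw agent outputs to valid device actions: the first m devices
    are on/off (rounded to {0, 1}), the remaining devices keep a continuous
    consumption fraction clamped to [0, 1]."""
    actions = []
    for i, a in enumerate(raw_actions):
        a = min(max(a, 0.0), 1.0)          # clamp to [0, 1]
        actions.append(round(a) if i < m else a)
    return actions

# m = 2 switching devices, 3 valved devices (as in the embodiment)
print(clip_actions([0.9, 0.2, 0.7, 1.3, -0.1], m=2))  # [1, 0, 0.7, 1.0, 0.0]
```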
In step (2), the state of the gas pipe network model is represented by the state of each device at the current time together with the actual pipe network pressure; providing the current pipe network pressure to the reinforcement learning agent model gives it the ability to predict and control the pipe network pressure so as to maintain balance and increase the profit.
In step (4), completing one scheduling run with the reinforcement learning neural network specifically comprises the following steps:
(4-1) First, the policy network parameters $\theta^0$ are initialized, with $\theta^k$ denoting the parameters obtained from the previous round of training. In each iteration, $\theta^k$ is updated through interaction with the environment to obtain a group of state-action pairs, $\beta$ is dynamically adjusted according to the KL divergence, and the advantage function $\hat{A}^{\theta^k}(s_t, a_t)$ is estimated using the proximal policy optimization formula.
(4-2) The critic network learns to estimate the value $V^{\pi}(s_t)$ of the current policy, parameterized by the current policy parameters $\theta_v$, in order to calculate the future discounted reward $r(t)$.
(4-3) The actor network learns the stochastic policy $\pi$ parameterized by $\theta_{\pi}$, in order to take with maximum probability the action that maximizes the sum of future returns. The policy, parameterized by $\theta_{\pi}$, generates a probability distribution over the set of available actions at time $t$, formulated as:
$$\theta_{\pi}^{*} = \arg\max_{\theta_{\pi}} \mathbb{E}\left[ \sum_{t} R(s_t, a_t, t) \right]$$
where $R$ denotes the reward function evaluated by taking action $a$ in state $s$ at time $t$, and $\mathbb{E}$ denotes the mathematical expectation;
(4-4) The parameters are updated by computing the temporal-difference error (TD-error), with the formula:
$$\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$$
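The TD-error of step (4-4) can be computed as in the following one-step sketch, where the value estimates are plain numbers standing in for the critic network's outputs:

```python
def td_error(r_t, v_t, v_next, gamma=0.99):
    """One-step temporal-difference error:
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return r_t + gamma * v_next - v_t

print(td_error(r_t=1.0, v_t=5.0, v_next=5.5))  # ≈ 1.445
```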
(4-5) The Tanh function is used for activation, expressed as:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
(4-6) The Adam optimization algorithm is used to optimize the accumulated loss and to iteratively update the weights of the neural network based on the training data, designing independent adaptive learning rates for different parameters.
In step (4-1), the proximal policy optimization formula is expressed as:
$$J_{PPO}^{\theta^k}(\theta) = J^{\theta^k}(\theta) - \beta\, KL(\theta, \theta^k)$$
where $J_{PPO}^{\theta^k}(\theta)$ denotes the optimized objective function, $\beta$ denotes the penalty factor, and $KL(\theta, \theta^k)$ measures the degree of similarity between $\theta$ and $\theta^k$.
In step (4-2), the future discounted reward $r(t)$ is expressed as:
$$r(t) = \mathbb{E}\left[ \sum_{k=t}^{T} \gamma^{\,k-t} R_k \right]$$
where $s_t$ is the state at time $t$; $R_t$ is the reward obtained in the transition from $s_t$ to $s_{t+1}$; $T$ denotes the total number of scheduling time instants; $\gamma$ is the discount factor with $0 < \gamma \le 1$; and $\mathbb{E}$ denotes the mathematical expectation of the future discounted reward.
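The future discounted reward can be computed for every time step with a single backward pass over the reward sequence, as in this sketch (γ = 0.9 is an illustrative choice):

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute r(t) = sum_{k=t}^{T} gamma^(k-t) * R_k for every t
    by accumulating from the last time step backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(discounted_returns([1.0, 1.0, 1.0]))  # ≈ [2.71, 1.9, 1.0]
```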
In step (4-3), during training, the action $a_t$ is sampled from the set of available actions $A_t$ according to the probability output of the policy network, so that the selected action has a certain randomness to encourage exploration; during testing, the action with the highest probability is selected instead.
Preferably, in step (5), the trained reinforcement learning agent model is verified on a pre-generated test set, the total profit is calculated, the change curve of the pipe network pressure during testing is drawn, and the model's control effect on pipe network pressure balance is verified.
Compared with the prior art, the invention has the following beneficial effects:
1. The gas system dynamic scheduling method based on improved proximal policy optimization is adapted to the different gas-consuming devices found in practice, so that the algorithm can solve problems mixing 0/1 consumption devices and non-0/1 consumption devices.
2. Aiming at the pressure imbalance of actual gas pipe networks, the method takes maximizing the consumption benefit and minimizing the pipe network fluctuation as objectives and trains the model with improved proximal policy optimization; the trained model yields a scheduling optimization strategy efficiently and can, to a certain extent, guide actual gas scheduling.
3. According to the pressure change curve, the variation range of the pipe network pressure does not exceed the upper and lower limits, so the method effectively balances the pipe network pressure and improves the safety of the scheduling process.
4. The method has the advantages of short solving time and good solution quality; for production scenarios demanding high real-time performance of the scheduling strategy, a scheduling scheme can be obtained with the trained network model.
5. Being based on deep reinforcement learning, the method can train a network on a small-scale problem and migrate the trained network directly to a large-scale scheduling problem, effectively solving large-scale optimization scheduling with good performance and improving the adaptability of the scheduling optimization strategy.
Drawings
FIG. 1 is the topology of the gas system pipe network in an embodiment of the present invention;
FIG. 2 is the structure of the reinforcement learning agent model constructed in an embodiment of the present invention;
FIG. 3 is the training curve of the improved proximal policy optimization algorithm in an embodiment of the present invention;
FIG. 4 is the pressure change curve of the improved proximal policy optimization algorithm in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
Taking the gas system of an oil refining enterprise as an example, the establishment of a dynamic scheduling model for the gas system and a simulation study are described in detail below.
As shown in fig. 1, a gas system dynamic scheduling method based on improved proximal policy optimization mainly includes:
step 1, determining a production plan interval and devices for producing and consuming gas, and establishing a gas pipe network model.
In this embodiment, the initial gas pipe network pressure is 100 kPa, with upper and lower limits of 110 kPa and 90 kPa respectively; the production plan interval is $[0, T]$ with $T = 30$; the gas production in each time period is $y_k$, $k = 1, 2, \dots, N$, and the gas consumption is $c_k$, $k = 1, 2, \dots, N$. There are $n = 5$ gas-consuming devices in total, of which the first type has $m = 2$ and the second type has $n - m = 3$.
Step 2, determining the initial state of the gas pipe network model according to the initial values of the gas output and the gas pipe network pressure, and iteratively updating on this basis.
In this embodiment, the initial state is formed by concatenating the states of the devices at the current time with the actual pipe network pressure. In the initial state, the production devices are about to produce gas according to the production plan of the 1st time period and no consuming device is operating: $s_1 = [tank_1, tank_2, \dots, tank_n, W_1]$, where $tank_1 = tank_2 = \dots = tank_n = 0$ and $W_1 = W_{normal} + input_1$, with $input_1$ denoting the gas input by the production devices in the first time period.
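The construction of this initial state can be sketched as follows; the function name and argument layout are illustrative assumptions:

```python
def initial_state(n_devices, w_normal, gas_input):
    """Build s_1: every consuming device off (tank_i = 0), and the pipe
    network pressure equal to the balanced pressure plus the gas injected
    by the production devices in the first period."""
    device_states = [0.0] * n_devices
    return device_states + [w_normal + gas_input]

# n = 5 devices, balanced pressure 100, first-period gas input 3
print(initial_state(n_devices=5, w_normal=100.0, gas_input=3.0))
# [0.0, 0.0, 0.0, 0.0, 0.0, 103.0]
```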
Step 3, constructing, from the optimization objective function, a reward function that evaluates the quality of an action in the current state; the reward function is expressed by the income generated by the gas-consuming devices and the pressure balance degree of the gas pipe network:
$$R_k = \sum_{i=1}^{n} x_{ik}\,p_{ik} - \alpha_k \left| W_k - W_{normal} \right|$$
Step 4, the reinforcement learning agent model obtains an action a from the state and updates the model until one scheduling run is completed; the states, actions and rewards of the scheduling process are recorded, the network parameters are updated, and the reward is improved through a certain number of iterations.
In this embodiment, the structure of the reinforcement learning agent model (an actor-critic network) is shown in fig. 2, with the following parameters: hidden layers: 3; neurons per hidden layer: 128; actor network learning rate: 5e-5; critic network learning rate: 1e-3; iterations: 2000. Actions and state updates are obtained through the three fully connected layers with Tanh activation, and the total reward is improved.
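A forward pass through such a fully connected Tanh network can be sketched in NumPy as follows. The layer sizes mirror the embodiment (three hidden layers of 128 neurons), while the input/output dimensions and the weight initialization are illustrative assumptions:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected network with Tanh activations on
    the hidden layers; the output head is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
sizes = [6, 128, 128, 128, 5]        # assumed: state dim 6 -> 5 device actions
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

out = mlp_forward(np.ones(6), weights, biases)
print(out.shape)  # (5,)
```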
Step 5, after a certain number of iterations are completed, the model is applied to a test set and the change of the pipe network pressure is visualized.
As shown in fig. 3, 2000 training iterations are performed in this embodiment. The training curve shows that the algorithm rises rapidly and converges in a short time: the proposed scheduling method effectively achieves dynamic scheduling of the gas pipe network system, the training is efficient, the trained reinforcement learning agent model performs stably, and high profit is obtained while the gas pipe network pressure remains substantially balanced, demonstrating good reliability and practicability.
As shown in fig. 4, the pressure change curve shows that when the algorithm runs on the test set, the variation range of the pipe network pressure never exceeds the upper and lower limits: the improved proximal policy optimization algorithm effectively balances the pipe network pressure and improves the safety of the scheduling process.
In addition, the average profit of the algorithm over 30 groups of test sets reaches 751; the gas generated by the production devices is fully utilized while the pipe network pressure balance is ensured, positive profit is obtained, and the effectiveness of the scheduling is fully verified.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A gas system dynamic scheduling method based on improved proximal policy optimization, characterized by comprising the following steps:
(1) Determining a production plan interval, a device for producing gas and a device for consuming gas according to the scheduling optimization process of the gas system, and establishing a gas pipe network model;
(2) Determining the initial state of a gas pipe network model according to the initial values of the gas output and the gas pipe network pressure, and iteratively updating on the basis;
(3) Constructing, from the optimization objective function, a reward function that evaluates the quality of an action in the current state, the reward function being expressed by the income generated by the gas-consuming devices and the pressure balance degree of the gas pipe network, with the formula:
$$R_k = \sum_{i=1}^{n} x_{ik}\,p_{ik} - \alpha_k \left| W_k - W_{normal} \right|$$
where $x_{ik}$ denotes the state of the $i$-th device in the $k$-th time period; $p_{ik}$ denotes the highest gain the $i$-th device can obtain in the $k$-th time period; $c_{ik}$ denotes the maximum consumption the $i$-th device can reach in the $k$-th time period; $W_k$ denotes the gas pipe network pressure in the $k$-th time period; $W_{normal}$ denotes the pipe network pressure in a fully balanced state; $\alpha_k$ denotes the penalty coefficient for pipe network pressure imbalance in the $k$-th time period; and $n$ denotes the number of devices;
(4) Building a reinforcement learning agent model, which obtains an action a from its state and is updated until one scheduling run is completed; recording the states, actions and rewards of the scheduling process, updating the network parameters of the agent model, and improving the reward through iterative training;
(5) After iterative training is finished, applying the agent model to a test set and visualizing the change of the pipe network pressure, so as to ensure the safety and reliability of the model;
(6) Saving the agent model and directly using the trained model for scheduling optimization of the gas system.
2. The gas system dynamic scheduling method based on improved proximal policy optimization according to claim 1, wherein in step (1) the gas-consuming devices in the gas system are divided into two types: the first type consumes gas as a switching quantity, i.e. it either consumes all the gas supplied or consumes none; the second type regulates its gas consumption with a valve, the consumption varying continuously between 0 and $c_{ik}$;
assuming the first type has $m$ devices, their action range is $x_{ik} \in \{0,1\}$, $i = 1,2,\dots,m$; $k = 1,2,\dots,N$; the second type has $n-m$ devices, with action range $x_{jk} \in [0,1]$, $j = m+1, m+2, \dots, n$; $k = 1,2,\dots,N$.
3. The gas system dynamic scheduling method based on improved proximal policy optimization according to claim 1, wherein in step (2) the state of the gas pipe network model is represented by the state of each device at the current time together with the actual pipe network pressure, and the current pipe network pressure is provided to the reinforcement learning agent model so that it has the ability to predict and control the pipe network pressure to maintain balance and increase the profit.
4. The gas system dynamic scheduling method based on improved proximal policy optimization according to claim 1, wherein in step (4), completing one scheduling run with the reinforcement learning neural network specifically comprises the following steps:
(4-1) First, the policy network parameters $\theta^0$ are initialized, with $\theta^k$ denoting the parameters obtained from the previous round of training. In each iteration, $\theta^k$ is updated through interaction with the environment to obtain a group of state-action pairs, $\beta$ is dynamically adjusted according to the KL divergence, and the advantage function $\hat{A}^{\theta^k}(s_t, a_t)$ is estimated using the proximal policy optimization formula.
(4-2) The critic network learns to estimate the value $V^{\pi}(s_t)$ of the current policy, parameterized by the current policy parameters $\theta_v$, in order to calculate the future discounted reward $r(t)$.
(4-3) The actor network learns the stochastic policy $\pi$ parameterized by $\theta_{\pi}$, in order to take with maximum probability the action that maximizes the sum of future returns. The policy, parameterized by $\theta_{\pi}$, generates a probability distribution over the set of available actions at time $t$, formulated as:
$$\theta_{\pi}^{*} = \arg\max_{\theta_{\pi}} \mathbb{E}\left[ \sum_{t} R(s_t, a_t, t) \right]$$
where $R$ denotes the reward function evaluated by taking action $a$ in state $s$ at time $t$, and $\mathbb{E}$ denotes the mathematical expectation;
(4-4) The parameters are updated by computing the temporal-difference error (TD-error), with the formula:
$$\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$$
(4-5) The Tanh function is used for activation, expressed as:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
(4-6) The Adam optimization algorithm is used to optimize the accumulated loss and to iteratively update the weights of the neural network based on the training data, designing independent adaptive learning rates for different parameters.
5. The gas system dynamic scheduling method based on improved proximal policy optimization according to claim 4, wherein in step (4-1) the proximal policy optimization formula is expressed as:
$$J_{PPO}^{\theta^k}(\theta) = J^{\theta^k}(\theta) - \beta\, KL(\theta, \theta^k)$$
where $J_{PPO}^{\theta^k}(\theta)$ denotes the optimized objective function, $\beta$ denotes the penalty factor, and $KL(\theta, \theta^k)$ measures the degree of similarity between $\theta$ and $\theta^k$.
6. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 4, wherein in step (4-2) the future discounted reward $\hat{R}_t$ is expressed as:

$$\hat{R}_t=\mathbb{E}\left[\sum_{t'=t}^{T}\gamma^{\,t'-t}R_{t'}\right]$$

where $s_t$ is the state at time t, $R_t$ is the reward obtained in the transition from $s_t$ to $s_{t+1}$, T represents the total number of scheduling time instants, $\gamma$ is the discount coefficient with $0<\gamma\le 1$, and $\mathbb{E}$ represents the mathematical expectation of the future discounted reward.
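The future discounted reward for every time instant of an episode can be computed in a single backward pass, as in this minimal sketch (illustrative code, not from the patent):

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [R_hat_0, ..., R_hat_T] where R_hat_t = sum_{t'>=t} gamma^(t'-t) * R_t'.

    Computed backwards in O(T): running = R_t + gamma * running.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# three scheduling instants, constant reward 1, gamma = 0.5
rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)  # [1.75, 1.5, 1.0]
```

These per-step returns serve as the regression targets for the critic network in step (4-2).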
7. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 4, wherein in step (4-3), during training, the action $a_t$ is sampled from the available action set $A_t$ according to the probability output of the policy network, so that the selected action retains a degree of randomness to encourage exploration; during testing, the action with the highest probability is selected instead.
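The train-time sampling versus test-time greedy selection described above can be sketched as follows; `select_action` and the fixed seed are illustrative assumptions, not details from the patent:

```python
import random

random.seed(0)  # fixed seed only so the sketch is reproducible

def select_action(probs, training=True):
    """Training: sample an action index from the policy's probability output,
    keeping exploration; testing: act greedily on the highest probability."""
    if training:
        return random.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])

# greedy choice over a 3-action distribution picks index 1
greedy = select_action([0.1, 0.7, 0.2], training=False)  # 1
```

During training the sampled index follows the categorical distribution given by the policy network, so low-probability actions are still occasionally tried.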
8. The gas system dynamic scheduling method based on improved near-end strategy optimization according to claim 1, wherein in step (5) the trained reinforcement learning agent model is verified on a pre-generated test set: the total profit is calculated, the curve of the pipe network pressure over the test process is plotted, and the model's control effect on pipe network pressure balance is verified.
CN202210781220.7A 2022-07-04 2022-07-04 Gas system dynamic scheduling method based on improved near-end strategy optimization Pending CN115310760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210781220.7A CN115310760A (en) 2022-07-04 2022-07-04 Gas system dynamic scheduling method based on improved near-end strategy optimization


Publications (1)

Publication Number Publication Date
CN115310760A true CN115310760A (en) 2022-11-08

Family

ID=83856660




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination