CN113222297A - Method, system, equipment and medium suitable for cyclic updating planning of solid waste base garden - Google Patents

Publication number
CN113222297A
CN113222297A CN202110639452.4A CN202110639452A CN113222297A CN 113222297 A CN113222297 A CN 113222297A CN 202110639452 A CN202110639452 A CN 202110639452A CN 113222297 A CN113222297 A CN 113222297A
Authority
CN
China
Prior art keywords
planning
equipment
parameters
network
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110639452.4A
Other languages
Chinese (zh)
Inventor
解大
王西田
王晨磊
赵玉琢
周楠
赵承汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110639452.4A
Publication of CN113222297A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Genetics & Genomics (AREA)
  • Computational Linguistics (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)

Abstract

The invention provides a cyclic update planning method suitable for a solid waste base garden, which comprises the following steps: step S1: inputting city development parameters, planning parameters, model parameters and equipment states for updating; step S2: determining the updated state of the equipment according to all possible solutions; step S3: after the reward value is obtained, storing the reward value and the state change in a memory pool for later use; step S4: combining the experience accumulated in the memory pool with the current action to update the parameters; step S5: fusing the urban development prediction result and the initial state of the equipment to obtain an equipment updating scheme. The invention also provides a cyclic update planning system, equipment and medium suitable for the solid waste base garden. The method is time-sequential and general, can cyclically update the planning scheme in finer detail as the city develops, and reduces unnecessary costs caused by misjudging the future development trend.

Description

Method, system, equipment and medium suitable for cyclic updating planning of solid waste base garden
Technical Field
The invention relates to the technical field of solid waste infrastructure planning, in particular to a method, a system, equipment and a medium suitable for cyclic updating planning of a solid waste base garden.
Background
Energy is an important material basis for human production and daily life, and the demand for energy has grown continuously since humanity entered industrial society. However, heavy use of non-renewable energy and an unbalanced energy structure put great pressure on the environment, and growing volumes of waste make the phenomenon of cities besieged by garbage increasingly prominent. Coordinating the relationship between energy and the environment has become the key to sustainable urban development. At present, however, solid waste base park systems cannot consistently provide a progressive planning scheme that optimizes the system cycle under different urban development conditions, and a time-sequential, general cyclic update method is needed to solve this problem.
Through retrieval, patent document CN112488350A discloses a power grid updating planning method and system considering clean energy, and based on the obtained updating project power supply and utilization data in the planning range and a multi-objective updating planning optimization model constructed in advance, a solution result is obtained; determining the capacity of various power supplies connected to the power grid based on the solving result; the multi-objective updating planning optimization model is constructed based on the access quantity of various renewable energy sources and the stability of a power grid. The method has the defects that the planning scheme of the solid waste base substance-energy system cannot be solved according to the multi-objective updating planning optimization model in different states.
Patent document CN112488452A discloses an energy system management scale optimal decision method based on deep reinforcement learning, which obtains the output power of a photovoltaic battery pack and the power required by a load two-step ahead through a prediction model based on a long-short term memory artificial neural network, so as to generate an optimal action decision for the charging and discharging actions of an energy storage battery pack by using the deep reinforcement learning method. The method has the disadvantages that although the problem that the action decision in a single time scale can cause system saturation and instability under certain conditions is considered, the optimal action decision is generated for the charge and discharge actions of the energy storage battery pack only according to the current and predicted system states at the future time, and the solution for the planning scheme of the solid waste base substance-energy system under different states cannot be derived naturally.
Patent document CN110309990A discloses a new energy uncertainty planning method considering typical scene tolerance, which adopts scene generation and reduction technology to address uncertainty and randomness, and constructs RDG typical output scenes to balance calculation efficiency and accuracy. The selected typical scenes reflect the full-period operating characteristics of the planned area as far as possible, yielding more practical voltage offset values and network losses. In the multi-objective optimization process, a typical scene tolerance index is introduced into the objective function, so that the planning result suits as many operating scenes as possible. However, although this prior art can be applied to different operating scenarios, it does not solve the technical problem of how to cyclically update the planning scheme, and its RDG typical-scene processing lacks the benefit that a deep deterministic policy brings to producing a high-precision planning scheme.
Therefore, it is desirable to provide a method and system with time sequence and universality, which can perform more detailed cyclic update on the planning scheme according to the development change of the city, and reduce the unnecessary cost caused by the wrong judgment of the scheme on the future development trend.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a method, a system, equipment and a medium suitable for cyclic updating planning of a solid waste base garden. It provides a cyclic update planning algorithm based on the Deep Deterministic Policy Gradient (DDPG), so that the planning scheme of the solid waste base material-energy system can be solved in different states and the unnecessary cost caused by misjudging the future development trend is reduced.
The invention provides a cyclic updating and planning method suitable for a solid waste base garden, which comprises the following steps:
step S1: inputting city development parameters, planning parameters, model parameters and equipment states for updating;
step S2: determining the updated state of the equipment according to all possible solutions;
step S3: after the reward value is obtained, the reward value and the state change are stored in a memory pool for later use;
step S4: combining the experience accumulation in the memory pool with the current action to update the parameters;
step S5: and fusing the urban development prediction result and the initial state of the equipment to obtain an equipment updating scheme.
Preferably, the city development parameters in step S1 include current and predicted values of population, labor, total production, solid waste production, wastewater production, and waste gas production;
the planning parameters comprise a planning scheme, a planning process and a planning year limit;
the model parameters comprise strategy network and Q network parameters;
the equipment state comprises the total installed capacity of photovoltaic power generation equipment, the total installed capacity of wind power equipment, the total installed capacity of CHP equipment, the total installed capacity of gas boilers, the total installed capacity of electric boilers, the total installed capacity of fuel cells, the total installed capacity of heat exchangers, the total installed capacity and energy efficiency of absorption chillers, the total installed capacity and energy efficiency of power-to-gas equipment, the total capacity of electric, thermal and gas energy storage equipment, the capacity of the garbage incineration power plant, the capacity and efficiency of the biogas power plant, the capacity and efficiency of the garbage landfill, the capacity and efficiency of the leachate treatment plant, and the capacity and efficiency of the biogas compression and purification plant.
Preferably, step S1 includes:
step S1.1: randomly initializing all values corresponding to the states and actions, randomly initializing all parameters of the network, and generating an initial state S1;
step S1.2: finishing T times of training, finishing a planning scheme within a planning year after each training, and simultaneously updating strategy network parameters and Q network parameters;
the policy network selects actions randomly, interacts with the circulating system environment and the experience pool, generates reward values for taking different actions in different states, and updates the policy network parameters;
and the Q network, after obtaining the state, action and reward output by the policy network, interacts with the experience pool and updates the Q network parameters.
Preferably, in step S2, in the initial state, the model policy network searches for possible solutions in all action ranges according to uniform distribution, determines the updated state of the new year device, and solves the lower-layer operating solution through the lower-layer planning, and merges the lower-layer operating solution and the upper-layer planning target to obtain the reward value.
Preferably, in step S3, when the memory pool is full, the exploration process will be decided by the policy network, and a random quantity decreasing with the training process is added to the output result of the policy network as the action of this exploration, and the state change and the reward value are also obtained and stored in the memory pool.
Preferably, in step S4, a predetermined number of experiences are randomly selected from the memory pool and combined with the current action to update the policy network parameters and the Q network parameters.
Preferably, in step S5, after the model training is completed, inputting the prediction result of urban development within the planning year and the initial state of the equipment into the DDPG network structure model to obtain an equipment updating scheme for the solid waste base material-energy system every year;
the DDPG network structure model comprises a strategy network and a Q network, wherein the strategy network is used for randomly selecting actions, interacting with the circulation system environment and the experience pool and generating reward values for taking different actions in different states;
the Q network is used for interacting with the experience pool after obtaining the state, action and reward value output by the strategy network.
The invention provides a circulation updating and planning system suitable for a solid waste base garden, which comprises:
module M1: inputting city development parameters, planning parameters, model parameters and equipment states for updating;
module M2: determining the updated state of the equipment according to all possible solutions;
module M3: after the reward value is obtained, the reward value and the state change are stored in a memory pool for later use;
module M4: combining the experience accumulation in the memory pool with the current action to update the parameters;
module M5: and fusing the urban development prediction result and the initial state of the equipment to obtain an equipment updating scheme.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the above-mentioned method steps.
The cyclic update planning equipment suitable for the solid waste base garden provided by the invention comprises the above cyclic update planning system suitable for the solid waste base garden, or the above computer-readable storage medium storing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
1. The network structure of the DDPG is improved based on the principle of the cyclic update algorithm, so that the DDPG framework can predict and evaluate urban development parameters and system equipment states more accurately, and the DDPG can be properly applied to material-energy system planning by combining a genetic algorithm and urban development prediction.
2. The method offers intelligent learning and selection, is time-sequential and general, and can cyclically update the planning scheme in finer detail as the city develops, while reducing unnecessary costs caused by misjudging the future development trend. The planning solution of the material-energy system is thereby optimal in operating income, and an optimal planning scheme that accounts for the predicted material-energy conditions of the integrated energy system is finally achieved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic overall flow chart of the method for planning the cyclic update of the solid waste base garden in the present invention;
FIG. 2 is a schematic diagram of the DDPG network structure model used by the solid waste base garden cyclic update planning system in the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, the present invention provides a method for planning the cyclic update of a solid waste base garden, which comprises the following steps:
Step S1: inputting city development parameters, planning parameters, model parameters and equipment states for updating. This step specifically comprises:
step S1.1: randomly initializing the values of all states and actions, randomly initializing all network parameters, and generating an initial state S1;
step S1.2: completing T training runs, each of which completes a planning scheme over the planning years, while updating the policy network parameters and the Q network parameters;
the policy network selects actions randomly, interacts with the circulating system environment and the experience pool, generates reward values for taking different actions in different states, and updates the policy network parameters;
the Q network, after obtaining the state, action and reward output by the policy network, interacts with the experience pool and updates the Q network parameters.
The city development parameters comprise current values and predicted values of total population, labor force, total production value, solid waste yield, wastewater yield and waste gas yield; the planning parameters comprise a planning scheme, a planning process and a planning year limit; the model parameters comprise policy network and Q network parameters; the equipment state comprises the total installed capacity of photovoltaic power generation equipment, the total installed capacity of wind power equipment, the total installed capacity of CHP equipment, the total installed capacity of gas boilers, the total installed capacity of electric boilers, the total installed capacity of fuel cells, the total installed capacity of heat exchangers, the total installed capacity and energy efficiency of absorption chillers, the total installed capacity and energy efficiency of power-to-gas equipment, the total capacity of electric, thermal and gas energy storage equipment, the capacity of the garbage incineration power plant, the capacity and efficiency of the biogas power plant, the capacity and efficiency of the garbage landfill, the capacity and efficiency of the leachate treatment plant, and the capacity and efficiency of the biogas compression and purification plant.
The more accurately the urban development parameters are predicted, the more accurate the subsequent planning scheme is. The planning parameters are used to plan the material-energy system, with the material conditions predicted from urban development serving as the planning input. The model parameters are the DDPG training parameters and affect prediction accuracy. The equipment state parameters are used to calculate the operating benefit of the planning result; the higher the benefit, the better the planning result is considered to be. The invention thus yields a planning scheme for the material-energy system that is optimal in operating income while accounting for the urban development trend.
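The four input groups of step S1 can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, units and the `validate` helper are assumptions introduced for illustration and do not appear in the patent, and the model parameters (network weights) are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CityDevelopment:
    """One year of city development values (field names and units are assumptions)."""
    population: float
    labor_force: float
    total_production: float
    solid_waste: float
    wastewater: float
    waste_gas: float


@dataclass
class PlanningInput:
    """Inputs of step S1, grouped for illustration only."""
    horizon_years: int                      # planning year limit
    forecast: List[CityDevelopment]         # one predicted entry per planning year
    device_state: Dict[str, float] = field(default_factory=dict)  # capacity per device type

    def validate(self) -> None:
        # Hypothetical consistency checks, not part of the patent text.
        if len(self.forecast) != self.horizon_years:
            raise ValueError("forecast must contain one entry per planning year")
        if any(v < 0 for v in self.device_state.values()):
            raise ValueError("installed capacities must be non-negative")
```

Grouping the forecast by year mirrors the time-sequential nature of the method: each planning year consumes one forecast entry and the equipment state left by the previous year.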
Step S2: determining the updated state of the equipment according to all possible solutions. In the initial state, the model policy network searches for possible solutions across the whole action range according to a uniform distribution and determines the updated equipment state for the new year; the lower-layer operation solution is then solved through lower-layer planning and merged with the upper-layer planning target to obtain the reward value.
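Step S2 can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names are hypothetical, and the reward is assumed to be the lower-layer operating profit minus an upper-layer investment cost, which is only one plausible way of merging the two layers.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def explore_uniform(action_bounds: Bounds, rng: random.Random) -> List[float]:
    """Draw one capacity change per device type, uniformly over the action range."""
    return [rng.uniform(lo, hi) for lo, hi in action_bounds]


def apply_update(state: Sequence[float], action: Sequence[float]) -> List[float]:
    """Updated equipment state for the new year; capacities cannot go below zero."""
    return [max(0.0, s + a) for s, a in zip(state, action)]


def reward(new_state: Sequence[float], action: Sequence[float],
           operating_profit: Callable[[Sequence[float]], float],
           unit_cost: Sequence[float]) -> float:
    """Assumed merge rule: lower-layer operating profit minus cost of added capacity."""
    investment = sum(c * max(0.0, a) for c, a in zip(unit_cost, action))
    return operating_profit(new_state) - investment
```

For example, with a state `[1.0, 0.5]`, an action `[-2.0, 0.5]`, unit costs `[3.0, 4.0]` and a toy profit of ten times the total capacity, the new state is `[0.0, 1.0]` and the reward is `10.0 - 2.0 = 8.0`.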
Step S3: after the reward value is obtained, storing the reward value and the state change in a memory pool for later use. When the memory pool is full, exploration is decided by the policy network: a random quantity that decreases as training proceeds is added to the output of the policy network to form the exploration action, and the resulting state change and reward value are likewise obtained and stored in the memory pool.
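A minimal sketch of the memory pool and the decaying exploration noise is given below. The class and function names, the Gaussian form of the noise and the exponential decay schedule are all assumptions chosen for illustration; the patent only states that the random quantity shrinks as training proceeds.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple


class MemoryPool:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity: int) -> None:
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped when full

    def store(self, s, a, r, s_next) -> None:
        self.buffer.append((s, a, r, s_next))

    def is_full(self) -> bool:
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, m: int, rng: random.Random) -> list:
        """Randomly select m stored experiences without replacement."""
        return rng.sample(list(self.buffer), m)


def noise_scale(episode: int, initial: float = 1.0, decay: float = 0.99,
                floor: float = 0.01) -> float:
    """Assumed schedule: exponential decay with a lower bound."""
    return max(floor, initial * decay ** episode)


def choose_action(pool: MemoryPool, policy: Callable[[Sequence[float]], List[float]],
                  state: Sequence[float], bounds: Sequence[Tuple[float, float]],
                  episode: int, rng: random.Random) -> List[float]:
    if not pool.is_full():
        # Before the pool is full: uniform search over the whole action range.
        return [rng.uniform(lo, hi) for lo, hi in bounds]
    sigma = noise_scale(episode)
    # After the pool is full: policy output plus shrinking noise, clipped to bounds.
    return [min(hi, max(lo, a + rng.gauss(0.0, sigma)))
            for a, (lo, hi) in zip(policy(state), bounds)]
```

Using `deque(maxlen=...)` keeps the pool at a fixed size by discarding the oldest experience, which is a common design choice rather than something the patent specifies.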
Step S4: combining the experience accumulated in the memory pool with the current action to update the parameters. A set number of experiences are randomly selected from the memory pool and combined with the current action to update the policy network parameters and the Q network parameters.
Step S5: fusing the urban development prediction result and the initial state of the equipment to obtain an equipment updating scheme. After the model training is finished, the urban development prediction result within the planning years and the initial state of the equipment are input into the DDPG network structure model to obtain a yearly equipment updating scheme for the solid waste base material-energy system.
As shown in FIG. 2, the DDPG network structure model comprises a policy network and a Q network. The policy network selects actions randomly and interacts with the circulating system environment and the experience pool to generate reward values for taking different actions in different states, and its network parameters are updated accordingly. The policy network introduces a certain amount of noise when executing an action to ensure thorough exploration of the environment; the noise is generated by an Ornstein-Uhlenbeck (OU) process. The Q network interacts with the experience pool after obtaining the state, action and reward value output by the policy network and updates the Q network parameters, so that action Q values are calculated more accurately in actual decision making and the optimal selection can be made.
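The OU exploration noise mentioned above can be sketched with the usual Euler discretization, x ← x + θ_ou(μ − x)Δt + σ√Δt·N(0, 1). The parameter values below (0.15, 0.2) are conventional defaults assumed for illustration, not values from the patent, and `theta` here is the mean-reversion rate of the noise process, unrelated to the policy network parameters.

```python
import math
import random
from typing import List, Optional


class OUNoise:
    """Temporally correlated noise from a discretized Ornstein-Uhlenbeck process."""

    def __init__(self, dim: int, mu: float = 0.0, theta: float = 0.15,
                 sigma: float = 0.2, dt: float = 1.0,
                 seed: Optional[int] = None) -> None:
        self.dim, self.mu, self.theta, self.sigma, self.dt = dim, mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x: List[float] = [mu] * dim

    def reset(self) -> None:
        """Restart the process at its long-run mean (e.g. at the start of an episode)."""
        self.x = [self.mu] * self.dim

    def sample(self) -> List[float]:
        """Advance one step: pull toward mu, then add scaled Gaussian noise."""
        scale = self.sigma * math.sqrt(self.dt)
        self.x = [x + self.theta * (self.mu - x) * self.dt
                  + scale * self.rng.gauss(0.0, 1.0)
                  for x in self.x]
        return list(self.x)
```

Because each sample depends on the previous one, consecutive actions are perturbed in a similar direction, which tends to explore capacity changes more smoothly than independent noise would.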
In practice, DQN is an algorithm that combines the classical reinforcement learning algorithm Q-learning with a deep neural network (DNN), and it overcomes the difficulty of maintaining a Q-table (a data table recording the expected return of each action) for complex problems. With continuous actions, however, the range of possible actions is large, and the selection probabilities of all actions cannot be output discretely. To handle continuous actions, the action values need to be updated using a policy gradient (PG). The deep deterministic policy gradient (DDPG) algorithm combines the deterministic policy gradient (DPG) with the deep Q-network (DQN) and achieves efficient learning over continuous states and continuous actions.
A problem with DQN is overestimation: although training is accelerated, it is excessively biased toward actions with high reward values, which produces a high bias. Therefore, the policy network and the Q network in DDPG each comprise two networks with the same structure, used respectively for real-time calculation and delayed calculation. They are called the real-time network and the target network, and their purpose is that the selection policy of DDPG is not immediately affected by the current selection. The two networks have identical structures, and the target (delayed) network is updated gradually at each training step. Denoting the two sets of policy network parameters as θ and θ', and the two sets of Q network parameters as w and w', the update formula is as follows:
$$\theta' \leftarrow \tau\theta + (1-\tau)\theta', \qquad w' \leftarrow \tau w + (1-\tau)w'$$
where τ controls the update rate.
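The gradual update of the target networks can be sketched as below, assuming the conventional DDPG soft-update form θ' ← τθ + (1 − τ)θ' and treating each parameter set as a flat list of floats. The function name is hypothetical.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float) -> List[float]:
    """Move each target parameter a fraction tau toward its real-time counterpart."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter sets must have the same length")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

With a small τ such as 0.01, the target network trails the real-time network slowly, which is what keeps the selection policy from being immediately affected by the current update; τ = 1 would copy the real-time parameters outright.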
Then, for the policy network, actions are selected using a deterministic policy, so its loss function can be defined as:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} Q(s_i, a_i, w), \qquad a_i = \mu_\theta(s_i)$$
where m represents the total number of samples drawn for experience replay; $\mu_\theta(s)$ represents the selected action; and Q(s, a, w) represents the Q value corresponding to the state s, the action a and the Q network parameters w.
Correspondingly, the Q network takes the determined state and action as input, uses the mean squared error as its loss, and updates its parameters by backpropagation. The Q value is calculated as in DQN:
$$y_i = r_i + \gamma Q'(s', a', w')$$
where $y_i$ is the newly determined Q value and $Q'(s', a', w')$ represents the target Q value calculated by the target Q network. Comparing $y_i$ with the Q value of the corresponding sample from the experience pool gives the error used to update w:
$$J(w) = \frac{1}{m}\sum_{i=1}^{m}\bigl(y_i - Q(s_i, a_i, w)\bigr)^2$$
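The quantities used in the two network updates can be checked numerically with a few lines of code. This sketch assumes the standard DDPG forms (target y_i = r_i + γQ', mean squared error for the Q network, negative mean Q value for the policy network); the function names are hypothetical and the Q values are passed in as plain numbers rather than computed by a network.

```python
from typing import List, Sequence


def td_targets(rewards: Sequence[float], next_q: Sequence[float],
               gamma: float) -> List[float]:
    """y_i = r_i + gamma * Q'(s', a', w') for each sampled transition."""
    return [r + gamma * q for r, q in zip(rewards, next_q)]


def critic_loss(targets: Sequence[float], q_values: Sequence[float]) -> float:
    """Mean squared error between targets y_i and current estimates Q(s_i, a_i, w)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def actor_loss(q_of_policy_actions: Sequence[float]) -> float:
    """Negative mean Q value of the actions chosen by the policy network."""
    return -sum(q_of_policy_actions) / len(q_of_policy_actions)
```

For instance, rewards `[1.0, 0.0]` with target Q values `[2.0, 4.0]` and γ = 0.5 give targets `[2.0, 2.0]`; against current estimates `[1.0, 4.0]` the Q network loss is (1 + 4) / 2 = 2.5. Minimizing the policy loss is equivalent to choosing actions that the Q network rates highly.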
When the cyclic update planning method is applied in a solid waste base park system, the material-energy system must be able to provide a progressive planning scheme that optimizes the system cycle under different urban development conditions. To this end, the invention improves the network structure of the DDPG and combines it with a genetic algorithm and urban development prediction so that the DDPG can be properly applied to material-energy system planning.
Table 1 shows the overall input of the cyclic update algorithm. The states of the DDPG are the internal equipment states numbered 7 to 22; the actions are discretized updates to the equipment; the reward is calculated using a genetic algorithm together with the urban development predictions; and a fully connected network is used as the basic network structure.
TABLE 1 Cyclic update Algorithm parameters
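[Table 1 appears as an image (BDA0003106629890000072) in the original publication; its contents are not recoverable from the text.]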
For each complete training run, the training process of the cyclic update algorithm based on the DDPG network structure is as follows:
1) Randomly initialize the values of all states and actions, randomly initialize all network parameters, and generate the initial state S1.
2) Complete T training episodes. Each episode completes a planning scheme over the planning years and updates the policy network and Q network parameters:
i) the policy network selects the corresponding action $a_t$ according to the current parameters and noise, and generates a new state $s_{t+1}$;
ii) using $s_{t+1}$ as input, the reward $r_t$ is obtained according to the city development condition and the evaluation index system;
iii) the current transition $\{s_t, a_t, r_t, s_{t+1}\}$ is stored in the experience pool D;
iv) m samples $\{s_i, a_i, r_i, s_{i+1}\}$, i = 1, 2, …, m, are drawn from the experience pool D, and the target Q value $y_i$ is calculated;
v) the Q network parameters and the policy network parameters are updated; if the planning year limit has been reached, a new round of training starts; otherwise, go to ii).
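The training procedure above can be sketched as a loop skeleton in which the networks and the environment are supplied as callables. This is an illustration under stated assumptions: `select_action`, `env_step` and `update_networks` are hypothetical placeholders for the policy network with noise, the material-energy system model with its reward evaluation, and the parameter update of both networks.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float, State]


def run_training(episodes: int, horizon_years: int, initial_state: Sequence[float],
                 select_action: Callable[[State, random.Random], Action],
                 env_step: Callable[[State, Action], Tuple[State, float]],
                 update_networks: Callable[[List[Transition]], None],
                 batch_size: int, seed: int = 0) -> Tuple[List[float], List[Transition]]:
    rng = random.Random(seed)
    pool: List[Transition] = []           # experience pool D
    returns: List[float] = []             # total reward of each episode
    for _ in range(episodes):             # T training episodes
        state = list(initial_state)
        total = 0.0
        for _year in range(horizon_years):                    # one step per planning year
            action = select_action(state, rng)                # i) action from policy + noise
            next_state, reward = env_step(state, action)      # i)-ii) new state and reward
            pool.append((state, action, reward, next_state))  # iii) store the transition
            if len(pool) >= batch_size:
                batch = rng.sample(pool, batch_size)          # iv) draw m samples
                update_networks(batch)                        # v) update Q and policy networks
            state = next_state
            total += reward
        returns.append(total)
    return returns, pool
```

Passing the components in as callables keeps the skeleton independent of any deep learning library; in a real implementation `update_networks` would compute the target Q values, apply both gradient steps and then refresh the target networks.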
(3) Total flow of cycle update algorithm
In the initial state, the model strategy network searches possible solutions in all action ranges according to uniform distribution so as to determine the updated state of the new equipment in one year, and solves the lower-layer operation solution by using a genetic algorithm through a lower-layer planning model, and combines the lower-layer operation solution with the upper-layer planning target to obtain a reward value.
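The lower-layer solve with a genetic algorithm can be illustrated with a small real-coded GA. The patent does not describe its GA operators, so everything below (uniform crossover, Gaussian mutation, elitist selection, the default sizes and the name `genetic_maximize`) is an assumption; `fitness` stands in for the operating income of a candidate operation solution.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_maximize(fitness: Callable[[Sequence[float]], float],
                     bounds: Sequence[Tuple[float, float]],
                     pop_size: int = 30, generations: int = 40,
                     mutation_rate: float = 0.2,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the best individual found and the best fitness per generation."""
    rng = random.Random(seed)

    def random_individual() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a: Sequence[float], b: Sequence[float]) -> List[float]:
        # Uniform crossover: each gene comes from either parent with equal chance.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(ind: Sequence[float]) -> List[float]:
        # Gaussian mutation scaled to the variable range, clipped to the bounds.
        return [min(hi, max(lo, g + rng.gauss(0.0, 0.1 * (hi - lo))))
                if rng.random() < mutation_rate else g
                for g, (lo, hi) in zip(ind, bounds)]

    population = [random_individual() for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: max(2, pop_size // 5)]   # survivors kept unchanged
        children = [mutate(crossover(*rng.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the elite individuals are carried over unchanged, the best fitness never decreases from one generation to the next. The value returned for the best individual would then feed the reward calculation together with the upper-layer planning target.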
And when the experience accumulation reaches the lower limit of the memory capacity, storing the reward value and the state change in a memory pool for later use.
When the memory pool is full, subsequent exploration is decided by the policy network: a random quantity that decreases as training proceeds is added to the output of the policy network to form the exploration action, and the resulting state change and reward value are likewise obtained and stored in the memory pool. Meanwhile, a certain number of experiences are randomly selected from the memory pool and combined with the current action to update the policy network and Q network parameters.
After model training is finished, the urban development prediction for the planning period and the initial state of the equipment are input into the model to obtain a year-by-year equipment update scheme for the solid-waste-based material-energy system. In addition, as actual data are updated each year, the planning scheme can be updated synchronously on a yearly basis, so that it is both corrected and extended.
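The yearly correction can be pictured as a rolling re-plan, sketched below. The functions `plan_fn` and `step_fn`, the scalar capacity state, and the toy planner that simply covers any capacity shortfall are hypothetical stand-ins for the trained model; the sketch covers only the correction of the plan by actual data, not its extension to later years.

```python
def rolling_update(plan_fn, step_fn, state, forecast, actuals):
    """Re-plan once per year as actual data replaces the forecast.

    plan_fn(state, demands) -> list of yearly actions, one per entry in demands
    step_fn(state, action)  -> next equipment state
    forecast: predicted demand for each year of the planning period
    actuals:  observed demand for the years that have already arrived
    """
    executed = []
    demands = list(forecast)
    for year, actual in enumerate(actuals):
        demands[year] = actual                 # correct the forecast with real data
        plan = plan_fn(state, demands[year:])  # re-plan the remaining years
        action = plan[0]                       # only this year's decision is carried out
        state = step_fn(state, action)
        executed.append(action)
    return executed, state


# Hypothetical planner: add just enough capacity each year to cover demand.
def toy_plan(capacity, demands):
    actions = []
    for d in demands:
        add = max(0, d - capacity)
        actions.append(add)
        capacity += add
    return actions


def toy_step(capacity, added):
    return capacity + added


executed, final_capacity = rolling_update(
    toy_plan, toy_step, state=10, forecast=[10, 12, 14], actuals=[11, 12])
```

In this toy run the first-year forecast of 10 is corrected to an observed 11, so one unit of capacity is added in year one instead of none.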
The technical difficulty addressed by the invention is that planning of material-energy systems at the present stage is usually limited to the current state of the system and does not consider its development trend, whereas update schemes for different points in time need to be obtained according to how the city develops. In addition, existing predictions for energy systems adopt simple linear fitting, and a more accurate method is needed to predict the development of material-energy systems.
The invention improves the network structure of DDPG so that the equipment is updated in discrete steps, and updates the planning scheme synchronously on a yearly basis, which solves the problem that traditional optimization algorithms lack a time sequence. Meanwhile, the improved DDPG is applied to the planning of material-energy systems, realizing material-energy system planning that takes the urban development trend into account.
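The text does not say how the discrete update is obtained from DDPG's continuous output. One possible way, shown purely as an assumption, is to round the strategy network's output to a whole number of capacity units; `unit_capacity` and `max_units` are hypothetical parameters.

```python
def discretize_action(raw, unit_capacity=0.5, max_units=4):
    """Map a continuous output in [-1, 1] to a whole number of capacity units.

    Returns the capacity to install, a multiple of unit_capacity between
    0 and max_units * unit_capacity.
    """
    raw = max(-1.0, min(1.0, raw))                # clip to the assumed output range
    units = round((raw + 1.0) / 2.0 * max_units)  # nearest unit count, 0..max_units
    return units * unit_capacity


no_addition = discretize_action(-1.0)
full_addition = discretize_action(1.0)
```

Rounding in the environment step rather than inside the network keeps the strategy network differentiable, at the price of several continuous outputs mapping to the same discrete decision.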
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A cyclic update planning method suitable for a solid waste base park, characterized by comprising the following steps:
step S1: inputting city development parameters, planning parameters, model parameters and equipment states for updating;
step S2: determining the updated state of the equipment according to all possible solutions;
step S3: after the reward value is obtained, storing the reward value and the state change in a memory pool for later use;
step S4: combining the experience accumulated in the memory pool with the current action to update the parameters;
step S5: fusing the urban development prediction result with the initial state of the equipment to obtain an equipment update scheme.
2. The cyclic update planning method for a solid waste base park as claimed in claim 1, wherein the city development parameters in the step S1 include current and predicted values of population, labor force, total production value, solid waste yield, waste water yield and waste gas yield;
the planning parameters comprise a planning scheme, a planning process and a planning year limit;
the model parameters comprise strategy network parameters and Q network parameters;
the equipment state comprises the total installed capacity of photovoltaic power generation equipment, the total installed capacity of wind power equipment, the total installed capacity of CHP equipment, the total installed capacity of gas boilers, the total installed capacity of electric boilers, the total installed capacity of fuel cells, the total installed capacity of heat exchangers, the total installed capacity and energy efficiency of absorption chillers, the total installed capacity and energy efficiency of power-to-gas equipment, the total capacity of electric, thermal and gas energy storage equipment, the capacity of the garbage incineration power plant, the capacity and efficiency of the biogas power plant, the capacity and efficiency of the garbage landfill, the capacity and efficiency of the leachate treatment plant, and the capacity and efficiency of the biogas compression and purification plant.
3. The cyclic update planning method for a solid waste base park as claimed in claim 1, wherein the step S1 includes:
step S1.1: randomly initializing the values corresponding to all states and actions, randomly initializing all parameters of the networks, and generating an initial state s_1;
step S1.2: performing T rounds of training, each round completing a planning scheme over the planning period while updating the strategy network parameters and the Q network parameters;
wherein the strategy network randomly selects actions, interacts with the circulation system environment and the experience pool, generates reward values for taking different actions in different states, and updates the strategy network parameters;
and the Q network, after obtaining the state, action and reward output by the strategy network, interacts with the experience pool to update the Q network parameters.
4. The method as claimed in claim 1, wherein in step S2, in the initial state, the strategy network of the model searches for possible solutions over the entire action range according to a uniform distribution, determines the equipment update state for the new year, solves the lower-layer operation problem through the lower-layer planning, and combines the result with the upper-layer planning objective to obtain the reward value.
5. The method as claimed in claim 1, wherein in step S3, when the memory pool is full, exploration is decided by the strategy network: a random quantity that decreases as training proceeds is added to the output of the strategy network and the result is used as the exploration action, and the state change and the reward value are likewise obtained and stored in the memory pool.
6. The method as claimed in claim 1, wherein in step S4, the strategy network parameters and the Q network parameters are updated by combining a set quantity of experience randomly selected from the memory pool with the current action.
7. The cyclic update planning method for a solid waste base park as claimed in claim 1, wherein in step S5, after the model training is completed, the urban development prediction result for the planning period and the initial state of the equipment are input into the DDPG network structure model to obtain a year-by-year equipment update scheme for the solid-waste-based material-energy system;
the DDPG network structure model comprises a strategy network and a Q network, wherein the strategy network is used for randomly selecting actions, interacting with the circulation system environment and the experience pool, and generating reward values for taking different actions in different states;
the Q network is used for interacting with the experience pool after obtaining the state, action and reward value output by the strategy network.
8. A cyclic update planning system suitable for a solid waste base park, characterized by comprising:
module M1: inputting city development parameters, planning parameters, model parameters and equipment states for updating;
module M2: determining the updated state of the equipment according to all possible solutions;
module M3: after the reward value is obtained, storing the reward value and the state change in a memory pool for later use;
module M4: combining the experience accumulated in the memory pool with the current action to update the parameters;
module M5: fusing the urban development prediction result with the initial state of the equipment to obtain an equipment update scheme.
9. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. Cyclic update planning equipment suitable for a solid waste base park, comprising the cyclic update planning system according to claim 8 or the computer-readable storage medium according to claim 9.
CN202110639452.4A 2021-06-08 2021-06-08 Method, system, equipment and medium suitable for cyclic updating planning of solid waste base garden Pending CN113222297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639452.4A CN113222297A (en) 2021-06-08 2021-06-08 Method, system, equipment and medium suitable for cyclic updating planning of solid waste base garden

Publications (1)

Publication Number Publication Date
CN113222297A true CN113222297A (en) 2021-08-06

Family

ID=77083276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639452.4A Pending CN113222297A (en) 2021-06-08 2021-06-08 Method, system, equipment and medium suitable for cyclic updating planning of solid waste base garden

Country Status (1)

Country Link
CN (1) CN113222297A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210806