CN114909706A - Secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control - Google Patents

Secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control

Info

Publication number
CN114909706A
Authority
CN
China
Prior art keywords: unit building, value, network, data, algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210432777.XA
Other languages
Chinese (zh)
Other versions
CN114909706B (en)
Inventor
刘定杰
穆佩红
金鹤峰
谢金芳
朱浩强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Engipower Technology Co ltd
Original Assignee
Changzhou Engipower Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Engipower Technology Co ltd filed Critical Changzhou Engipower Technology Co ltd
Priority to CN202210432777.XA priority Critical patent/CN114909706B/en
Publication of CN114909706A publication Critical patent/CN114909706A/en
Application granted granted Critical
Publication of CN114909706B publication Critical patent/CN114909706B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1012Arrangement or mounting of control or safety devices for water heating systems for central heating by regulating the speed of a pump
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1015Arrangement or mounting of control or safety devices for water heating systems for central heating using a valve or valves
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1048Counting of energy consumption
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D2220/00Components of central heating installations excluding heat sources
    • F24D2220/04Sensors
    • F24D2220/042Temperature sensors
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Thermal Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Combustion & Propulsion (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Air Conditioning Control Device (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control, which comprises the following steps: establishing a digital twin model of the heat supply secondary network unit building by adopting a mechanism modeling and data identification method; installing the equipment of the heat supply secondary network unit building, which at least comprises: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main pipe of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in residences of the unit building; dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period; when the predicted value of the heat load of the unit building for the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the supply and return water pressure difference; feeding back the acquired change in water supply flow demand to the digital twin model of the secondary network unit building, and searching for the pressure difference set value required by the new pressure difference control point of the changed unit building; and performing simulation verification of the pressure difference regulation and control according to the digital twin model of the secondary network unit building.

Description

Secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control
Technical Field
The invention belongs to the technical field of intelligent heat supply, and particularly relates to a secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control.
Background
Urban central heating, as an important livelihood project, has long received attention from governments at all levels and from society, and is an industry strongly supported by the state in the field of infrastructure. Improving heating quality, reducing heating cost and reducing pollutant emission have always been important subjects of research in the heating industry. For a long time, most heating enterprises have focused on the hydraulic balance of the primary heating network, which concerns the safe operation of the whole heating network, and have invested a great deal of capital and effort in its research and retrofitting. Remarkable results have been achieved, and the heat loss rate and water loss rate of the pipe network have been significantly reduced. However, most existing secondary network management means still remain at the stage of manual regulation, and the fineness and flexibility of regulation are far from meeting the requirements.
Hydraulic imbalance arises because, in a heat-metering heating system, autonomous adjustment by users changes the system flow. Therefore, analyzing the hydraulic working condition characteristics of a variable-flow heating system under autonomous user adjustment and studying the control methods of such variable-flow heating systems has important guiding significance for the operation and adjustment of heat-metering, variable-flow heating systems.
In heating regulation, quantity regulation can be divided into controlling the flow at the heat user inlet, controlling the flow of the secondary pipe network at the heat-source heat exchange station, and controlling the supply and return water pressure difference of the most unfavorable loop. Pressure difference control is a common variable-flow control method in metering heating systems, in which the system flow may change at any time. It is also a main method of central heating control: every heating system has a most unfavorable loop; the supply and return water pressure difference of the most unfavorable loop is determined through calculation, the pressure difference or pressure at a certain position in the heating system is taken as the control parameter, and when the hydraulic working condition of the system changes, the pressure or pressure difference at the control point is kept unchanged by changing the water pump frequency and flow. However, the existing pressure difference control mode has poor autonomous adjustment and energy-saving effects. How to control the pressure difference reasonably so that the operating state of the whole network is optimal and the heat supply quality is the best is the primary problem to be solved by the heating industry.
Based on the technical problems, a new secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control needs to be designed.
Disclosure of Invention
The invention aims to solve the technical problem of overcoming the defects of the prior art and provides a secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control, which comprises the following steps:
s1, establishing a digital twin model of the heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
step S2, installing equipment of the heating secondary network unit building, which at least comprises: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main pipe of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in residences of the unit building;
step S3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period;
s4, when the predicted value of the heat load of the unit building for the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the supply and return water pressure difference;
s5, feeding back the acquired change in water supply flow demand to the digital twin model of the secondary network unit building, and searching for the pressure difference set value required by the new pressure difference control point of the changed unit building; and performing simulation verification of the pressure difference regulation and control according to the digital twin model of the secondary network unit building.
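For illustration only, the following is a minimal sketch of how one control cycle covering steps S3 to S5 might be orchestrated; every function and attribute name is a hypothetical placeholder rather than part of the disclosed method.

```python
# Hypothetical orchestration of steps S3-S5 for one unit building control cycle.
def control_cycle(twin_model, predictor, dp_controller, building):
    # S3: predict the next-period heat load with the deep RL predictor.
    predicted_load = predictor.predict_next_period(building.history)

    # S4: if the prediction deviates from the current actual load, adjust the
    # variable frequency pump via the RL + PID differential-pressure controller.
    if abs(predicted_load - building.actual_load) > building.tolerance:
        dp_measured = building.read_supply_return_dp()
        frequency = dp_controller.adjust(dp_measured, building.dp_setpoint)
        building.set_pump_frequency(frequency)

    # S5: feed the resulting flow demand change back to the digital twin, search
    # for the new differential-pressure set value, and verify it by simulation.
    flow_demand = building.read_flow_demand()
    new_setpoint = twin_model.search_dp_setpoint(flow_demand)
    if twin_model.simulate(new_setpoint).is_balanced:
        building.dp_setpoint = new_setpoint
```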
Further, in step S1, establishing a digital twin model of the heating secondary network unit building by using a mechanism modeling and data identification method specifically includes:
establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and the connections among all components of the secondary network unit building;
the physical entity is the basis of the digital twin model and is the data source that drives the whole digital twin model; the virtual entity is mapped one-to-one to the physical entity and interacts with it in real time, depicts the elements of the physical space from multiple dimensions and multiple scales, simulates the actual process of the physical entity, and analyzes, evaluates, predicts and controls the element data; the twin data service integrates physical space information and virtual space information, ensures the real-time performance of data transmission, provides knowledge base data including intelligent algorithms, models, rule standards and expert experience, and forms a twin database by fusing the physical information, multi-spatio-temporal correlation information and the knowledge base data; the connections among the components realize their interconnection: real-time acquisition and feedback of data between the physical entity and the twin data service are realized through sensors and protocol transmission specifications; data are transmitted between the physical entity and the virtual entity through a protocol, physical information is transmitted to the virtual space in real time to update and correct the model, and the virtual entity controls the physical entity in real time through actuators; information transmission between the virtual entity and the twin data service is realized through a database interface;
and identifying the digital twin model: the multi-working-condition real-time operation data of the secondary network unit building are fed into the established digital twin model, and a reverse identification method is adopted to adaptively identify and correct the simulation results of the digital twin model, so as to obtain the identified and corrected digital twin model of the secondary network unit building.
Further, in step S3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period specifically includes:
obtaining historical heat supply data of a unit building, preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, water supply and return temperature of the unit building, water supply flow of the unit building and instantaneous heat supply of the unit building;
modeling a unit building thermal load prediction problem into a Markov decision process model, and defining a state, an action and a reward function in the Markov decision process model;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the heat load demand value of the unit building through the heat load prediction model of the unit building.
Further, modeling the unit building thermal load prediction problem as a Markov decision process model and defining the states, actions and reward functions therein specifically comprises:
the unit building heat load data are time-sequential; taking the hour-by-hour load as the unit, a training sample set of unit building heat load data consisting of k windows of i successive moments is constructed, expressed as: X = {(q_1, q_2, …, q_i), (q_2, q_3, …, q_{i+1}), …, (q_k, q_{k+1}, …, q_{k+i})};
the initial state of the unit building heat load is defined as s_0 = [q_1, q_2, …, q_k]; the action taken is denoted by a and is the predicted unit building heat load at the next moment, after which the system transitions to state s_1 = [q_1, q_2, …, q_{k+1}]; the constructed action space set is A = {a_1, a_2, …, a_k};
a reward set R = {r_1, r_2, …, r_k} is constructed, with r_k = -|a_k - q_{k+i}|; the reward value is the negative of the absolute value of the difference between the action value taken in each state and the true load value at the next moment; the sample set comprises k reward values, corresponding one-to-one to the training samples in the training sample set;
the best action is obtained by maximizing the cumulative reward Q(s, a); under continuous iteration, the Q-learning process is continuously updated by the reward obtained after each action is completed, and at the same time a good strategy is learned, so that the target reward value is maximized.
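For illustration, the following is a minimal sketch of building the sample windows, the initial state and the reward exactly as defined above; the load series, the window length i and the sample count k are assumed values, not data from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.uniform(200.0, 400.0, size=48)     # hypothetical hour-by-hour heat loads q_1..q_n (kW)
i, k = 24, 12                              # window length and number of training samples (assumed)

# X: k sliding windows of i successive hourly loads.
X = [q[j:j + i] for j in range(k)]

s0 = q[:k]                                 # initial state s_0 = [q_1, ..., q_k]

def reward(a: float, j: int) -> float:
    """Negative absolute error between the action taken (the predicted load)
    and the true load at the moment following window j."""
    return -abs(a - q[j + i])
```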
Further, a unit building thermal load prediction model is established by adopting a deep reinforcement learning algorithm, historical heat supply data is input into the prediction model, and the model is trained, and the method specifically comprises the following steps:
adding an experience playback mechanism into the DQN algorithm, and initializing a playback memory unit;
taking a deep neural network as a Q value network, and updating parameters of the deep neural network by using a gradient descent algorithm;
passing the unit building heat supply data through the current value network to obtain Q(s, a) for any state s; after the value function is calculated by the current value network, an action a is selected using an ε-greedy strategy; each state transition is recorded as one time step t, and the data obtained at each time step are added to the playback memory unit;
during training, the current value function is represented by the current value network, and a target value network is used to generate the target Q value; Q(s, a|θ_i) denotes the output action value function of the current network and is used to evaluate the current state-action pair; Q(s, a|θ_i^-) denotes the output of the target value network, and Y_i = r + γ max_{a'} Q(s', a'|θ_i^-) is used to calculate the approximate action value function of the target value network;
the parameters of the current value network are updated by using the mean square error between the current Q value and the target Q value as the error function, expressed as: L(θ_i) = E_{s,a,r,s'}[(Y_i - Q(s, a|θ_i))^2];
a transition (s, a, r, s') is randomly selected from the playback memory unit, and (s, a), s' and r are passed to the current value network, the target value network and the error function respectively; L(θ_i) is updated with respect to θ_i by a gradient method to obtain the predicted value, and the DQN algorithm updates the value function as Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') - Q(s, a)], where γ is the discount factor; during the iteration process, only the parameter θ of the current action value function is updated in real time, and the parameters of the current value network are copied to the target value network after every N iterations.
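Purely as an illustration of the update just described, the sketch below shows one DQN training step with a playback memory unit and a target network; the network sizes, learning rate, state dimension, action count and synchronization period N are assumptions, and the code is not the patented implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_q_net(state_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

state_dim, n_actions, gamma, sync_every = 24, 11, 0.9, 100   # assumed values
q_net = make_q_net(state_dim, n_actions)                      # current value network
target_net = make_q_net(state_dim, n_actions)                 # target value network
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                                 # playback memory unit of (s, a, r, s')

def train_step(step: int, batch_size: int = 32) -> None:
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = map(torch.tensor, zip(*batch))
    q_sa = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                     # Y = r + gamma * max_a' Q(s', a' | theta^-)
        y = r.float() + gamma * target_net(s2.float()).max(dim=1).values
    loss = F.mse_loss(q_sa, y)                                # L(theta) = E[(Y - Q(s, a | theta))^2]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:                                # copy theta to the target network every N steps
        target_net.load_state_dict(q_net.state_dict())
```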
Further, the step S3 further includes: using a GAN algorithm to generate virtual samples by simulation based on the current historical sample data, wherein the real historical sample data are stored in a real sample pool and used to train the GAN model; the virtual samples generated by the GAN algorithm are stored in a virtual sample pool; the historical sample data and the virtual sample data are used as the input information of the DQN model of the deep reinforcement learning algorithm for training and learning, which interacts with the environment through a trial-and-error mechanism and realizes the load prediction of the unit building by maximizing the accumulated reward.
Further, in step S4, based on the measured value and the set value of the pressure difference between the supply water and the return water, the frequency of the variable frequency pump is adjusted by using a reinforcement learning algorithm and a PID algorithm, which specifically includes:
designing a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
based on the measured value and the set value of the pressure difference between the supplied water and the returned water, a self-adaptive PID control algorithm is adopted to self-adaptively adjust PID parameters, act on a controlled object variable frequency pump, adjust the frequency of the controlled object variable frequency pump and change the pressure difference value between the supplied water and the returned water;
the control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the error between the measured value and the set value of the supply and return water pressure difference is defined as e(t), and a state converter converts e(t) into the state vector required for RBF network learning, x(t) = [e(t), Δe(t), Δ²e(t)]^T; the state vector x(t) serves as the input of the RBF network, and after calculation by the hidden layer and the output layer the Actor outputs a preliminary PID parameter value K'(t) = [k'_I, k'_P, k'_D] and the Critic outputs a value function V(t); a random action corrector corrects K'(t) according to the value function V(t) and obtains the final PID parameters K(t) = [k_I, k_P, k_D].
Further, the output of the PID controller is Δu(t) = k_P Δe(t) + k_I e(t) + k_D Δ²e(t);
the RBF network comprises an input layer, a hidden layer and an output layer; the input layer comprises three input nodes, namely e(t), Δe(t) and Δ²e(t); the hidden layer comprises h nodes, a Gaussian kernel function is selected as the activation function, and the output Φ_j(t) of each hidden node is calculated; the output layer consists of the Actor and the Critic, which share the input layer and hidden layer of the RBF network, and comprises four output nodes: the first three outputs are the three components of K'(t) output by the Actor, and the fourth node outputs the Critic value function V(t), which are respectively expressed as:
k'_m(t) = Σ_j w_jm Φ_j(t), m = 1, 2, 3, and V(t) = Σ_j w_j4 Φ_j(t),
wherein j = 1, 2, 3, 4, 5 is the hidden layer node number, m = 1, 2, 3 is the output layer node number, and w_jm is the weight between the jth hidden layer node and the mth node of the output layer Actor.
Furthermore, the Actor is used for learning the strategy, and the parameters are corrected by superimposing a Gaussian perturbation K_η on K'(t); the Critic is used for evaluating the value function and learns by the TD algorithm, the TD error being defined through the value function and the return function r(t) as δ_TD = r(t) + γV(t+1) - V(t); the Actor and Critic weights and the RBF network parameters are updated according to this error.
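As a sketch of this structure only (the number of hidden nodes, the kernel centres and widths, and the exploration noise scale are assumptions, and the code is not the patented controller), one forward pass of the RBF Actor-Critic adaptive PID might look like the following.

```python
import numpy as np

H = 5                                                  # hidden RBF nodes (assumed)
centres = np.random.uniform(-1.0, 1.0, size=(H, 3))    # Gaussian kernel centres c_j (assumed)
widths = np.full(H, 0.5)                               # Gaussian kernel widths sigma_j (assumed)
w_actor = np.zeros((H, 3))                             # weights to the 3 Actor outputs
w_critic = np.zeros(H)                                 # weights to the Critic output

def rbf_hidden(x: np.ndarray) -> np.ndarray:
    """Phi_j(t) = exp(-||x - c_j||^2 / (2 sigma_j^2)) for each hidden node."""
    return np.exp(-np.sum((x - centres) ** 2, axis=1) / (2.0 * widths ** 2))

def adaptive_pid_step(e: float, de: float, d2e: float, sigma_noise: float = 0.05):
    x = np.array([e, de, d2e])                          # state vector x(t)
    phi = rbf_hidden(x)
    k_prelim = phi @ w_actor                            # Actor output K'(t) = [k'_I, k'_P, k'_D]
    v = phi @ w_critic                                  # Critic value function V(t)
    noise = np.random.normal(0.0, sigma_noise, 3)       # Gaussian exploration perturbation
    k_i, k_p, k_d = k_prelim + noise                    # corrected parameters K(t)
    du = k_p * de + k_i * e + k_d * d2e                 # incremental PID output delta u(t)
    return du, noise, v, phi
```

The returned Δu(t) would be added to the current pump frequency command, while Φ(t), V(t) and the applied perturbation would feed the TD-based weight update sketched in the detailed embodiment below.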
The invention has the beneficial effects that:
the method comprises the steps of dynamically predicting a unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building in the next time period; when the predicted value of the heat load of the unit building in the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the pressure difference of the supply water and the return water; feeding back the acquired water supply flow demand change to a digital twin model of a second-level network unit building, and searching a pressure difference set value required by a new pressure difference control point of the changed unit building; performing simulation verification on the pressure difference regulation and control according to the digital twin model of the second-level network unit building; the frequency of the water pump can be reasonably controlled according to the pressure difference, so that the running state of the whole network is optimal, the heat supply quality is best, the hydraulic imbalance phenomenon is effectively solved, and the balanced and stable running of the secondary network is ensured.
Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a secondary network balance regulation method based on a reinforcement learning algorithm and pressure difference control according to the present invention;
FIG. 2 is a diagram of a DQN model according to the invention;
FIG. 3 is a diagram of a DQN model training process of the present invention;
FIG. 4 is a block diagram of an adaptive PID controller based on an Actor-Critic architecture and an RBF network according to the present invention;
FIG. 5 is a schematic diagram of an Actor-critical learning structure based on RBF according to the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flow chart of a secondary network balance regulation and control method based on a reinforcement learning algorithm and pressure difference control according to the invention.
As shown in fig. 1, the present embodiment provides a secondary network balance control method based on a reinforcement learning algorithm and pressure difference control, which includes:
s1, establishing a digital twin model of the heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
step S2, installing equipment of the heating secondary network unit building, which at least comprises: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main pipe of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in residences of the unit building;
step S3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period;
s4, when the predicted value of the heat load of the unit building for the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the supply and return water pressure difference;
s5, feeding back the acquired change in water supply flow demand to the digital twin model of the secondary network unit building, and searching for the pressure difference set value required by the new pressure difference control point of the changed unit building; and performing simulation verification of the pressure difference regulation and control according to the digital twin model of the secondary network unit building.
In practical application, most unit buildings adopt electric regulating valves at the building inlet, while the unit buildings with unfavorable working conditions adopt building-level distributed pumps; the opening degree of the electric regulating valves is controlled by the original regulation strategy, including predictive control of the opening degree using deep learning, reinforcement learning and machine learning algorithms, whereas the frequency of the building-level distributed pump is controlled based on the reinforcement learning algorithm and pressure difference control; furthermore, the differential pressure transmitter is typically installed at the unit building with the most unfavorable pressure difference.
In this embodiment, in step S1, establishing a digital twin model of the heating secondary network unit building by using a mechanism modeling and data identification method specifically includes:
establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and the connections among all components of the secondary network unit building;
the physical entity is the basis of the digital twin model and is the data source that drives the whole digital twin model; the virtual entity is mapped one-to-one to the physical entity and interacts with it in real time, depicts the elements of the physical space from multiple dimensions and multiple scales, simulates the actual process of the physical entity, and analyzes, evaluates, predicts and controls the element data; the twin data service integrates physical space information and virtual space information, ensures the real-time performance of data transmission, provides knowledge base data including intelligent algorithms, models, rule standards and expert experience, and forms a twin database by fusing the physical information, multi-spatio-temporal correlation information and the knowledge base data; the connections among the components realize their interconnection: real-time acquisition and feedback of data between the physical entity and the twin data service are realized through sensors and protocol transmission specifications; data are transmitted between the physical entity and the virtual entity through a protocol, physical information is transmitted to the virtual space in real time to update and correct the model, and the virtual entity controls the physical entity in real time through actuators; information transmission between the virtual entity and the twin data service is realized through a database interface;
and identifying the digital twin model: the multi-working-condition real-time operation data of the secondary network unit building are fed into the established digital twin model, and a reverse identification method is adopted to adaptively identify and correct the simulation results of the digital twin model, so as to obtain the identified and corrected digital twin model of the secondary network unit building.
It should be noted that, in a heating system, the hydraulic working condition of the pipe network changes greatly because of the uncertainty of autonomous user adjustment. The hydraulic working condition of the system must be stabilized so that, while one user's flow is reduced, the other users remain stable at their given flow and their indoor temperatures are maintained; the autonomous user adjustment process is essentially a process of change in the impedance of the pipe network or of the user system.
The basic principle of graph-theoretic hydraulic working condition analysis is as follows: any fluid network is a geometric figure formed by connecting a number of nodes and pipelines, and it is a directed graph because the water flow has a definite direction. The hydraulic model of the heat supply network is established according to the flow balance equations and the pressure balance equations.
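In the standard graph-theoretic form (the symbols below are generic notation assumed for illustration, not the patent's own), these balance equations can be written as:

```latex
\begin{aligned}
  A\,G &= Q, && \text{flow balance at every node, with incidence matrix } A \text{ and branch flows } G,\\
  B_f\,\Delta P &= 0, && \text{pressure balance around every independent loop, with loop matrix } B_f,\\
  \Delta P &= S\,|G|\,G - D_H, && \text{branch pressure drop with impedance } S \text{ and pump head } D_H.
\end{aligned}
```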
In order to ensure that the system has enough circulating power and that all users in the pipe network can obtain the required water flow under the design working condition, the loop whose resistance is the largest relative to the other loops is usually selected, and the rated head of the circulating water pump is determined according to the pressure head required by the user on that loop under the design working condition. This loop with the greatest resistance is generally referred to as the most unfavorable loop. In most cases, the most unfavorable loop is the loop of the user farthest from the circulating water pump. At present, in the operation regulation stage of the system, the most unfavorable hydraulic loop is usually introduced into the design of the control strategy as a reference object; for example, the reference pressure difference for water pump regulation is chosen as the user pressure difference on the most unfavorable loop, and the pressure difference set value is usually selected with reference to the qualified pressure difference of that user under the design working condition, or to the pressure difference set value level required to guarantee the user flow supply on the most unfavorable hydraulic loop.
The most unfavorable thermodynamic loop can be identified as follows: there is only one most unfavorable thermodynamic loop in the system, and that branch is also the most unfavorable hydraulic loop of the pipe network; or there is still only one most unfavorable thermodynamic loop, but the branch differs from the most unfavorable hydraulic loop and lies in the middle of the system; or there are several most unfavorable thermodynamic loops in the system. In the latter case the degree of unfavorableness of the loops differs between time intervals, and the loop with the greatest degree of unfavorableness is selected as the reference loop for water pump pressure difference control, so that the requirements of all users can be met.
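As a trivial illustration of selecting the reference loop (the loop names and required heads below are hypothetical), the loop requiring the greatest pressure head under the current working condition is taken as the pressure-difference reference:

```python
# Hypothetical candidate loops: loop id -> required pressure head in kPa.
loops = {
    "building_01": 38.0,
    "building_07": 52.5,
    "building_12": 47.3,
}

reference_loop = max(loops, key=loops.get)     # most unfavorable loop
dp_setpoint = loops[reference_loop]            # its required head becomes the set value
print(f"reference loop: {reference_loop}, pressure-difference set value: {dp_setpoint} kPa")
```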
In this embodiment, in step S3, dynamically predicting the unit building through the deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period specifically includes:
obtaining historical heat supply data of a unit building, preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, water supply and return temperature of the unit building, water supply flow of the unit building and instantaneous heat supply of the unit building;
modeling a unit building thermal load prediction problem into a Markov decision process model, and defining a state, an action and a reward function in the Markov decision process model;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the heat load demand value of the unit building through the heat load prediction model of the unit building.
In this embodiment, modeling the unit building thermal load prediction problem as a Markov decision process model and defining the states, actions and reward functions therein specifically comprises:
the unit building heat load data are time-sequential; taking the hour-by-hour load as the unit, a training sample set of unit building heat load data consisting of k windows of i successive moments is constructed, expressed as: X = {(q_1, q_2, …, q_i), (q_2, q_3, …, q_{i+1}), …, (q_k, q_{k+1}, …, q_{k+i})};
the initial state of the unit building heat load is defined as s_0 = [q_1, q_2, …, q_k]; the action taken is denoted by a and is the predicted unit building heat load at the next moment, after which the system transitions to state s_1 = [q_1, q_2, …, q_{k+1}]; the constructed action space set is A = {a_1, a_2, …, a_k};
a reward set R = {r_1, r_2, …, r_k} is constructed, with r_k = -|a_k - q_{k+i}|; the reward value is the negative of the absolute value of the difference between the action value taken in each state and the true load value at the next moment; the sample set comprises k reward values, corresponding one-to-one to the training samples in the training sample set;
the best action is obtained by maximizing the cumulative reward Q(s, a); under continuous iteration, the Q-learning process is continuously updated by the reward obtained after each action is completed, and at the same time a good strategy is learned, so that the target reward value is maximized.
Fig. 2 is a structural schematic diagram of a DQN model according to the present invention.
Fig. 3 is a diagram of a DQN model training process according to the invention.
As shown in fig. 2-3, in this embodiment, a unit building thermal load prediction model is established by using a deep reinforcement learning algorithm, historical heat supply data is input into the prediction model, and the model is trained, which specifically includes:
adding an experience playback mechanism into the DQN algorithm, and initializing a playback memory unit;
taking a deep neural network as a Q value network, and updating parameters of the deep neural network by using a gradient descent algorithm;
passing the unit building heat supply data through the current value network to obtain Q(s, a) for any state s; after the value function is calculated by the current value network, an action a is selected using an ε-greedy strategy; each state transition is recorded as one time step t, and the data obtained at each time step are added to the playback memory unit;
during training, the current value function is represented by the current value network, and a target value network is used to generate the target Q value; Q(s, a|θ_i) denotes the output action value function of the current network and is used to evaluate the current state-action pair; Q(s, a|θ_i^-) denotes the output of the target value network, and Y_i = r + γ max_{a'} Q(s', a'|θ_i^-) is used to calculate the approximate action value function of the target value network;
the parameters of the current value network are updated by using the mean square error between the current Q value and the target Q value as the error function, expressed as: L(θ_i) = E_{s,a,r,s'}[(Y_i - Q(s, a|θ_i))^2];
a transition (s, a, r, s') is randomly selected from the playback memory unit, and (s, a), s' and r are passed to the current value network, the target value network and the error function respectively; L(θ_i) is updated with respect to θ_i by a gradient method to obtain the predicted value, and the DQN algorithm updates the value function as Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') - Q(s, a)], where γ is the discount factor; during the iteration process, only the parameter θ of the current action value function is updated in real time, and the parameters of the current value network are copied to the target value network after every N iterations.
In this embodiment, the step S3 further includes: using a GAN algorithm to generate virtual samples by simulation based on the current historical sample data, wherein the real historical sample data are stored in a real sample pool and used to train the GAN model; the virtual samples generated by the GAN algorithm are stored in a virtual sample pool; the historical sample data and the virtual sample data are used as the input information of the DQN model of the deep reinforcement learning algorithm for training and learning, which interacts with the environment through a trial-and-error mechanism and realizes the load prediction of the unit building by maximizing the accumulated reward.
In practical applications, in the GAN model structure, the generator model G and the discriminator model D are represented by differentiable functions, and their respective inputs are random noise z and real data x. G(z) represents a sample generated by the generator model G that follows the real data distribution as closely as possible. The goal of the discriminator model D is to discriminate the source of the data: the input is labeled 1 if it comes from the real data and labeled 0 if it comes from the generator model G. In the course of continuous optimization, the goal of the generator model G is to make the label D(G(z)) that the discriminator model D assigns to the generated pseudo data G(z) coincide with the label D(x) that it assigns to the real data x. During learning, the mutual confrontation between the two models and the iterative optimization process continuously improve the performance of the generator model G; when the discrimination capability of the discriminator model D has improved to the point where it can no longer correctly judge the data source, the generator model can be considered to have learned the distribution of the real data.
It should be noted that a reinforcement learning algorithm based on a generative adversarial network is proposed. In the initial training stage, the algorithm collects experience samples through a random strategy and adds them to the real sample pool; the generative adversarial network is trained with the samples in the real sample pool; new samples are then generated by the generative adversarial network and added to the virtual sample pool; finally, training samples are selected in batches from the combined real and virtual sample pools. The algorithm effectively solves the problem of insufficient samples in the early training stage of reinforcement learning and accelerates learning and convergence. Aiming at the poor performance of the Q-learning algorithm when applied to nonlinear load prediction, a deep Q-learning load prediction algorithm based on a generative adversarial network is proposed. The algorithm introduces a deep neural network and constructs a deep Q network as a nonlinear function approximator to approximate the action value function, and the value function approximation method solves the problem that the Q-learning algorithm performs poorly or may even fail to converge in a large state space.
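The sketch below is only an illustration of this GAN-based virtual-sample generation (the sample length, noise dimension, network sizes and learning rates are assumptions): the discriminator labels real samples 1 and generated samples 0, and the trained generator fills the virtual sample pool used alongside the real pool when training the DQN load-prediction model.

```python
import torch
import torch.nn as nn

sample_len, noise_dim = 24, 16                       # assumed dimensions
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, sample_len))
D = nn.Sequential(nn.Linear(sample_len, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_batch: torch.Tensor) -> None:
    """One adversarial update on a batch drawn from the real sample pool."""
    b = real_batch.size(0)
    fake = G(torch.randn(b, noise_dim))
    # Discriminator: push D(x) -> 1 for real data, D(G(z)) -> 0 for generated data.
    d_loss = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator: push D(G(z)) -> 1 so generated samples mimic the real distribution.
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

def fill_virtual_pool(n: int) -> torch.Tensor:
    """Generate n virtual load samples for the virtual sample pool."""
    with torch.no_grad():
        return G(torch.randn(n, noise_dim))
```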
Fig. 4 is a block diagram of an adaptive PID controller based on an Actor-Critic architecture and an RBF network according to the present invention.
FIG. 5 is a schematic diagram of an Actor-critical learning structure based on RBF according to the present invention.
As shown in fig. 4-5, in this embodiment, in the step S4, based on the measured value and the set value of the pressure difference between the supply water and the return water, the adjusting the frequency of the variable frequency pump by using a reinforcement learning algorithm and a PID algorithm specifically includes:
designing a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
based on the measured value and the set value of the pressure difference between the supplied water and the returned water, the PID parameters are adaptively adjusted by adopting an adaptive PID control algorithm, the PID parameters act on the controlled object variable frequency pump, the frequency of the controlled object variable frequency pump is adjusted, and the pressure difference value between the supplied water and the returned water is changed.
The control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the error between the measured value and the set value of the supply and return water pressure difference is defined as e(t), and a state converter converts e(t) into the state vector required for RBF network learning, x(t) = [e(t), Δe(t), Δ²e(t)]^T; the state vector x(t) serves as the input of the RBF network, and after calculation by the hidden layer and the output layer the Actor outputs a preliminary PID parameter value K'(t) = [k'_I, k'_P, k'_D] and the Critic outputs a value function V(t); a random action corrector corrects K'(t) according to the value function V(t) and obtains the final PID parameters K(t) = [k_I, k_P, k_D].
In this embodiment, the output of the PID controller is Δu(t) = k_P Δe(t) + k_I e(t) + k_D Δ²e(t);
the RBF network comprises an input layer, a hidden layer and an output layer; the input layer comprises three input nodes, namely e(t), Δe(t) and Δ²e(t); the hidden layer comprises h nodes, a Gaussian kernel function is selected as the activation function, and the output Φ_j(t) of each hidden node is calculated; the output layer consists of the Actor and the Critic, which share the input layer and hidden layer of the RBF network, and comprises four output nodes: the first three outputs are the three components of K'(t) output by the Actor, and the fourth node outputs the Critic value function V(t), which are respectively expressed as:
k'_m(t) = Σ_j w_jm Φ_j(t), m = 1, 2, 3, and V(t) = Σ_j w_j4 Φ_j(t),
wherein j = 1, 2, 3, 4, 5 is the hidden layer node number, m = 1, 2, 3 is the output layer node number, and w_jm is the weight between the jth hidden layer node and the mth node of the output layer Actor.
In this embodiment, the Actor is used for learning the strategy, and the parameters are corrected by superimposing a Gaussian perturbation K_η on K'(t); the Critic is used for evaluating the value function and learns by the TD algorithm, the TD error being defined through the value function and the return function r(t) as δ_TD = r(t) + γV(t+1) - V(t); the Actor and Critic weights and the RBF network parameters are updated according to this error.
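The sketch below illustrates one possible form of the Critic and Actor weight updates implied above (the learning rates, discount factor and the proportional-to-perturbation Actor rule are assumptions, not the patented update): the TD error δ_TD corrects the Critic weights, and the Actor weights are reinforced in proportion to the exploration perturbation that was applied to K'(t).

```python
import numpy as np

GAMMA, ALPHA_A, ALPHA_C = 0.9, 0.01, 0.05              # assumed hyper-parameters

def td_update(r: float, v: float, v_next: float,
              phi: np.ndarray, noise: np.ndarray,
              w_actor: np.ndarray, w_critic: np.ndarray) -> float:
    """Update Actor/Critic weights in place and return the TD error."""
    delta = r + GAMMA * v_next - v                      # delta_TD = r(t) + gamma*V(t+1) - V(t)
    w_critic += ALPHA_C * delta * phi                   # Critic: reduce the value-prediction error
    w_actor += ALPHA_A * delta * np.outer(phi, noise)   # Actor: reinforce helpful perturbations
    return delta
```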
It should be noted that the RBF network has strong mapping capability and simple learning rules, and it is combined with the Actor-Critic structure to approximate the Actor-Critic value function and policy function. The adaptive PID control algorithm based on the Actor-Critic structure and the RBF network designed here can adjust the PID parameters rapidly and track the input signal; compared with the control effect of the traditional PID and other algorithms, the new controller responds faster and has smaller overshoot.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The system embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (9)

1. A secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control is characterized by comprising the following steps:
s1, establishing a digital twin model of the heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
step S2, installing equipment of the heating secondary network unit building, which at least comprises: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main pipe of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in residences of the unit building;
step S3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period;
s4, when the predicted value of the heat load of the unit building for the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the supply and return water pressure difference;
s5, feeding back the acquired change in water supply flow demand to the digital twin model of the secondary network unit building, and searching for the pressure difference set value required by the new pressure difference control point of the changed unit building; and performing simulation verification of the pressure difference regulation and control according to the digital twin model of the secondary network unit building.
2. The secondary network balance regulation and control method according to claim 1, wherein in step S1, a mechanism modeling and data identification method is used to establish the digital twin model of the heating secondary network unit building, which specifically comprises:
establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and connecting elements among all components of a second-level network unit building;
the physical entity is a data source of the whole digital twin model;
the virtual entity carries out simulation on the actual process of the physical entity and carries out data analysis, evaluation, prediction and control on the element data;
the twin data service integrates physical space information and virtual space information, provides knowledge base data including intelligent algorithms, models, rule standards and expert experiences, and forms a twin database by fusing the physical information, multi-temporal-spatial correlation information and the knowledge base data;
the connection among the components is used for realizing the interconnection and intercommunication of the components, and the real-time acquisition and feedback of data are realized between the physical entity and the twin data service through a sensor and a protocol transmission specification;
data transmission is carried out between the physical entity and the virtual entity through a protocol, physical information is transmitted to a virtual space in real time to update a correction model, and the virtual entity controls the physical entity in real time through an actuator;
the virtual entity and the twin data service are in information transmission through a database interface;
and identifying the digital twin model: the multi-working-condition real-time operation data of the secondary network unit building are fed into the established digital twin model, and a reverse identification method is adopted to adaptively identify and correct the simulation results of the digital twin model, so as to obtain the identified and corrected digital twin model of the secondary network unit building.
3. The secondary network balance regulation and control method according to claim 1, wherein in step S3, the unit building is dynamically predicted through the deep reinforcement learning algorithm to obtain a predicted value of the heat load of the unit building for the next time period, which specifically comprises:
obtaining historical heat supply data of a unit building, preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, unit building water supply and return temperature, unit building water supply flow and unit building instantaneous heat supply;
modeling a unit building thermal load prediction problem into a Markov decision process model, and defining states, actions and reward functions in the Markov decision process model;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the heat load demand value of the unit building through the heat load prediction model of the unit building.
4. The secondary network balance regulation and control method according to claim 3, wherein the unit building heat load prediction problem is modeled as a Markov decision process model and the states, actions and reward function therein are defined, which specifically comprises:
the unit building heat load data form a time series; taking the hourly load as the unit, a training sample set of k samples, each covering i consecutive moments, is constructed and expressed as:
X = {(q_1, q_2, ..., q_i), (q_2, q_3, ..., q_{i+1}), ..., (q_k, q_{k+1}, ..., q_{k+i})};
the initial state of the unit building heat load is defined as s_0 = [q_1, q_2, ..., q_k]; the action taken is denoted a and is the predicted unit building heat load at the next moment, after which the system transitions to the state s_1 = [q_1, q_2, ..., q_{k+1}]; the constructed action space set is A = {a_1, a_2, ..., a_k};
a reward set R = {r_1, r_2, ..., r_k} is constructed with r_k = -|a_k - q_{k+i}|; the reward value is the negative of the absolute difference between the action value taken in each state and the true load value at the next moment, and the sample set contains k reward values corresponding one-to-one to the training samples in the training sample set;
the optimal action is obtained by maximizing the cumulative reward Q(s, a); under continuous iteration, the Q-learning process is updated with the reward obtained after each action is completed, while a good policy is learned so that the target reward value is maximized.
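As a small illustration of this construction, the Python sketch below builds the sliding-window sample set X, the initial state s_0 and the reward function from an hourly load series; the 0-based indexing and the synthetic load values are assumptions made for the example.

```python
import numpy as np

def build_mdp_data(q, k, i):
    """Build k sliding-window samples, the initial state and the reward function."""
    X = np.array([q[j:j + i] for j in range(k)])   # windows (q_1..q_i), (q_2..q_{i+1}), ...
    s0 = q[:k]                                     # initial state [q_1, ..., q_k]
    def reward(j, action):
        # negative absolute error against the true load one step beyond sample j
        return -abs(action - q[j + i])
    return X, s0, reward

# Tiny synthetic hourly-load example (kW), purely illustrative
q = np.array([50.2, 51.0, 49.8, 52.3, 53.1, 52.7, 54.0, 55.2])
X, s0, reward = build_mdp_data(q, k=3, i=4)
print(X.shape, s0, reward(0, 52.0))
```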
5. The secondary network balance regulation and control method of claim 3, wherein a deep reinforcement learning algorithm is adopted to establish the unit building heat load prediction model, historical heat supply data is input into the unit building heat load prediction model, and the unit building heat load prediction model is trained, and the method specifically comprises the following steps:
adding an experience replay mechanism to the DQN algorithm and initializing the replay memory unit;
taking a deep neural network as the Q-value network and updating its parameters with a gradient descent algorithm;
obtaining Q(s, a) for any state s by passing the unit building heat supply data through the current value network; after the value function is computed by the current value network, an action a is selected with an ε-greedy policy, each state transition is recorded as a time step t, and the data obtained at each time step are added to the replay memory unit;
during training, the current value function is represented by the current value network, and the target Q value is generated by the target value network; Q(s, a|θ_i) denotes the output action-value function of the current network, used to evaluate the current state-action pair; Q(s, a|θ_i^-) denotes the output of the target value network, and the target
Y_i = r + γ max_{a'} Q(s', a'|θ_i^-)
is used to calculate the approximate action-value function of the target value network;
updating the parameters of the current value network by using the mean square error between the current Q value and the target Q value as the error function; the error function is expressed as: L(θ_i) = E_{s,a,r,s'}[(Y_i - Q(s, a|θ_i))^2];
a transition (s, a, r, s') is randomly selected from the replay memory unit; (s, a), s' and r are fed to the current value network, the target value network and the error function respectively, and L(θ_i) is minimized with respect to θ_i by a gradient method to obtain the predicted value, the DQN algorithm updating the value function as:
Q(s, a|θ_i) ← r + γ max_{a'} Q(s', a'|θ_i^-)
wherein γ is the discount factor; during the iteration only the parameter θ of the current action-value function is updated in real time, and the parameters of the current value network are copied to the target value network every N iterations.
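A condensed Python sketch of this training step (experience replay, current and target value networks, ε-greedy action selection, MSE error and periodic parameter copying) is given below. The network sizes, the discretised action set and all hyper-parameters are illustrative assumptions, and PyTorch is used only as a convenient stand-in, not as the implementation prescribed by the claim.

```python
import random
from collections import deque
import torch
import torch.nn as nn

K, N_ACTIONS, GAMMA, COPY_EVERY = 8, 21, 0.9, 100   # state length, discretised actions (assumed)

def make_net():
    return nn.Sequential(nn.Linear(K, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

policy_net, target_net = make_net(), make_net()      # current / target value networks
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                         # replay memory unit

def select_action(state, eps=0.1):
    """ε-greedy action selection on the current value network."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(policy_net(state).argmax())

def train_step(step, batch_size=32):
    """One gradient update on the MSE between current and target Q values."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a, dtype=torch.long)
    r = torch.tensor(r, dtype=torch.float32)
    q_sa = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a | θ_i)
    with torch.no_grad():
        y = r + GAMMA * target_net(s2).max(1).values            # Y_i from the target network
    loss = nn.functional.mse_loss(q_sa, y)                      # L(θ_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % COPY_EVERY == 0:                                  # copy θ to the target network
        target_net.load_state_dict(policy_net.state_dict())
```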
6. The secondary network balance regulation and control method according to claim 3, wherein step S3 further comprises:
generating virtual samples by simulation from the current historical sample data using a GAN algorithm, wherein the real historical sample data are stored in a real sample pool and used to train the GAN model;
the virtual samples generated by the GAN algorithm are stored in a virtual sample pool;
the historical sample data and the virtual sample data are used as input information for training the DQN model of the deep reinforcement learning algorithm, which interacts with the environment through a trial-and-error mechanism and achieves load prediction for the unit building by maximizing the cumulative reward.
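A minimal sketch, assuming PyTorch and illustrative layer sizes, of a GAN that augments the real sample pool with virtual load windows; the window length, noise dimension and learning rates are not specified by the claim and are chosen only for the example.

```python
import torch
import torch.nn as nn

WIN, NOISE = 8, 16                                   # window length, noise dimension (assumed)
G = nn.Sequential(nn.Linear(NOISE, 32), nn.ReLU(), nn.Linear(32, WIN))
D = nn.Sequential(nn.Linear(WIN, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_batch):
    """One adversarial update; real_batch holds windows of historical loads."""
    b = real_batch.size(0)
    fake = G(torch.randn(b, NOISE))
    # Discriminator: real windows -> 1, generated windows -> 0
    d_loss = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to fool the discriminator
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return fake.detach()          # virtual samples to be stored in the virtual sample pool
```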
7. The secondary network balance regulation and control method according to claim 1, wherein in step S4, the frequency of the variable-frequency pump is adjusted by a reinforcement learning algorithm and a PID algorithm based on the measured value and the set value of the supply-return water pressure difference, which specifically comprises:
designing an adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
based on the measured value and the set value of the supply-return water pressure difference, the adaptive PID control algorithm adaptively adjusts the PID parameters, acts on the controlled variable-frequency pump, adjusts its frequency and thereby changes the supply-return water pressure difference;
the control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the deviation between the measured value and the set value of the supply-return water pressure difference is defined as the error e(t), which is converted by a state converter into the state vector x(t) = [e(t), Δe(t), Δ²e(t)]^T required for RBF network learning; the state vector x(t) is taken as the input of the RBF network, and after computation by the hidden layer and the output layer the Actor outputs a preliminary PID parameter vector K'(t) = [k'_I, k'_P, k'_D] and the Critic outputs the value function V(t); a random action corrector corrects K'(t) according to the value function V(t) to obtain the final PID parameters K(t) = [k_I, k_P, k_D].
8. The secondary network balance regulation and control method according to claim 7, wherein the output of the PID controller is Δu(t) = k_P Δe(t) + k_I e(t) + k_D Δ²e(t);
The RBF network comprises an input layer, a hidden layer and an output layer; the input layer has three input nodes, namely e(t), Δe(t) and Δ²e(t); the hidden layer has h nodes, whose activation function is the Gaussian kernel, and the node outputs Φ_j(x(t)) are computed accordingly; the output layer consists of the Actor and the Critic, which share the input layer and the hidden layer of the RBF network, and has four output nodes, of which the first three are the three components of K'(t) output by the Actor and the fourth outputs the Critic value function V(t), respectively expressed as:
K'_m(t) = Σ_{j=1}^{h} w_{jm} Φ_j(x(t)), m = 1, 2, 3;
V(t) = Σ_{j=1}^{h} w_{jV} Φ_j(x(t));
wherein j = 1, 2, 3, 4, 5 is the index of the hidden layer nodes; m = 1, 2, 3 is the index of the output layer Actor nodes; w_{jm} is the weight between the j-th hidden layer node and the m-th Actor node of the output layer, and w_{jV} is the weight between the j-th hidden layer node and the Critic node.
9. The secondary network balance regulation and control method according to claim 8, wherein the Actor is used to learn the policy, and its parameter correction method is to superimpose a Gaussian disturbance K_η on K'(t); the Critic is used to evaluate the value function and learns with the TD algorithm, the TD error being defined through the value function and the return function r(t) as δ_TD = r(t) + γV(t+1) - V(t); the Actor and Critic weights and the RBF network parameters are updated according to this error.
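The following compact sketch puts claims 7 to 9 together as a Python class: an RBF hidden layer shared by the Actor and the Critic, Gaussian exploration noise on the preliminary PID parameters, an incremental PID output Δu(t), and TD-error updates. The Gaussian centres and widths, the learning rates, and the use of -|e(t)| as the return r(t) are assumptions for illustration; the patent does not fix these choices. Each call to update(e) would take the current supply-return pressure-difference error and return the frequency increment applied to the variable-frequency pump.

```python
import numpy as np

class RBFActorCriticPID:
    """Sketch of the adaptive PID of claims 7-9; hyper-parameters are assumed."""
    def __init__(self, hidden=5, sigma=1.0, alpha_a=0.01, alpha_c=0.05,
                 gamma=0.9, eta=0.05):
        self.c = np.random.uniform(-1, 1, (hidden, 3))   # Gaussian kernel centres
        self.sigma, self.gamma, self.eta = sigma, gamma, eta
        self.alpha_a, self.alpha_c = alpha_a, alpha_c
        self.Wa = np.zeros((hidden, 3))                  # w_jm: hidden -> Actor outputs
        self.Wc = np.zeros(hidden)                       # hidden -> Critic node
        self.e1 = self.e2 = 0.0                          # previous errors
        self.prev = None                                 # (phi, noise, V) of the last step

    def _phi(self, x):
        return np.exp(-np.sum((self.c - x) ** 2, axis=1) / (2 * self.sigma ** 2))

    def update(self, e):
        x = np.array([e, e - self.e1, e - 2 * self.e1 + self.e2])  # [e, Δe, Δ²e]
        phi = self._phi(x)
        k_prime = phi @ self.Wa                          # preliminary K'(t) from the Actor
        noise = np.random.normal(0.0, self.eta, 3)       # Gaussian disturbance K_eta
        k_i, k_p, k_d = k_prime + noise                  # corrected K(t)
        du = k_p * x[1] + k_i * x[0] + k_d * x[2]        # Δu(t), incremental PID output
        v = float(phi @ self.Wc)                         # Critic value V(t)
        if self.prev is not None:                        # TD learning from the previous step
            phi0, noise0, v0 = self.prev
            delta = -abs(e) + self.gamma * v - v0        # δ_TD = r(t) + γV(t+1) - V(t)
            self.Wc += self.alpha_c * delta * phi0
            self.Wa += self.alpha_a * delta * np.outer(phi0, noise0)
        self.prev, self.e1, self.e2 = (phi, noise, v), e, self.e1
        return du
```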
CN202210432777.XA 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control Active CN114909706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432777.XA CN114909706B (en) 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210432777.XA CN114909706B (en) 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control

Publications (2)

Publication Number Publication Date
CN114909706A true CN114909706A (en) 2022-08-16
CN114909706B CN114909706B (en) 2024-05-07

Family

ID=82764249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432777.XA Active CN114909706B (en) 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control

Country Status (1)

Country Link
CN (1) CN114909706B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129430A (en) * 2022-09-01 2022-09-30 山东德晟机器人股份有限公司 Robot remote control instruction issuing method and system based on 5g network
CN117830033A (en) * 2024-03-06 2024-04-05 深圳市前海能源科技发展有限公司 Regional cooling and heating system regulation and control method and device, electronic equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2182296A2 (en) * 2008-10-28 2010-05-05 Oy Scancool Ab District heating arrangement and method
CN108916986A (en) * 2018-09-10 2018-11-30 常州英集动力科技有限公司 The secondary network flow-changing water dynamic balance of information physical fusion regulates and controls method and system
CN113091123A (en) * 2021-05-11 2021-07-09 杭州英集动力科技有限公司 Building unit heat supply system regulation and control method based on digital twin model
CN113446661A (en) * 2021-07-30 2021-09-28 西安热工研究院有限公司 Intelligent and efficient heat supply network operation adjusting method
CN113657031A (en) * 2021-08-12 2021-11-16 杭州英集动力科技有限公司 Digital twin-based heat supply scheduling automation realization method, system and platform
CN113757788A (en) * 2021-09-15 2021-12-07 河北工大科雅能源科技股份有限公司 Station-load linked two-network balance online dynamic intelligent regulation and control method and system
CN114183796A (en) * 2021-11-26 2022-03-15 杭州英集动力科技有限公司 Optimal scheduling method and device based on electric heating and central heating multi-energy complementary system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129430A (en) * 2022-09-01 2022-09-30 山东德晟机器人股份有限公司 Robot remote control instruction issuing method and system based on 5g network
CN115129430B (en) * 2022-09-01 2022-11-22 山东德晟机器人股份有限公司 Robot remote control instruction issuing method and system based on 5g network
CN117830033A (en) * 2024-03-06 2024-04-05 深圳市前海能源科技发展有限公司 Regional cooling and heating system regulation and control method and device, electronic equipment and storage medium
CN117830033B (en) * 2024-03-06 2024-06-04 深圳市前海能源科技发展有限公司 Regional cooling and heating system regulation and control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114909706B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN109270842B (en) Bayesian network-based regional heat supply model prediction control system and method
CN112232980B (en) Regulation and control method for heat pump unit of regional energy heat supply system
CN114909706B (en) Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control
CN107726358A (en) Boiler Combustion Optimization System and method based on CFD numerical simulations and intelligent modeling
CN107203687B (en) Multi-target cooperative intelligent optimization control method for desulfurization process of absorption tower
CN102129259B (en) Neural network proportion integration (PI)-based intelligent temperature control system and method for sand dust environment test wind tunnel
CN112460741B (en) Control method of building heating, ventilation and air conditioning system
CN114777192B (en) Secondary network heat supply autonomous optimization regulation and control method based on data association and deep learning
CN113241762B (en) Echo state network self-adaptive load frequency control method based on event triggering
CN114811713B (en) Two-level network inter-user balanced heat supply regulation and control method based on mixed deep learning
CN111461466A (en) Heating household valve adjusting method, system and equipment based on LSTM time sequence
CN108376294A (en) A kind of heat load prediction method of energy supply feedback and meteorologic factor
CN114909707B (en) Heat supply secondary network regulation and control method based on intelligent balance device and reinforcement learning
CN116755409A (en) Coal-fired power generation system coordination control method based on value distribution DDPG algorithm
CN116070853A (en) Autonomous emergency scheduling method and system under accident condition of heating system
Salvador et al. Historian data based predictive control of a water distribution network
CN115034133A (en) Jet pump heat supply system implementation method based on information physical fusion
CN114970080A (en) Multi-zone cooperative heat supply scheduling method based on multi-agent adjustment cost consistency
CN112947606A (en) Boiler liquid level control system and method based on BP neural network PID predictive control
CN115013862B (en) Autonomous optimal operation method of heating system based on jet pump and auxiliary circulating pump
CN116300755A (en) Double-layer optimal scheduling method and device for heat storage-containing heating system based on MPC
CN113759714A (en) Fuzzy clustering prediction control method for ultra-supercritical thermal power generating unit
CN117389132A (en) Heating system multi-loop PID intelligent setting system based on cloud edge end cooperation
CN115310760A (en) Gas system dynamic scheduling method based on improved near-end strategy optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant