CN114739229A - Heat exchange process important parameter control method based on reinforcement learning


Info

Publication number
CN114739229A
CN114739229A
Authority
CN
China
Prior art keywords
heat exchange process
control
reinforcement learning
important
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210221479.6A
Other languages
Chinese (zh)
Inventor
王轩 (Wang Xuan)
王瑞 (Wang Rui)
田华 (Tian Hua)
蔡金文 (Cai Jinwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202210221479.6A
Publication of CN114739229A
Legal status: Pending


Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F28: HEAT EXCHANGE IN GENERAL
    • F28F: DETAILS OF HEAT-EXCHANGE AND HEAT-TRANSFER APPARATUS, OF GENERAL APPLICATION
    • F28F27/00: Control arrangements or safety devices specially adapted for heat-exchange or heat-transfer apparatus
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00: Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70: Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/12: Improving ICE efficiencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Thermal Sciences (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a reinforcement-learning-based method for controlling important parameters of a heat exchange process, comprising the following steps. First, within a reinforcement learning algorithm framework, the controller of the heat exchange process parameters is taken as the reinforcement learning agent, and the actions output by the agent are the control variables. Second, a heat exchanger boundary condition control signal, or an actuator control signal that directly influences the heat exchanger boundary condition, is taken as the agent's output action; the important state parameters of the heat exchange process are taken as the agent's observations; and a reward function is constructed so that the closer the controlled parameter is to the control target, the larger the instant reward. Through continuous training with the reinforcement learning algorithm, the agent's output actions converge toward the maximum of the reward function, finally yielding an agent that can accurately control the variation of the important state parameters of the heat exchange process. The invention can accurately and reliably control important heat exchanger parameters during the heat exchange process and improves the control accuracy of heat exchange process parameters under severely and frequently fluctuating boundary conditions.

Description

Heat exchange process important parameter control method based on reinforcement learning
Technical Field
The invention relates to the technical field of energy utilization, in particular to a heat exchange process important parameter control method based on reinforcement learning.
Background
At present, heat exchange processes are ubiquitous in the energy, chemical, power and other industries, and are a very important link in industrial production.
The heat exchange process has a crucial influence on the steady-state and dynamic performance of the whole system. Therefore, to ensure safe and efficient operation of the system, some important parameters in the heat exchanger, such as fluid temperature and pressure, must be effectively controlled. However, the boundary conditions of actual heat exchange processes often fluctuate frequently and severely; for example, a heat exchanger recovering flue-gas waste heat from a vehicle internal combustion engine faces frequent and severe flue-gas fluctuations, which poses a great challenge for parameter control.
Conventional PID (proportional-integral-derivative) control often fails to deliver a satisfactory control effect in the face of such strongly fluctuating boundary conditions and suffers from poor control accuracy.
Therefore, there is an urgent need for a technology that can accurately and reliably control important heat exchanger parameters during the heat exchange process and improve the control accuracy of heat exchange process parameters under severely and frequently fluctuating boundary conditions.
Disclosure of Invention
The invention aims to provide a reinforcement-learning-based method for controlling important parameters of a heat exchange process, addressing the technical defects in the prior art.
To this end, the invention provides a heat exchange process important parameter control method based on reinforcement learning, comprising the following steps:
firstly, using a reinforcement learning algorithm framework, taking the controller of the heat exchange process parameters as the reinforcement learning agent, the actions output by the agent being the control variables;
and secondly, taking a heat exchanger boundary condition control signal, or an actuator control signal that directly influences the heat exchanger boundary condition, as the agent's output action; taking the important state parameters of the heat exchange process as the agent's observations; constructing a reward function such that the closer the controlled parameter is to the control target, the larger the instant reward; and, through continuous training with the reinforcement learning algorithm, making the agent's output actions converge toward the maximum of the reward function, finally obtaining an agent that can accurately control the variation of the important state parameters of the heat exchange process.
Preferably, in the second step, when the heat exchanger is an evaporator: if the boundary condition control signal of the evaporator is adopted, the action output by the agent is the cold-fluid inlet flow of the evaporator; if an actuator control signal that directly influences the boundary condition is adopted, the action is the rotational speed signal of the pump.
Preferably, in the second step, the important state parameters of the heat exchange process serving as the agent's observations specifically include the temperatures and pressures of the cold and hot fluids, their rates of change, the flow rates of the cold and hot fluids, and the control error of the controlled important state parameter of the heat exchange process.
Preferably, in the second step, the control target is specifically the target value of the important state parameter of the heat exchange process;
when the heat exchanger is an evaporator, it is the target value of the superheat degree at the evaporator cold-fluid outlet.
Preferably, in the second step, the pre-constructed reward function is a function of the error obtained by directly subtracting the actual value of the controlled heat exchanger parameter from the control reference value;
the smaller the error, the larger the value of the reward function;
the controlled heat exchanger parameters are important state parameters of the heat exchange process.
Preferably, when the controlled important state parameter of the heat exchange process is the superheat degree of the working medium at the evaporator outlet, the pre-constructed reward function is as follows:
(Equation (1): the reward function; it is reproduced only as an image in the original document.)
In Equation (1), e denotes the error between the superheat degree and its target value, a denotes the action, sup denotes the superheat degree, and the subscript t denotes the time step.
Compared with the prior art, the reinforcement-learning-based method for controlling important parameters of a heat exchange process provided by the invention is scientifically designed; it solves the problem of poor control accuracy of heat exchange process parameters under transient fluctuations, can accurately and reliably control important heat exchanger parameters during the heat exchange process, improves the control accuracy of heat exchange process parameters under severely and frequently fluctuating boundary conditions, and is of great practical significance.
Drawings
FIG. 1 is a flow chart of a method for controlling important parameters of a heat exchange process based on reinforcement learning according to the present invention;
FIG. 2a is a schematic diagram of the first control structure adopted in an embodiment of the reinforcement-learning-based heat exchange process important parameter control method provided by the present invention, i.e., a control schematic in which the agent's action is a boundary condition control signal (a heat exchanger boundary condition control signal);
FIG. 2b is a schematic diagram of the second control structure adopted in an embodiment of the method, i.e., a control schematic in which the agent's action is an actuator control signal (an actuator signal that directly influences the heat exchanger boundary condition);
FIG. 3 is a schematic diagram of the control effect of the trained agent under a fluctuating heat source using the control structure of FIG. 2a according to the present invention;
FIG. 4 is a schematic diagram of the DDPG algorithm framework.
Detailed Description
In order that those skilled in the art will better understand the technical solution of the present invention, the present invention will be further described in detail with reference to the accompanying drawings and embodiments.
Referring to fig. 1 to 4, the invention provides a heat exchange process important parameter control method based on reinforcement learning, which comprises the following steps:
firstly, using a reinforcement learning algorithm framework, the controller of the heat exchange process parameters (when the heat exchanger is an evaporator in a Rankine cycle, the cold-fluid temperature controller of the evaporator) is taken as the reinforcement learning agent; the actions output by the agent are the control variables;
it should be noted that when the heat exchanger is an evaporator in a Rankine cycle (that is, taking control of the superheat degree at the evaporator cold-fluid outlet as an example), the reinforcement-learning-based heat exchange process important parameter control method provided by the present invention becomes a reinforcement-learning-based superheat control method.
And secondly, a boundary condition control signal of the heat exchanger (e.g., a control signal for the flow of the cold or hot fluid in an evaporator), or an actuator control signal that directly influences the heat exchanger boundary condition (e.g., a pump rotational speed control signal), is taken as the agent's output action, and the important state parameters of the heat exchange process are taken as the agent's observations. A reward function is constructed so that the closer the process is to the control target, the larger the instant reward. Through continuous training with the reinforcement learning algorithm, the agent's output actions converge toward the maximum of the reward function, finally yielding an agent, which may also be called an intelligent controller, that can accurately control the variation of the important state parameters of the heat exchange process.
In the second step, in terms of implementation, the control target may specifically be the target value of the important state parameter of the heat exchange process; when the heat exchanger is an evaporator in a Rankine cycle, it is the target superheat value at the evaporator cold-fluid outlet.
It should be noted that the control method of the present invention uses a reinforcement learning algorithm framework: a reinforcement learning agent serves as the controller of a heat exchange process parameter (for example, the superheat degree of the evaporator cold fluid), and the agent's output action is the control variable. The agent obtains an instant reward by observing the environment state (for example, taking important state parameters of the heat exchange process as observations), continuously interacts with the environment (i.e., the heat exchange process) according to the constructed reward function, and converges toward actions with larger rewards, finally yielding a trained agent, or intelligent controller, that can accurately control the parameter.
In the invention, the optimal control strategy is learned continuously through a reinforcement learning algorithm. Given the strong optimization and decision-making capability that reinforcement learning, and especially deep reinforcement learning, has demonstrated in other fields, the control method can outperform traditional control methods for heat exchange process parameter control.
In the second step, specifically, when the heat exchanger is an evaporator in a Rankine cycle: if the boundary condition control signal of the evaporator is adopted, the action (i.e., the control variable) output by the agent is the cold-fluid inlet flow of the evaporator, and the control structure is as shown in FIG. 2a; if a control signal of an actuator that directly influences the boundary condition (e.g., the working medium pump) is adopted, the control signal is the pump rotational speed signal, and the control structure is as shown in FIG. 2b.
In the present invention, in terms of specific implementation, the important heat exchange process parameters to be controlled specifically include parameters such as the temperature and pressure of the hot or cold fluid, for example the superheat degree at the evaporator cold-fluid outlet.
Specifically, the boundary conditions of the heat exchanger are the inlet and outlet flow rates of the cold and hot fluids and their inlet temperatures. For example, the boundary conditions may be the inlet temperatures and flow rates of the cold fluid (working medium) and hot fluid (heat source) in the evaporator, together with the outlet flow rates; these parameters are necessary to define the heat exchange process.
The cold fluid inlet flow is the flow at the evaporator cold fluid inlet, and the cold fluid outlet flow is the flow at the evaporator cold fluid outlet. The flow rate of the hot fluid (i.e., the heat source) is the flow rate at the hot fluid inlet of the evaporator, and the hot fluid outlet flow rate is the flow rate at the hot fluid outlet of the evaporator.
In the present invention, as shown in FIG. 2b, in the embodiment, when the heat exchanger is an evaporator, the only actuator that directly influences the boundary condition is the working medium pump, which changes the cold-fluid flow boundary condition.
In the second step, the agent's output action is either a control signal that changes a boundary condition of the heat exchanger containing the controlled parameter (a heat exchanger boundary condition control signal), or a signal for an actuator that directly influences the heat exchanger boundary condition. For example, adjusting the flow of the cold or hot fluid (reflected at the actuator as a change in the corresponding fluid pump's rotational speed) changes the fluid temperature after heat exchange.
It should be noted that, in the present invention, the heat exchanger boundary condition control signal is a change signal for that boundary condition, such as a change signal for the boundary flow rate. Controlling these boundary conditions requires corresponding actuators; for example, flow control requires a pump, so a pump control signal is needed to adjust the pump's rotational speed and thereby control the boundary flow rate.
In the invention, the heat exchanger boundary condition control signal and the actuator control signal that directly influences the boundary condition both act on the heat exchange process parameters. For example, referring to FIGS. 2a and 2b, when the heat exchanger is an evaporator, the controlled boundary condition is the cold-fluid flow, which is achieved by controlling the pump rotational speed; the working medium pump speed control signal therefore directly influences the boundary cold-fluid flow.
In the invention, the heat exchanger boundary condition control signal is provided by the agent, i.e., the controller of the heat exchange process parameters (when the heat exchanger is an evaporator in a Rankine cycle, the evaporator's cold-fluid temperature controller). In the first control structure, the heat exchanger boundary condition control signal is transmitted to a PID controller via digital signal transmission; in the second control structure, the actuator control signal that directly influences the heat exchanger boundary condition is transmitted to the pump (e.g., the working medium pump).
In the second step, in concrete implementation, if the agent's output action is a heat exchanger boundary condition control signal rather than an actuator control signal that directly influences the boundary condition, a controller is additionally needed to make the actuator track the agent's output action;
for example, a PID (proportional-integral-derivative) controller may be used to control the actuator.
In the present invention, when the heat exchanger is an evaporator, the flow control signal output by the agent (i.e., the output action) is a boundary condition that must be realized by changing the pump rotational speed; the signal is therefore used as a tracking reference for the pump, and the flow is tracked by adjusting the pump speed. The pump flow is the working medium flow at the evaporator inlet. The PID controller's function is to track a reference signal: the flow signal is sent to the pump's PID controller as the reference, and upon receiving it the PID controller outputs a control signal that adjusts the pump speed so that the pump flow follows the flow control signal.
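As an illustration of this tracking arrangement, the following is a minimal Python sketch of a discrete-time PID controller that adjusts the pump speed so the measured flow follows the flow reference produced by the agent; the gains and all numerical values are hypothetical placeholders, not parameters disclosed by the invention.

```python
class PID:
    """Discrete-time PID controller that tracks a reference signal."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: the agent outputs a flow reference; the PID
# adjusts the pump speed so the measured flow tracks that reference.
pid = PID(kp=50.0, ki=5.0, kd=0.0, dt=0.1)  # placeholder gains
pump_speed = 1500.0                          # rpm, placeholder
flow_reference = 0.12                        # kg/s, from the agent
flow_measured = 0.10                         # kg/s, from a sensor
pump_speed += pid.step(flow_reference, flow_measured)
```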
It should be noted that, in the second step, using the boundary condition control signal as the agent's action suits situations where many other processes (i.e., other heat exchange processes that change the fluid temperature) lie between the actuator and the controlled heat exchanger: the boundary condition control signal then acts directly, is unaffected by those processes, and those processes need not be modeled as part of the environment during training, which saves training cost. If the actuator control signal (i.e., the actuator control signal that directly influences the heat exchanger boundary condition) is used directly as the agent's action, the actuator's effect is influenced by those intermediate processes, which must then be included in the environment during training. If there is no other process between the actuator and the controlled heat exchanger (i.e., no other heat exchange process that changes the fluid temperature), directly using the actuator control signal as the agent's output action is appropriate.
In the second step, in concrete implementation, a pre-built high-precision dynamic simulation model of the heat exchanger, or the actual heat exchange process, serves as the interaction environment of the reinforcement learning agent.
It should be noted that the environment interacting with the agent is the heat exchange process, which may be either the actual heat exchange process or a high-precision dynamic simulation model of it. The agent's observations of the environment are the important state quantities of the heat exchange process, including the temperatures and pressures of the cold and hot fluids, their rates of change, the flow rates of the cold and hot fluids, and the control error of the controlled heat exchanger parameter. Here, for example, the pressure is the pressure of the cold and hot fluids in the evaporator.
In the present invention, the environment with which the agent interacts is the controlled process itself.
In the invention, the pre-built high-precision dynamic simulation model of the heat exchanger is a mathematical model, obtainable from existing mathematical mechanisms via conventional modeling approaches, and is used to simulate the actual heat exchange process.
For the present invention, the agent evaluates the merit of an action by the reward obtained, computed by a reward function whose defining property is that an action bringing the controlled parameter closer to the target value earns a larger reward. During training, each time the agent applies an action to the environment, the environment immediately produces the corresponding output and returns an instant reward evaluating that action. As the agent explores a large number of action outputs, the reinforcement learning training algorithm gradually steers the agent's actions toward larger rewards, i.e., toward higher control accuracy. After many rounds of training, an agent, or intelligent controller, that can accurately control the variation of the parameters (i.e., the important state parameters of the heat exchange process) is finally obtained.
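The interaction just described can be summarized by a short sketch of one training episode; `env` (the heat exchange process or its simulation model) and `agent` are hypothetical interfaces introduced only for illustration.

```python
def run_episode(env, agent, max_steps=1000):
    """One training episode: the agent acts, the environment returns an
    observation and an instant reward, and the agent learns from it."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                     # e.g. flow or pump-speed signal
        next_obs, reward, done = env.step(action)   # environment responds at once
        agent.learn(obs, action, reward, next_obs)  # push actions toward larger reward
        obs = next_obs
        total_reward += reward
        if done:                                    # e.g. superheat out of bounds
            break
    return total_reward
```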
In the second step, in concrete implementation, the important state parameters of the heat exchange process serving as the agent's observations specifically include the temperatures and pressures of the cold and hot fluids, their rates of change, the flow rates of the cold and hot fluids, and the control error of the controlled heat exchanger parameter (i.e., the controlled important state parameter of the heat exchange process).
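For concreteness, here is a sketch of how such an observation vector might be assembled; the sensor field names are illustrative assumptions, not identifiers prescribed by the invention.

```python
def build_observation(sensors, target):
    """Assemble the agent's observation from heat exchange measurements.

    `sensors` is assumed to provide temperatures and pressures of the cold
    and hot fluids, their rates of change, and the fluid flow rates."""
    controlled = sensors["cold_outlet_superheat"]     # controlled parameter
    return [
        sensors["cold_temperature"], sensors["hot_temperature"],
        sensors["cold_pressure"], sensors["hot_pressure"],
        sensors["cold_temperature_rate"], sensors["hot_temperature_rate"],
        sensors["cold_pressure_rate"], sensors["hot_pressure_rate"],
        sensors["cold_flow"], sensors["hot_flow"],
        controlled - target,                          # control error
    ]
```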
In the second step, in concrete implementation, the pre-constructed reward function is a function of the error (i.e., the control error of the controlled heat exchanger parameter) obtained by directly subtracting the actual value of the controlled heat exchanger parameter (i.e., the important state parameter of the heat exchange process) from the control reference value. The reward function is characterized in that the smaller the error, the larger its value.
It should be noted that the actual value of the controlled heat exchanger parameter (i.e., the important state parameter of the heat exchange process) is its measured value in the heat exchange process, obtainable by model calculation or by actual measurement;
the control reference value of the controlled heat exchanger parameter is a general term referring to the target value, which is specified manually.
It should be noted that the error between the actual value of the controlled heat exchanger parameter and the control reference value is obtained by directly subtracting the actual value from the control reference value.
It should be noted that the reward function has no universal formula, but it has the general property that the larger the error between the actual value and the control reference value of the controlled heat exchanger parameter, the smaller the reward.
In the second step, in terms of specific implementation, when the controlled heat exchanger parameter (i.e., the important state parameter in the heat exchange process) is the superheat degree of the working medium at the outlet of the evaporator, the pre-constructed reward function is as follows:
(Equation (1): the reward function; it is reproduced only as an image in the original document.)
In Equation (1), e denotes the error between the superheat degree and its target value, a denotes the action, sup denotes the superheat degree, and the subscript t denotes the time step.
In Equation (1), the first five terms evaluate reference-tracking performance: the smaller the tracking error, the larger the reward. The sixth term discourages overly frequent fluctuations of the pump speed. The seventh term stops training and returns a large penalty whenever the superheat falls below its lower limit or exceeds its upper limit, saving training time.
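Since Equation (1) is reproduced only as an image, the following Python sketch illustrates one possible reward with the seven-term structure just described; the grouping of the five tracking terms, the weights, and the superheat limits are all assumptions for illustration, not the patented formula.

```python
def reward(errors, d_speed, superheat, sup_min=5.0, sup_max=40.0):
    """Seven-term reward in the spirit of Equation (1).

    errors    : five tracking-error terms (assumed grouping; smaller is better)
    d_speed   : last change in pump speed (sixth term: discourage chatter)
    superheat : current superheat; leaving [sup_min, sup_max] ends the
                episode with a large penalty (seventh term)."""
    r = -sum(w * abs(e) for w, e in zip([1.0, 0.8, 0.6, 0.4, 0.2], errors))
    r -= 0.1 * abs(d_speed)            # penalize frequent pump-speed changes
    done = superheat < sup_min or superheat > sup_max
    if done:
        r -= 100.0                     # large penalty, training episode stops
    return r, done
```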
In the invention, the control method is trained by a reinforcement learning algorithm, and various reinforcement learning algorithms may be used, such as deep reinforcement learning algorithms.
In order that the technical solution of the present invention may be understood more clearly, it is described below through a specific example.
Example.
Control of the superheat degree at the cold-fluid outlet of the evaporator in a Rankine cycle is taken as an example. In this embodiment, the environment is a dynamic simulation model of the evaporator (a mathematical model, built in the conventional manner from existing mathematical mechanisms and used to simulate the actual heat exchange process).
The boundary conditions of the model are the inlet temperatures and flow rates of the cold fluid (i.e., working medium) and the hot fluid (i.e., heat source), together with the outlet flow rates. Inlet temperature and flow can generally be regulated actively, so these data can either be given directly or serve as control variables.
In this embodiment, the inlet temperatures of the cold and hot fluids and the flow rate of the hot fluid are assigned manually, while the cold-fluid inlet flow serves as the control variable and is controlled directly through the pump rotational speed; the environment model in this embodiment therefore includes a pump model (a mathematical model, built in the conventional manner from the pump performance curve and used to calculate the flow through the pump).
The working medium outlet of the evaporator is connected to an expander, so the change in working medium outlet flow can be calculated with an expansion valve model (a mathematical model, built in the conventional manner from the valve performance curve and used to calculate the flow through the valve). The hot fluid (i.e., heat source) is an open system whose pressure is approximately atmospheric, so its outlet flow can be calculated from its inlet flow.
It should be noted that the inlet and outlet flows of the heat source are given manually, while the cold-side inlet flow is calculated by the pump model and the outlet flow by the valve model.
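A minimal sketch of what such boundary-flow models might look like; the affinity-law pump relation and the orifice-style valve equation are common textbook forms used here as assumptions, not the specific performance curves of the embodiment.

```python
import math

def pump_flow(speed_rpm, speed_ref=1450.0, flow_ref=0.12):
    """Pump flow from rotational speed via the affinity law (flow ~ speed);
    the reference operating point is a hypothetical placeholder."""
    return flow_ref * speed_rpm / speed_ref

def valve_flow(p_up, p_down, rho, cv=1.2e-4):
    """Orifice-style valve flow: m_dot = Cv * sqrt(rho * dP);
    Cv is a hypothetical placeholder coefficient."""
    dp = max(p_up - p_down, 0.0)
    return cv * math.sqrt(rho * dp)
```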
In the invention, the model of the heat exchanger (e.g., the evaporator) can be established using the traditional finite volume method, completing the construction of the agent's interaction environment model.
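As an illustration of the finite volume approach, here is a heavily simplified single-phase, one-dimensional sketch with an explicit energy balance per cell; a real evaporator model would add two-phase property calculations and pressure dynamics, and every numerical value below is a placeholder.

```python
def step_finite_volume(T, T_hot, m_dot, T_inlet, dt,
                       cp=2000.0, mass_cell=0.05, UA_cell=150.0):
    """Advance the cold-side cell temperatures T by one explicit time step.

    Each cell exchanges heat with the hot stream (UA_cell * dT) and
    advects enthalpy from its upstream neighbour at flow rate m_dot.
    All property values are hypothetical placeholders."""
    T_new = list(T)
    for i, Ti in enumerate(T):
        q = UA_cell * (T_hot[i] - Ti)            # heat from the hot stream, W
        T_up = T_inlet if i == 0 else T[i - 1]   # upstream cell temperature
        adv = m_dot * cp * (T_up - Ti)           # enthalpy advection, W
        T_new[i] = Ti + dt * (q + adv) / (mass_cell * cp)
    return T_new

# Hypothetical usage: 10 cells, hot stream at 400 K, cold inlet at 300 K.
T = [300.0] * 10
for _ in range(1000):
    T = step_finite_volume(T, T_hot=[400.0] * 10, m_dot=0.1,
                           T_inlet=300.0, dt=0.01)
```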
As mentioned above, the cold-fluid (working medium) flow is the control variable, and in this embodiment it is changed by changing the rotational speed of the working medium pump. If the working medium flow control signal (i.e., the heat exchanger boundary condition control signal) is adopted as the agent's output action, the control structure of FIG. 2a is used; if the pump speed control signal (i.e., the actuator control signal that directly influences the heat exchanger boundary condition) is adopted directly, the control structure of FIG. 2b is used.
The agent's observations are the outlet pressure and temperature of the cold fluid (i.e., working medium) and their rates of change, together with the working medium flow rate and the error between the actual value and the target value of the controlled heat exchanger parameter (i.e., its control error).
Following the principle that the reward grows as the controlled heat exchanger parameter (i.e., the superheat degree of the working medium at the evaporator outlet) approaches its target value, the reward function shown in Equation (1) was constructed for this example.
(Equation (1), as given above; the reward function is reproduced only as an image in the original document.)
The symbols and the roles of the seven terms are as described for Equation (1) above.
In this embodiment, control of the superheat degree of the evaporator outlet working medium in the organic Rankine cycle is treated as a continuous control problem, and the DDPG (Deep Deterministic Policy Gradient) algorithm, a reinforcement learning algorithm, is adopted. DDPG is an actor-critic deep reinforcement learning algorithm: it directly optimizes the policy (the actor) while training a critic to evaluate action values. The actor selects an action, and the critic tells the actor how good that action is. Through continual iteration, the actor converges to an optimal action policy while the critic improves the accuracy of its value function approximation. Both the actor and the critic are represented by deep neural networks.
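A minimal PyTorch sketch of what the actor and critic networks might look like; the layer sizes and the tanh action squashing are assumptions, not hyperparameters disclosed by the invention.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps an observation s to a deterministic action mu(s|theta_mu)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # action scaled to [-1, 1]
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Maps (s, a) to the action value Q(s, a|omega_Q)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```

Here `obs_dim` would match the length of the observation vector described earlier, and `act_dim` would be 1 for a single flow or pump-speed signal.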
The DDPG algorithm framework is shown in FIG. 4. The actor network μ(s|θ^μ) takes the observation s as input and outputs the action that maximizes the long-term return. The critic network Q(s, a|ω^Q) takes the observation s and the action a as input and outputs the action value. To improve the stability of the training optimization process, the DDPG agent creates two additional networks, μ′(s|θ^{μ′}) and Q′(s, a|ω^{Q′}), called the target actor and the target critic. θ^μ and ω^Q are the network parameters, and the training process consists of optimizing θ^μ and ω^Q; the target parameters θ^{μ′} and ω^{Q′} are updated from the latest values of θ^μ and ω^Q. The specific training process is as follows:
Initialize the critic networks Q(s, a|ω^Q) and Q′(s, a|ω^{Q′}) with the same random parameters ω_0, and the actor networks μ(s|θ^μ) and μ′(s|θ^{μ′}) with the same random parameters θ_0.
The following steps are repeated at each training step (a consolidated code sketch of one training iteration is given after step 4):
1. For the current observation s, namely the working medium outlet pressure and temperature, their rates of change, the error between the actual and target values of the controlled parameter, and the accumulated error, take the pump speed change action a = μ(s) + NM, where NM is a random noise model. The evaporator simulation model executes action a, returns the next observation s′, and the instant reward r is calculated by the reward function, Equation (1). The experience (s, a, r, s′) is stored in a replay memory buffer. Each training sample is denoted (s_i, a_i, r_i, s_{i+1}).
2. Randomly draw N samples from the replay memory buffer and update the critic parameters ω^Q by minimizing the loss function L in Equation (2), where γ is the discount factor used to compute the long-term reward.
L = (1/N) Σ_{i=1}^{N} (y_i - Q(s_i, a_i|ω^Q))²   Equation (2)
y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1}|θ^{μ′})|ω^{Q′})   Equation (3)
3. Update the actor network parameters θ^μ with the policy gradient of Equations (4) to (6) so as to maximize the expected discounted reward.
∇_{θ^μ}J ≈ (1/N) Σ_{i=1}^{N} G_{ai} G_{μi}   Equation (4)
G_{ai} = ∇_a Q(s_i, a|ω^Q)|_{a=μ(s_i)}   Equation (5)
G_{μi} = ∇_{θ^μ} μ(s_i|θ^μ)   Equation (6)
4. Update the target actor parameters θ^{μ′} and the target critic parameters ω^{Q′} according to Equations (7) and (8).
ω^{Q′} = τω^Q + (1 - τ)ω^{Q′}   Equation (7);
θ^{μ′} = τθ^μ + (1 - τ)θ^{μ′}   Equation (8);
where τ is the target smoothing factor.
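Combining steps 1 to 4, the following compact PyTorch sketch performs one training iteration in the spirit of Equations (2) to (8), reusing the Actor/Critic classes sketched earlier; the batch size, discount factor γ, and smoothing factor τ are illustrative placeholders, and the buffer is assumed to store experience tuples of tensors.

```python
import random
import torch
import torch.nn.functional as F

def ddpg_iteration(buffer, actor, critic, actor_t, critic_t,
                   opt_actor, opt_critic, N=64, gamma=0.99, tau=0.005):
    """One DDPG training iteration following Equations (2)-(8)."""
    # Step 2: sample N experiences (s, a, r, s') from the replay buffer.
    s, a, r, s2 = (torch.stack(x) for x in zip(*random.sample(buffer, N)))

    # Equations (2)-(3): critic update by minimizing the TD loss.
    with torch.no_grad():
        y = r + gamma * critic_t(s2, actor_t(s2)).squeeze(-1)
    loss_q = F.mse_loss(critic(s, a).squeeze(-1), y)
    opt_critic.zero_grad(); loss_q.backward(); opt_critic.step()

    # Equations (4)-(6): actor update along the deterministic policy gradient.
    loss_pi = -critic(s, actor(s)).mean()
    opt_actor.zero_grad(); loss_pi.backward(); opt_actor.step()

    # Equations (7)-(8): soft update of the target networks.
    with torch.no_grad():
        for p, p_t in zip(critic.parameters(), critic_t.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
        for p, p_t in zip(actor.parameters(), actor_t.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
```

Exploration noise (step 1) and the filling of the buffer are assumed to happen in the surrounding episode loop, as sketched earlier.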
This process is repeated for each training episode until the actor and critic networks converge, at which point training of the intelligent controller is complete. Using the control structure of FIG. 2a, the control effect of the trained agent under a fluctuating heat source is shown in FIG. 3. Comparing this method with conventional PID control in FIG. 3, the method of the invention clearly tracks the target superheat far better than conventional PID control, demonstrating excellent control performance.
In conclusion, compared with the prior art, the reinforcement-learning-based method for controlling important parameters of a heat exchange process provided by the invention is scientifically designed; it solves the problem of poor control accuracy of heat exchange process parameters under transient fluctuations, accurately and reliably controls important heat exchanger parameters during the heat exchange process, improves the control accuracy of heat exchange process parameters under severely and frequently fluctuating boundary conditions, and is of great practical significance.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also fall within the protection scope of the invention.

Claims (6)

1. A heat exchange process important parameter control method based on reinforcement learning is characterized by comprising the following steps:
step one, using a reinforcement learning algorithm framework, taking the controller of the heat exchange process parameters as the reinforcement learning agent, the actions output by the agent being the control variables;
and step two, taking the heat exchanger boundary condition control signal, or an actuator control signal that directly influences the heat exchanger boundary condition, as the agent's output action; taking the important state parameters of the heat exchange process as the agent's observations; constructing a reward function such that the closer the controlled parameter is to the control target, the larger the instant reward; and, through continuous training with the reinforcement learning algorithm, making the agent's output actions converge toward the maximum of the reward function, finally obtaining an agent that can accurately control the variation of the important state parameters of the heat exchange process.
2. The reinforcement learning-based heat exchange process important parameter control method according to claim 1, wherein in the second step, when the heat exchanger is an evaporator: if the boundary condition control signal of the evaporator is adopted, the action output by the agent is the cold-fluid inlet flow of the evaporator; if an actuator control signal that directly influences the boundary condition is adopted, the action is the rotational speed signal of the pump.
3. The reinforcement learning-based heat exchange process important parameter control method according to claim 1, wherein in the second step the important state parameters of the heat exchange process serving as the agent's observations specifically include the temperatures and pressures of the cold and hot fluids, their rates of change, the flow rates of the cold and hot fluids, and the control error of the controlled important state parameter of the heat exchange process.
4. The reinforcement learning-based heat exchange process important parameter control method according to claim 1, wherein in the second step the control target is specifically the target value of the important state parameter of the heat exchange process;
when the heat exchanger is an evaporator, it is the target value of the superheat degree at the evaporator cold-fluid outlet.
5. The reinforcement learning-based heat exchange process important parameter control method according to claim 1, wherein in the second step the pre-constructed reward function is a function of the error obtained by directly subtracting the actual value of the controlled heat exchanger parameter from the control reference value;
the smaller the error, the larger the value of the reward function;
the controlled heat exchanger parameters are important state parameters of the heat exchange process.
6. The reinforcement learning-based heat exchange process important parameter control method according to any one of claims 1 to 5, wherein, when the controlled important state parameter of the heat exchange process is the superheat degree of the working medium at the evaporator outlet, the pre-constructed reward function is as follows:
(Equation (1): the reward function; it is reproduced only as an image in the original document.)
In Equation (1), e denotes the error between the superheat degree and its target value, a denotes the action, sup denotes the superheat degree, and the subscript t denotes the time step.
CN202210221479.6A 2022-03-07 2022-03-07 Heat exchange process important parameter control method based on reinforcement learning Pending CN114739229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210221479.6A CN114739229A (en) 2022-03-07 2022-03-07 Heat exchange process important parameter control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210221479.6A CN114739229A (en) 2022-03-07 2022-03-07 Heat exchange process important parameter control method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN114739229A (en) 2022-07-12

Family

ID=82274971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210221479.6A Pending CN114739229A (en) 2022-03-07 2022-03-07 Heat exchange process important parameter control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114739229A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111006451A (en) * 2019-12-10 2020-04-14 江西艾维斯机械有限公司 Integrated air compressor and control method thereof
US20210180891A1 (en) * 2019-12-11 2021-06-17 Baltimore Aircoil Company, Inc. Heat Exchanger System with Machine-Learning Based Optimization
CN113255206A (en) * 2021-04-02 2021-08-13 河海大学 Hydrological prediction model parameter calibration method based on deep reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111006451A (en) * 2019-12-10 2020-04-14 江西艾维斯机械有限公司 Integrated air compressor and control method thereof
US20210180891A1 (en) * 2019-12-11 2021-06-17 Baltimore Aircoil Company, Inc. Heat Exchanger System with Machine-Learning Based Optimization
CN113255206A (en) * 2021-04-02 2021-08-13 河海大学 Hydrological prediction model parameter calibration method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG XUAN et al., "Control of superheat of organic Rankine cycle under transient heat source based on deep reinforcement learning", Applied Energy, vol. 278, pp. 1-12 *

Similar Documents

Publication Publication Date Title
Wang et al. Adaptive neural network model based predictive control for air–fuel ratio of SI engines
US9926866B2 (en) System and method for exhaust gas recirculation flow correction using temperature measurements
Wang et al. Control of superheat of organic Rankine cycle under transient heat source based on deep reinforcement learning
CN101498534A (en) Multi-target intelligent control method for electronic expansion valve of refrigeration air conditioner heat pump system
Li et al. A novel cascade temperature control system for a high-speed heat-airflow wind tunnel
JP2017129120A (en) Discrete time rate-based model predictive control method for internal combustion engine air path control
Shi et al. Dual-mode fast DMC algorithm for the control of ORC based waste heat recovery system
CN109268159B (en) Control method of fuel-air ratio system of lean-burn gasoline engine
CN105240846A (en) Method for controlling combustion process of circulating fluidized bed boiler on basis of multivariable generalized predictive control optimization
CN113093526A (en) Overshoot-free PID controller parameter setting method based on reinforcement learning
Wang et al. Deep reinforcement learning-PID based supervisor control method for indirect-contact heat transfer processes in energy systems
Glenn et al. Observer design of critical states for air path flow regulation in a variable geometry turbocharger exhaust gas recirculation diesel engine
CN115857330A (en) Rankine cycle waste heat recovery system optimization control method based on deep reinforcement learning
Zhang et al. Modeling and output feedback control of automotive air conditioning system
Shih et al. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control
CN114739229A (en) Heat exchange process important parameter control method based on reinforcement learning
CN114660928B (en) BP neural network and fuzzy adaptive coupled PID temperature regulation system and method
US20190211753A1 (en) Feedforward and feedback architecture for air path model predictive control of an internal combustion engine
Wang et al. Adaptive neural network model based predictive control of an internal combustion engine with a new optimization algorithm
Molana et al. Analysis and simulation of active surge control in centrifugal compressor based on multiple model controllers
CN105673094A (en) Turbine rotating speed control method based on active-disturbance-rejection control
CN113977571B (en) Flexible joint robot output torque control method
CN113219841B (en) Nonlinear control method for underwater multi-joint hydraulic mechanical arm based on adaptive robustness
CN116047897A (en) Gas turbine predictive control method based on parameter self-adaptive disturbance rejection controller
El Hadef et al. New physics-based turbocharger data-maps extrapolation algorithms: validation on a spark-ignited engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220712