CN112460741B - Control method of building heating, ventilation and air conditioning system - Google Patents

Control method of building heating, ventilation and air conditioning system Download PDF

Info

Publication number
CN112460741B
CN112460741B (application number CN202011319558.8A)
Authority
CN
China
Prior art keywords
model
building
neural network
real
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011319558.8A
Other languages
Chinese (zh)
Other versions
CN112460741A (en)
Inventor
赵俊华
赵焕
何秉昊
梁高琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202011319558.8A priority Critical patent/CN112460741B/en
Publication of CN112460741A publication Critical patent/CN112460741A/en
Application granted granted Critical
Publication of CN112460741B publication Critical patent/CN112460741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/30: Control or safety arrangements for purposes related to the operation of the system, e.g. for safety or monitoring
    • F24F11/46: Improving electric energy efficiency or saving
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Combustion & Propulsion (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention provides a building heating, ventilation and air conditioning (HVAC) system control method, comprising the following steps: establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model; and performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time. Pre-training the agent based on the deep deterministic policy gradient reduces the learning cost incurred when the agent later interacts with the real environment. The hybrid reinforcement learning algorithm is used for online iterative learning, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.

Description

Control method of building heating, ventilation and air conditioning system
Technical Field
The invention relates to the field of power distribution networks, and in particular to a control method of a building heating, ventilation and air conditioning (HVAC) system.
Background
Building energy consumption accounts for a large share of total electricity consumption in China, and almost half of it is caused by HVAC systems. Researchers at home and abroad have carried out a series of studies on energy consumption prediction for large buildings. Energy consumption prediction models are mainly divided into physical models, data-driven models and grey-box models. Using a simplified physical model can reduce the amount of validation data required and save computation time. Because controllable building loads can be operated flexibly, a power dispatching center can effectively manage the energy consumption of a building system through direct control or price incentives, reducing building operating costs and improving the economy and security of grid operation.
Current demand management methods for flexible loads in intelligent buildings cannot fully exploit the demand response potential of those loads. The uncertainty of renewable generation and load forecasts grows with the prediction horizon, and the accumulated error of the forecast data gradually increases, so the resulting optimal dispatch schedules are difficult to reconcile with the actual operating requirements of the system, which greatly hinders distribution network operation that relies on the flexibility of building energy use. Moreover, because of the dynamic characteristics of the system and its environment, such as temperature and electricity price, model predictive control of HVAC is a complex task, and learning efficiency and learning cost are the main obstacles to deploying deep reinforcement learning methods.
Therefore, an energy-saving control method for building HVAC systems that accelerates learning and reduces learning cost is urgently needed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an energy-saving control method for a building HVAC system that accelerates learning and reduces learning cost.
To solve this technical problem, the invention adopts the following technical scheme. The control method of the building HVAC system comprises the following steps:
establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model;
and performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time.
Further, the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model;
the building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
Further, the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
Further, pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
Further, the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
Further, updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
Further, the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient comprises the following steps:
S21: pre-training the data-driven environment model according to the knowledge model;
S22: the pre-trained agent receives and records a first real environment state in the real environment, and its neural network unit generates a real initial action from the first real environment state;
S23: the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23;
S25: executing the real executed action in the real environment, and observing and recording the reward of the real executed action and the second real environment state at the next time step;
S26: updating the neural network unit from the data recorded by the pre-trained agent, and increasing the iteration counter by 1;
S27: judging whether the iteration counter has reached a third preset value; if the counter is smaller than the third preset value, returning to step S22; if the counter is greater than or equal to the third preset value, ending the training.
Further, the data-driven environment model is a multilayer artificial neural network;
the inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step;
the data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
Further, the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
Further, pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
In the building HVAC system control method provided by the invention, a knowledge model is first established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Drawings
The specific structure of the invention is detailed below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a control method of a building heating, ventilating and air conditioning system according to the invention;
FIG. 2 is a flow chart of the present invention for pre-training based on a depth-deterministic strategy gradient;
FIG. 3 is a flow chart of a hybrid reinforcement learning algorithm of the present invention;
FIG. 4 is a distribution diagram of electricity price and ambient temperature in an embodiment of the present invention.
Detailed Description
In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
Because of the dynamic characteristics of the system and its environment, such as temperature and electricity price, model predictive control of HVAC is a complex task, and learning efficiency and learning cost are the main obstacles to deploying deep reinforcement learning methods. The invention provides a control method of a building HVAC system; referring to FIG. 1, the method specifically comprises the following steps.
S1, establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model.
It can be understood that a knowledge model is established for the HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment.
S2, performing online iterative learning with the pre-trained agent in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating the data-driven environment model in real time.
It can be understood that the pre-trained agent performs online iterative learning in the real environment, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost.
In the control method of the building HVAC system, the agent interacts with the knowledge model, which reduces the learning cost incurred when the agent interacts with the real environment. Specifically, the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model.
The building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
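For illustration only (this sketch is not part of the claimed method), the knowledge model above can be exercised as a single simulation step. In the following Python sketch every parameter value is an assumed placeholder, and because the patent's power equation is reproduced only as an image, a cubic fan law in the total air flow is assumed purely for demonstration:

```python
import numpy as np

# Illustrative parameters only; none of these values come from the patent.
N = 4                    # number of zones
dt = 0.25                # discrete time interval Delta t, hours
k_f = 0.08               # fan efficiency coefficient (assumed)
U_w, A_w = 0.002, 80.0   # wall heat-transfer coeff. (kW/m^2.C) and wall area (m^2)
T_s = 15.0               # constant supply-air temperature (C)
c_a = 1.005              # specific heat of air (kJ/kg.C)
rho_a, V_z = 1.2, 625.0  # air density (kg/m^3) and zone air volume (m^3)

def knowledge_model_step(T_z, T_a, price, f_z):
    """One step of the knowledge model: returns the cost c_{1,t} and the
    next zone temperatures. T_z and f_z are length-N arrays."""
    # Fan power: the patent's power equation is an image, so a cubic law
    # in the total flow is assumed here purely for illustration.
    p_f = k_f * f_z.sum() ** 3
    c_1 = p_f * price * dt                 # c_{1,t} = p_{f,t} * lambda_t * dt

    H_w = U_w * A_w * (T_a - T_z)          # heat gained through the walls (kW)
    H_h = f_z * c_a * (T_s - T_z)          # heat from the supply air (kW)
    # T_{z,t+1} - T_{z,t} = dt * (H_w + H_h) / (V_z * rho_a * c_a);
    # dt is converted to seconds so that kW * s / kJ is dimensionless.
    T_next = T_z + (dt * 3600.0) * (H_w + H_h) / (V_z * rho_a * c_a)
    return c_1, T_next

c_1, T_next = knowledge_model_step(np.full(N, 26.0), 32.0, 0.12, np.full(N, 0.5))
```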
In the control method of the building HVAC system, the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
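As a concrete illustration of the reward above (not part of the patent text; the penalty weights, the temperature band, and the exact zone-wise sums are assumptions, since the original penalty expressions are reproduced only as images):

```python
import numpy as np

beta1, beta2 = 10.0, 10.0     # penalty weights beta_1, beta_2 (assumed)
T_min, T_max = 22.0, 26.0     # acceptable zone temperature band (assumed)

def reward(c_1, T_z, a_z):
    """r_t = -C_{1,t} - beta_1*C_{2,t} - beta_2*C_{3,t}, with [x]^+ = max(0, x).
    The zone-wise penalty sums are a reconstruction, since the original
    expressions are reproduced only as images."""
    pos = lambda x: np.maximum(0.0, x)
    C_2 = np.sum(pos(T_z - T_max) + pos(T_min - T_z))   # comfort violation
    C_3 = np.sum(pos(a_z - 1.0) + pos(-a_z))            # control-bound violation
    return -c_1 - beta1 * C_2 - beta2 * C_3

r_t = reward(c_1=1.2, T_z=np.array([25.0, 27.1]), a_z=np.array([0.4, 1.05]))
```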
In the control method of the building HVAC system, the agent is pre-trained based on the deep deterministic policy gradient according to the knowledge model, which reduces the learning cost incurred when the agent interacts with the real environment. Specifically, pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
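Steps S11 to S15 amount to a standard simulator-driven training loop. A minimal Python sketch follows; the DDPGAgent class and the knowledge-model stub are illustrative stand-ins, not the patent's implementation:

```python
import numpy as np

class DDPGAgent:
    """Minimal stand-in for the agent of steps S11-S15 (illustrative only)."""
    def __init__(self, state_dim, action_dim):
        self.buffer = []                                       # recorded data
        self.W = np.random.randn(action_dim, state_dim) * 0.1  # S11: random init

    def act(self, s, noise_std=0.1):
        # a_t = mu(s_t | theta^mu) + N_t, with Gaussian exploration noise N_t
        noise = np.random.normal(0.0, noise_std, self.W.shape[0])
        return np.clip(self.W @ s + noise, 0.0, 1.0)

    def update(self, transition):
        self.buffer.append(transition)   # stands in for the full DDPG update (S14)

def knowledge_model(s, a):
    """Stub for M_k: returns (reward, next state)."""
    return -float(np.sum(a)), np.clip(s + 0.01 * (a.mean() - 0.5), 0.0, 1.0)

agent = DDPGAgent(state_dim=6, action_dim=4)
s = np.random.rand(6)                     # S12: first environment state
for _ in range(1000):                     # S15: first preset iteration count
    a = agent.act(s)                      # S12/S13: generate the executed action
    r, s_next = knowledge_model(s, a)     # S13: reward and next state from M_k
    agent.update((s, a, r, s_next))       # S14: update from the recorded data
    s = s_next
```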
In the control method of the building HVAC system, the pre-trained agent performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the neural network unit is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost.
Specifically, the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
Updating the neural network unit in real time during the iterative learning makes the online training more stable while reducing the learning cost. Specifically, updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
Specifically, the process of updating the online Q neural network is as follows:
K groups of data (s_i, a_i, r_i, s_{i+1}), i ∈ {1, 2, …, K}, are randomly drawn from the data recorded by the agent, wherein s_i is the state received at the i-th iteration, a_i is the executed action at the i-th iteration, and r_i is the reward received at the i-th iteration.
The loss function to be minimized is as follows, where γ ∈ (0, 1] is the discount rate:
L = (1/K) Σ_{i=1}^{K} (y_i − Q(s_i, a_i|θ^Q))²
y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^Q′).
Specifically, the policy gradient used to update the online policy neural network is:
∇_{θ^μ} J ≈ (1/K) Σ_{i=1}^{K} ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
The network parameters of the online policy neural network are updated by gradient ascent:
θ^μ ← θ^μ + α ∇_{θ^μ} J
where J is the expectation of the cumulative discounted return and α is the learning rate.
Specifically, the target Q neural network is updated as:
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
wherein 0 < τ < 1 is the update rate.
Specifically, the target policy neural network is updated as:
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
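The four updates above can be combined into one synchronized routine. The following PyTorch sketch is an illustrative reading of these equations; the network sizes, the dimensions, the learning rate and the batch size are assumptions, as the patent does not specify them:

```python
import copy
import torch
import torch.nn as nn

gamma, tau, alpha, K = 0.99, 0.01, 1e-3, 64   # discount, update rate, learning rate, batch (assumed)

# Online policy mu and online Q network; 6-dim state, 4-dim action as an example.
mu = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4), nn.Sigmoid())
Q = nn.Sequential(nn.Linear(6 + 4, 64), nn.ReLU(), nn.Linear(64, 1))
mu_targ, Q_targ = copy.deepcopy(mu), copy.deepcopy(Q)   # target networks
opt_mu = torch.optim.SGD(mu.parameters(), lr=alpha)
opt_Q = torch.optim.SGD(Q.parameters(), lr=alpha)

def ddpg_update(s, a, r, s_next):
    """One synchronized update; s: (K,6), a: (K,4), r: (K,1), s_next: (K,6)."""
    # Critic: minimize (1/K) sum_i (y_i - Q(s_i, a_i | theta^Q))^2 with
    # y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1} | theta^mu') | theta^Q').
    with torch.no_grad():
        y = r + gamma * Q_targ(torch.cat([s_next, mu_targ(s_next)], dim=1))
    q_loss = ((y - Q(torch.cat([s, a], dim=1))) ** 2).mean()
    opt_Q.zero_grad(); q_loss.backward(); opt_Q.step()

    # Actor: gradient ascent on J, implemented as descent on -Q(s, mu(s)).
    mu_loss = -Q(torch.cat([s, mu(s)], dim=1)).mean()
    opt_mu.zero_grad(); mu_loss.backward(); opt_mu.step()

    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta'.
    with torch.no_grad():
        for net, net_targ in ((mu, mu_targ), (Q, Q_targ)):
            for p, p_t in zip(net.parameters(), net_targ.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)

ddpg_update(torch.rand(K, 6), torch.rand(K, 4), -torch.rand(K, 1), torch.rand(K, 6))
```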
in the control method of the space heating ventilation air conditioning system, the pre-trained intelligent agent is subjected to online iterative learning in a real environment based on a knowledge model and a hybrid reinforcement learning algorithm of a depth certainty strategy gradient, and a neural network unit is updated in real time in the iterative learning process, so that online training is more stable, and meanwhile, the learning cost can be reduced. Specifically, the hybrid reinforcement learning algorithm based on the knowledge model and the depth certainty strategy gradient includes the following steps:
s21: pre-training the data-driven-based environment model according to the knowledge model;
s22: receiving and recording a first real environment state in a real environment by a pre-trained intelligent agent, wherein a neural network unit of the pre-trained intelligent agent generates a real initial action according to the first real environment state;
s23: the pre-trained agent generates and records a real execution action according to the real initial action, inputs the real execution action into the pre-trained data-driven-based environment model, and receives and records a prediction return value of the real execution action;
s24: judging whether the prediction return value is smaller than a second preset value or not;
if the predicted return value is smaller than the second preset value, executing S25;
if the predicted return value is greater than or equal to the second preset value, executing S23;
s25: executing the real execution action in a real environment, and observing and recording a return value of the real execution action and a second real environment state at the next moment;
s26: updating the neural network unit according to the data recorded by the pre-trained agent, and adding 1 to the numerical value of the iteration times;
s27: judging whether the numerical value of the iteration times reaches a third preset value or not; if the value of the iteration times is smaller than the third preset value, returning to the step S22; and if the numerical value of the iteration times is greater than or equal to the third preset value, ending the training.
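Steps S21 to S27 can be read as a model-filtered online loop: candidate actions are screened by M_d before being applied to the real building. A schematic Python sketch follows; the threshold branch follows the patent text (the action is executed only when the predicted reward is below the second preset value), and all objects and values are illustrative stand-ins:

```python
R_THRESHOLD = -50.0     # "second preset value" (magnitude assumed)
MAX_ITERS = 480         # "third preset value" (assumed)

def online_learning(agent, env_model, real_env, regenerate):
    """Online loop of steps S21-S27; `agent`, `env_model` (M_d), `real_env`
    and `regenerate` are illustrative stand-ins."""
    s = real_env.reset()                            # S22: first real state
    for _ in range(MAX_ITERS):                      # S27: iteration limit
        a = agent.act(s)                            # S22/S23: candidate action
        r_pred, _ = env_model.predict(s, a)         # S23: predicted reward
        while r_pred >= R_THRESHOLD:                # S24: per the patent text,
            a = regenerate(s, a)                    # regenerate the action until
            r_pred, _ = env_model.predict(s, a)     # the prediction drops below it
        r, s_next = real_env.step(a)                # S25: act on the real plant
        agent.update((s, a, r, s_next))             # S26: update the networks
        s = s_next
```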
In the building HVAC system control method provided by the invention, the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost. Specifically, the data-driven environment model is a multilayer artificial neural network.
The inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step.
The data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
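A data-driven environment model of this form can be realized, for example, as follows (the dimensions and layer widths are assumptions; the patent specifies only a multilayer artificial neural network):

```python
import torch
import torch.nn as nn

class EnvironmentModel(nn.Module):
    """Data-driven environment model M_d: (s_t, a_t) -> (r_t, s_{t+1}).
    Dimensions and layer widths are assumptions; the patent only states
    that M_d is a multilayer artificial neural network."""
    def __init__(self, state_dim=6, action_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim))   # first output r_t, rest s_{t+1}

    def forward(self, s, a):
        out = self.net(torch.cat([s, a], dim=-1))
        return out[..., :1], out[..., 1:]       # (r_t, s_{t+1})
```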
Further, the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
Pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
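Steps S31 to S34 distill the knowledge model M_k into M_d. A minimal PyTorch sketch under the stated loss follows; the batch size, learning rate and epoch count are assumptions:

```python
import torch

def pretrain_env_model(M_d, M_k, sample_sa, M=1024, lr=1e-2, epochs=200):
    """Steps S31-S34: fit M_d to the knowledge model M_k by stochastic
    gradient descent on E[ || M_d(s, a) - M_k(s, a) ||_2^2 ].
    `sample_sa` draws M random (s_i, a_i) pairs and `M_k(s, a)` returns the
    target (reward, next-state) tensors; both are illustrative stand-ins."""
    opt = torch.optim.SGD(M_d.parameters(), lr=lr)
    for _ in range(epochs):
        s, a = sample_sa(M)                 # S31: random (s_i, a_i) batch
        r_d, s_d = M_d(s, a)                # S32: outputs of M_d
        with torch.no_grad():
            r_k, s_k = M_k(s, a)            # S33: outputs of M_k
        # S34: mean squared L2 distance between the two model outputs
        loss = ((r_d - r_k) ** 2).mean() + ((s_d - s_k) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
```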
In the control method of the building HVAC system, the pre-trained agent performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient.
In step S23, the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23.
The process of regenerating the real executed action is as follows.
The knowledge model M_k of the building HVAC can be expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
The optimization is solved over the knowledge model to obtain the action with the highest predicted reward:
a_{MPC,t} = argmax_a r̂_t(s_t, a)
The regenerated real executed action can therefore be expressed as:
a_t = β a_{r,l,t} + (1 − β) a_{MPC,t}
wherein a_t here is the regenerated real executed action, a_{r,l,t} is the real executed action generated by the pre-trained agent from the real initial action, and β is the trial-and-error rate.
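The regeneration rule can be sketched as follows; because the patent's optimization step is reproduced only as an image, an argmax over a finite candidate grid stands in for whatever solver the patent uses, and the value of β is assumed:

```python
import numpy as np

beta = 0.3   # trial-and-error rate beta (value assumed)

def regenerate_action(s, a_rl, reward_hat, candidates):
    """a_MPC = argmax over candidate actions of the knowledge-model reward
    r_hat(s, a); the blended action is a_t = beta*a_rl + (1-beta)*a_MPC.
    `reward_hat` and the candidate grid are illustrative stand-ins."""
    a_mpc = max(candidates, key=lambda a: reward_hat(s, a))
    return beta * np.asarray(a_rl) + (1 - beta) * np.asarray(a_mpc)
```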
In the building HVAC system control method provided by the invention, a knowledge model is first established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Example 1
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the embodiment of the invention, the building HVAC knowledge model is based on a 25 m × 10 m building served by an HVAC system with variable air volume and constant supply-air temperature. This knowledge model is taken as the real environment model M_r; the specific parameters of M_r are listed in Table 1.
[Table 1: specific parameters of the real environment model M_r; the table is reproduced only as an image in the original document]
The building HVAC knowledge model M_k used for pre-training has the specific parameters listed in Table 2.
[Table 2: specific parameters of the pre-training knowledge model M_k; the table is reproduced only as an image in the original document]
M_r uses the day-ahead market electricity price data of the Australian Energy Market Operator (AEMO) and the outside temperature data of New South Wales, Australia, from 1 September 2018 to 30 September 2018 as the dynamic models of electricity price and outside temperature; their distributions are shown in FIG. 4.
The target agent was pre-trained in M_k for 2880 iterations (60 days) and then trained in M_r for 480 iterations (10 days) in the real environment.
Different strategies were used to control the HVAC system in M_r; the comparison of the different strategies is shown in Table 3.
[Table 3: comparison of the different control strategies in M_r; the table is reproduced only as an image in the original document]
In summary, in the control method of the building HVAC system provided by the invention, a knowledge model is established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Herein, "first" and "second" are used only to distinguish names and do not indicate any difference in importance or position.
Here, upper, lower, left, right, front and rear merely denote relative positions and not absolute positions.
The above description is only an embodiment of the present invention and does not limit the patent scope of the invention; all equivalent structural or process transformations made using the contents of the specification and drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the invention.

Claims (9)

1. A control method of a building heating, ventilation and air conditioning (HVAC) system, characterized by comprising the following steps:
establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model;
performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time;
the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model;
the building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
2. The building HVAC system control method of claim 1, wherein the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
3. The building HVAC system control method of claim 1, wherein pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
4. The building HVAC system control method of claim 3, wherein the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
5. The building HVAC system control method of claim 4, wherein updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
6. The building HVAC system control method of claim 5, wherein the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient comprises the following steps:
S21: pre-training the data-driven environment model according to the knowledge model;
S22: the pre-trained agent receives and records a first real environment state in the real environment, and its neural network unit generates a real initial action from the first real environment state;
S23: the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23;
S25: executing the real executed action in the real environment, and observing and recording the reward of the real executed action and the second real environment state at the next time step;
S26: updating the neural network unit from the data recorded by the pre-trained agent, and increasing the iteration counter by 1;
S27: judging whether the iteration counter has reached a third preset value; if the counter is smaller than the third preset value, returning to step S22; if the counter is greater than or equal to the third preset value, ending the training.
7. The building HVAC system control method of claim 6, wherein the data-driven environment model is a multilayer artificial neural network;
the inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step;
the data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
8. The building HVAC system control method of claim 7, wherein the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
9. The building HVAC system control method of claim 8, wherein pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
CN202011319558.8A 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system Active CN112460741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319558.8A CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319558.8A CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Publications (2)

Publication Number Publication Date
CN112460741A CN112460741A (en) 2021-03-09
CN112460741B true CN112460741B (en) 2021-11-26

Family

ID=74799562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319558.8A Active CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Country Status (1)

Country Link
CN (1) CN112460741B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112077B (en) * 2021-04-14 2022-06-10 太原理工大学 HVAC control system based on multi-step prediction deep reinforcement learning algorithm
CN113536660B (en) * 2021-06-12 2023-05-23 武汉所为科技有限公司 Intelligent system training method, model and storage medium for heating, ventilation and cloud edge cooperation
CN113435042B (en) * 2021-06-28 2022-05-17 天津大学 Reinforced learning modeling method for demand response of building air conditioning system
CN114017904B (en) * 2021-11-04 2023-01-20 广东电网有限责任公司 Operation control method and device for building HVAC system
CN114909707B (en) * 2022-04-24 2023-10-10 浙江英集动力科技有限公司 Heat supply secondary network regulation and control method based on intelligent balance device and reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106322656B (en) * 2016-08-23 2019-05-14 海信(山东)空调有限公司 A kind of air conditioning control method and server and air-conditioning system
EP3736646A1 (en) * 2019-05-09 2020-11-11 Siemens Schweiz AG Method and controller for controlling a chiller plant for a building and chiller plant
CN111124916B (en) * 2019-12-23 2023-04-07 北京云聚智慧科技有限公司 Model training method based on motion semantic vector and electronic equipment
KR102131414B1 (en) * 2019-12-31 2020-07-08 한국산업기술시험원 System for the energy saving pre-cooling/heating training of an air conditioner using deep reinforcement learning algorithm based on the user location, living climate condition and method thereof
CN111144793B (en) * 2020-01-03 2022-06-14 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN111310384B (en) * 2020-01-16 2024-05-21 香港中文大学(深圳) Wind field cooperative control method, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN112460741A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112460741B (en) Control method of building heating, ventilation and air conditioning system
Li et al. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning
Lissa et al. Deep reinforcement learning for home energy management system control
Zhang et al. Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning
Huang et al. A neural network-based multi-zone modelling approach for predictive control system design in commercial buildings
Zhang et al. A deep reinforcement learning approach to using whole building energy model for hvac optimal control
Yu et al. Control strategies for integration of thermal energy storage into buildings: State-of-the-art review
Homod Analysis and optimization of HVAC control systems based on energy and performance considerations for smart buildings
Yang et al. Reinforcement learning for optimal control of low exergy buildings
Yu et al. Online tuning of a supervisory fuzzy controller for low-energy building system using reinforcement learning
Zhang et al. Building energy management with reinforcement learning and model predictive control: A survey
Homod et al. Dynamics analysis of a novel hybrid deep clustering for unsupervised learning by reinforcement of multi-agent to energy saving in intelligent buildings
Saletti et al. Enabling smart control by optimally managing the State of Charge of district heating networks
Li et al. Modeling and energy dynamic control for a ZEH via hybrid model-based deep reinforcement learning
Wang et al. A chance-constrained stochastic model predictive control for building integrated with renewable resources
Marantos et al. Towards plug&play smart thermostats inspired by reinforcement learning
Shi et al. Optimization of electricity consumption in office buildings based on adaptive dynamic programming
Fu et al. Research and application of predictive control method based on deep reinforcement learning for HVAC systems
Lee et al. Artificial intelligence enabled energy-efficient heating, ventilation and air conditioning system: Design, analysis and necessary hardware upgrades
Zhang et al. Diversity for transfer in learning-based control of buildings
CN114909706A (en) Secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control
Reynolds et al. A smart heating set point scheduler using an artificial neural network and genetic algorithm
Kim et al. Optimization of supply air flow and temperature for VAV terminal unit by artificial neural network
Zhang et al. Data-driven model predictive and reinforcement learning based control for building energy management: A survey
Masburah et al. Co-designing Intelligent Control of Building HVACs and Microgrids

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant