CN112460741B - Control method of building heating, ventilation and air conditioning system - Google Patents

Control method of building heating, ventilation and air conditioning system Download PDF

Info

Publication number
CN112460741B
CN112460741B (application number CN202011319558.8A)
Authority
CN
China
Prior art keywords
model
building
neural network
real
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011319558.8A
Other languages
Chinese (zh)
Other versions
CN112460741A (en)
Inventor
赵俊华
赵焕
何秉昊
梁高琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202011319558.8A priority Critical patent/CN112460741B/en
Publication of CN112460741A publication Critical patent/CN112460741A/en
Application granted granted Critical
Publication of CN112460741B publication Critical patent/CN112460741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/30: Control or safety arrangements for purposes related to the operation of the system, e.g. for safety or monitoring
    • F24F11/46: Improving electric energy efficiency or saving
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Combustion & Propulsion (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention provides a building heating, ventilation and air conditioning (HVAC) system control method, comprising the following steps: establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model; and performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time. Pre-training the agent based on the deep deterministic policy gradient reduces the learning cost incurred when the agent later interacts with the real environment. The hybrid reinforcement learning algorithm is used for online iterative learning, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.

Description

Control method of building heating, ventilation and air conditioning system
Technical Field
The invention relates to the field of power distribution networks, and in particular to a control method of a building heating, ventilation and air conditioning (HVAC) system.
Background
Building energy consumption accounts for a large share of total electricity consumption in China, and almost half of it is caused by HVAC systems. Researchers at home and abroad have carried out a series of studies on energy consumption prediction for large buildings. Energy consumption prediction models are mainly divided into physical models, data-driven models and grey-box models. Using a simplified physical model can reduce the amount of validation data required and save computation time. Because controllable building loads can be operated flexibly, a power dispatching center can effectively manage the energy consumption of a building system through direct control or price incentives, reducing building operating costs and improving the economy and security of grid operation.
Current demand management methods for flexible loads in intelligent buildings cannot fully exploit the demand response potential of those loads. The uncertainty of renewable generation and load forecasts grows with the prediction horizon, and the accumulated error of the forecast data gradually increases, so the resulting optimal dispatch schedules are difficult to reconcile with the actual operating requirements of the system, which greatly hinders distribution network operation that relies on the flexibility of building energy use. Moreover, because of the dynamic characteristics of the system and its environment, such as temperature and electricity price, model predictive control of HVAC is a complex task, and learning efficiency and learning cost are the main obstacles to deploying deep reinforcement learning methods.
Therefore, an energy-saving control method for building HVAC systems that accelerates learning and reduces learning cost is urgently needed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an energy-saving control method for a building HVAC system that accelerates learning and reduces learning cost.
To solve this technical problem, the invention adopts the following technical scheme. The control method of the building HVAC system comprises the following steps:
establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model;
and performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time.
Further, the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model;
the building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
Further, the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
Further, pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
Further, the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
Further, updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
Further, the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient comprises the following steps:
S21: pre-training the data-driven environment model according to the knowledge model;
S22: the pre-trained agent receives and records a first real environment state in the real environment, and its neural network unit generates a real initial action from the first real environment state;
S23: the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23;
S25: executing the real executed action in the real environment, and observing and recording the reward of the real executed action and the second real environment state at the next time step;
S26: updating the neural network unit from the data recorded by the pre-trained agent, and increasing the iteration counter by 1;
S27: judging whether the iteration counter has reached a third preset value; if the counter is smaller than the third preset value, returning to step S22; if the counter is greater than or equal to the third preset value, ending the training.
Further, the data-driven environment model is a multilayer artificial neural network;
the inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step;
the data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
Further, the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
Further, pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
In the building HVAC system control method provided by the invention, a knowledge model is first established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Drawings
The specific structure of the invention is detailed below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a control method of a building heating, ventilating and air conditioning system according to the invention;
FIG. 2 is a flow chart of the present invention for pre-training based on a depth-deterministic strategy gradient;
FIG. 3 is a flow chart of a hybrid reinforcement learning algorithm of the present invention;
FIG. 4 is a distribution diagram of electricity price and ambient temperature in an embodiment of the present invention.
Detailed Description
In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
Because of the dynamic characteristics of the system and its environment, such as temperature and electricity price, model predictive control of HVAC is a complex task, and learning efficiency and learning cost are the main obstacles to deploying deep reinforcement learning methods. The invention provides a control method of a building HVAC system; referring to FIG. 1, the method specifically comprises the following steps.
S1, establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model.
It can be understood that a knowledge model is established for the HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment.
S2, performing online iterative learning with the pre-trained agent in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating the data-driven environment model in real time.
It can be understood that the pre-trained agent performs online iterative learning in the real environment, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost.
In the control method of the building HVAC system, the agent interacts with the knowledge model, which reduces the learning cost incurred when the agent interacts with the real environment. Specifically, the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model.
The building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
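For illustration only (this sketch is not part of the claimed method), the knowledge model above can be exercised as a single simulation step. In the following Python sketch every parameter value is an assumed placeholder, and because the patent's power equation is reproduced only as an image, a cubic fan law in the total air flow is assumed purely for demonstration:

```python
import numpy as np

# Illustrative parameters only; none of these values come from the patent.
N = 4                    # number of zones
dt = 0.25                # discrete time interval Delta t, hours
k_f = 0.08               # fan efficiency coefficient (assumed)
U_w, A_w = 0.002, 80.0   # wall heat-transfer coeff. (kW/m^2.C) and wall area (m^2)
T_s = 15.0               # constant supply-air temperature (C)
c_a = 1.005              # specific heat of air (kJ/kg.C)
rho_a, V_z = 1.2, 625.0  # air density (kg/m^3) and zone air volume (m^3)

def knowledge_model_step(T_z, T_a, price, f_z):
    """One step of the knowledge model: returns the cost c_{1,t} and the
    next zone temperatures. T_z and f_z are length-N arrays."""
    # Fan power: the patent's power equation is an image, so a cubic law
    # in the total flow is assumed here purely for illustration.
    p_f = k_f * f_z.sum() ** 3
    c_1 = p_f * price * dt                 # c_{1,t} = p_{f,t} * lambda_t * dt

    H_w = U_w * A_w * (T_a - T_z)          # heat gained through the walls (kW)
    H_h = f_z * c_a * (T_s - T_z)          # heat from the supply air (kW)
    # T_{z,t+1} - T_{z,t} = dt * (H_w + H_h) / (V_z * rho_a * c_a);
    # dt is converted to seconds so that kW * s / kJ is dimensionless.
    T_next = T_z + (dt * 3600.0) * (H_w + H_h) / (V_z * rho_a * c_a)
    return c_1, T_next

c_1, T_next = knowledge_model_step(np.full(N, 26.0), 32.0, 0.12, np.full(N, 0.5))
```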
In the control method of the building HVAC system, the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
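As a concrete illustration of the reward above (not part of the patent text; the penalty weights, the temperature band, and the exact zone-wise sums are assumptions, since the original penalty expressions are reproduced only as images):

```python
import numpy as np

beta1, beta2 = 10.0, 10.0     # penalty weights beta_1, beta_2 (assumed)
T_min, T_max = 22.0, 26.0     # acceptable zone temperature band (assumed)

def reward(c_1, T_z, a_z):
    """r_t = -C_{1,t} - beta_1*C_{2,t} - beta_2*C_{3,t}, with [x]^+ = max(0, x).
    The zone-wise penalty sums are a reconstruction, since the original
    expressions are reproduced only as images."""
    pos = lambda x: np.maximum(0.0, x)
    C_2 = np.sum(pos(T_z - T_max) + pos(T_min - T_z))   # comfort violation
    C_3 = np.sum(pos(a_z - 1.0) + pos(-a_z))            # control-bound violation
    return -c_1 - beta1 * C_2 - beta2 * C_3

r_t = reward(c_1=1.2, T_z=np.array([25.0, 27.1]), a_z=np.array([0.4, 1.05]))
```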
In the control method of the building HVAC system, the agent is pre-trained based on the deep deterministic policy gradient according to the knowledge model, which reduces the learning cost incurred when the agent interacts with the real environment. Specifically, pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
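Steps S11 to S15 amount to a standard simulator-driven training loop. A minimal Python sketch follows; the DDPGAgent class and the knowledge-model stub are illustrative stand-ins, not the patent's implementation:

```python
import numpy as np

class DDPGAgent:
    """Minimal stand-in for the agent of steps S11-S15 (illustrative only)."""
    def __init__(self, state_dim, action_dim):
        self.buffer = []                                       # recorded data
        self.W = np.random.randn(action_dim, state_dim) * 0.1  # S11: random init

    def act(self, s, noise_std=0.1):
        # a_t = mu(s_t | theta^mu) + N_t, with Gaussian exploration noise N_t
        noise = np.random.normal(0.0, noise_std, self.W.shape[0])
        return np.clip(self.W @ s + noise, 0.0, 1.0)

    def update(self, transition):
        self.buffer.append(transition)   # stands in for the full DDPG update (S14)

def knowledge_model(s, a):
    """Stub for M_k: returns (reward, next state)."""
    return -float(np.sum(a)), np.clip(s + 0.01 * (a.mean() - 0.5), 0.0, 1.0)

agent = DDPGAgent(state_dim=6, action_dim=4)
s = np.random.rand(6)                     # S12: first environment state
for _ in range(1000):                     # S15: first preset iteration count
    a = agent.act(s)                      # S12/S13: generate the executed action
    r, s_next = knowledge_model(s, a)     # S13: reward and next state from M_k
    agent.update((s, a, r, s_next))       # S14: update from the recorded data
    s = s_next
```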
In the control method of the building HVAC system, the pre-trained agent performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the neural network unit is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost.
Specifically, the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
Updating the neural network unit in real time during the iterative learning makes the online training more stable while reducing the learning cost. Specifically, updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
Specifically, the process of updating the online Q neural network is as follows:
K groups of data (s_i, a_i, r_i, s_{i+1}), i ∈ {1, 2, …, K}, are randomly drawn from the data recorded by the agent, wherein s_i is the state received at the i-th iteration, a_i is the executed action at the i-th iteration, and r_i is the reward received at the i-th iteration.
The loss function to be minimized is as follows, where γ ∈ (0, 1] is the discount rate:
L = (1/K) Σ_{i=1}^{K} (y_i − Q(s_i, a_i|θ^Q))²
y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^Q′).
Specifically, the policy gradient used to update the online policy neural network is:
∇_{θ^μ} J ≈ (1/K) Σ_{i=1}^{K} ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
The network parameters of the online policy neural network are updated by gradient ascent:
θ^μ ← θ^μ + α ∇_{θ^μ} J
where J is the expectation of the cumulative discounted return and α is the learning rate.
Specifically, the target Q neural network is updated as:
θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}
wherein 0 < τ < 1 is the update rate.
Specifically, the target policy neural network is updated as:
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}
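The four updates above can be combined into one synchronized routine. The following PyTorch sketch is an illustrative reading of these equations; the network sizes, the dimensions, the learning rate and the batch size are assumptions, as the patent does not specify them:

```python
import copy
import torch
import torch.nn as nn

gamma, tau, alpha, K = 0.99, 0.01, 1e-3, 64   # discount, update rate, learning rate, batch (assumed)

# Online policy mu and online Q network; 6-dim state, 4-dim action as an example.
mu = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4), nn.Sigmoid())
Q = nn.Sequential(nn.Linear(6 + 4, 64), nn.ReLU(), nn.Linear(64, 1))
mu_targ, Q_targ = copy.deepcopy(mu), copy.deepcopy(Q)   # target networks
opt_mu = torch.optim.SGD(mu.parameters(), lr=alpha)
opt_Q = torch.optim.SGD(Q.parameters(), lr=alpha)

def ddpg_update(s, a, r, s_next):
    """One synchronized update; s: (K,6), a: (K,4), r: (K,1), s_next: (K,6)."""
    # Critic: minimize (1/K) sum_i (y_i - Q(s_i, a_i | theta^Q))^2 with
    # y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1} | theta^mu') | theta^Q').
    with torch.no_grad():
        y = r + gamma * Q_targ(torch.cat([s_next, mu_targ(s_next)], dim=1))
    q_loss = ((y - Q(torch.cat([s, a], dim=1))) ** 2).mean()
    opt_Q.zero_grad(); q_loss.backward(); opt_Q.step()

    # Actor: gradient ascent on J, implemented as descent on -Q(s, mu(s)).
    mu_loss = -Q(torch.cat([s, mu(s)], dim=1)).mean()
    opt_mu.zero_grad(); mu_loss.backward(); opt_mu.step()

    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta'.
    with torch.no_grad():
        for net, net_targ in ((mu, mu_targ), (Q, Q_targ)):
            for p, p_t in zip(net.parameters(), net_targ.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)

ddpg_update(torch.rand(K, 6), torch.rand(K, 4), -torch.rand(K, 1), torch.rand(K, 6))
```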
in the control method of the space heating ventilation air conditioning system, the pre-trained intelligent agent is subjected to online iterative learning in a real environment based on a knowledge model and a hybrid reinforcement learning algorithm of a depth certainty strategy gradient, and a neural network unit is updated in real time in the iterative learning process, so that online training is more stable, and meanwhile, the learning cost can be reduced. Specifically, the hybrid reinforcement learning algorithm based on the knowledge model and the depth certainty strategy gradient includes the following steps:
s21: pre-training the data-driven-based environment model according to the knowledge model;
s22: receiving and recording a first real environment state in a real environment by a pre-trained intelligent agent, wherein a neural network unit of the pre-trained intelligent agent generates a real initial action according to the first real environment state;
s23: the pre-trained agent generates and records a real execution action according to the real initial action, inputs the real execution action into the pre-trained data-driven-based environment model, and receives and records a prediction return value of the real execution action;
s24: judging whether the prediction return value is smaller than a second preset value or not;
if the predicted return value is smaller than the second preset value, executing S25;
if the predicted return value is greater than or equal to the second preset value, executing S23;
s25: executing the real execution action in a real environment, and observing and recording a return value of the real execution action and a second real environment state at the next moment;
s26: updating the neural network unit according to the data recorded by the pre-trained agent, and adding 1 to the numerical value of the iteration times;
s27: judging whether the numerical value of the iteration times reaches a third preset value or not; if the value of the iteration times is smaller than the third preset value, returning to the step S22; and if the numerical value of the iteration times is greater than or equal to the third preset value, ending the training.
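Steps S21 to S27 can be read as a model-filtered online loop: candidate actions are screened by M_d before being applied to the real building. A schematic Python sketch follows; the threshold branch follows the patent text (the action is executed only when the predicted reward is below the second preset value), and all objects and values are illustrative stand-ins:

```python
R_THRESHOLD = -50.0     # "second preset value" (magnitude assumed)
MAX_ITERS = 480         # "third preset value" (assumed)

def online_learning(agent, env_model, real_env, regenerate):
    """Online loop of steps S21-S27; `agent`, `env_model` (M_d), `real_env`
    and `regenerate` are illustrative stand-ins."""
    s = real_env.reset()                            # S22: first real state
    for _ in range(MAX_ITERS):                      # S27: iteration limit
        a = agent.act(s)                            # S22/S23: candidate action
        r_pred, _ = env_model.predict(s, a)         # S23: predicted reward
        while r_pred >= R_THRESHOLD:                # S24: per the patent text,
            a = regenerate(s, a)                    # regenerate the action until
            r_pred, _ = env_model.predict(s, a)     # the prediction drops below it
        r, s_next = real_env.step(a)                # S25: act on the real plant
        agent.update((s, a, r, s_next))             # S26: update the networks
        s = s_next
```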
In the building HVAC system control method provided by the invention, the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while reducing the learning cost. Specifically, the data-driven environment model is a multilayer artificial neural network.
The inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step.
The data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
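A data-driven environment model of this form can be realized, for example, as follows (the dimensions and layer widths are assumptions; the patent specifies only a multilayer artificial neural network):

```python
import torch
import torch.nn as nn

class EnvironmentModel(nn.Module):
    """Data-driven environment model M_d: (s_t, a_t) -> (r_t, s_{t+1}).
    Dimensions and layer widths are assumptions; the patent only states
    that M_d is a multilayer artificial neural network."""
    def __init__(self, state_dim=6, action_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim))   # first output r_t, rest s_{t+1}

    def forward(self, s, a):
        out = self.net(torch.cat([s, a], dim=-1))
        return out[..., :1], out[..., 1:]       # (r_t, s_{t+1})
```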
Further, the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
Pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
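Steps S31 to S34 distill the knowledge model M_k into M_d. A minimal PyTorch sketch under the stated loss follows; the batch size, learning rate and epoch count are assumptions:

```python
import torch

def pretrain_env_model(M_d, M_k, sample_sa, M=1024, lr=1e-2, epochs=200):
    """Steps S31-S34: fit M_d to the knowledge model M_k by stochastic
    gradient descent on E[ || M_d(s, a) - M_k(s, a) ||_2^2 ].
    `sample_sa` draws M random (s_i, a_i) pairs and `M_k(s, a)` returns the
    target (reward, next-state) tensors; both are illustrative stand-ins."""
    opt = torch.optim.SGD(M_d.parameters(), lr=lr)
    for _ in range(epochs):
        s, a = sample_sa(M)                 # S31: random (s_i, a_i) batch
        r_d, s_d = M_d(s, a)                # S32: outputs of M_d
        with torch.no_grad():
            r_k, s_k = M_k(s, a)            # S33: outputs of M_k
        # S34: mean squared L2 distance between the two model outputs
        loss = ((r_d - r_k) ** 2).mean() + ((s_d - s_k) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
```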
In the control method of the building HVAC system, the pre-trained agent performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient.
In step S23, the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23.
The process of regenerating the real executed action is as follows.
The knowledge model M_k of the building HVAC can be expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
The optimization is solved over the knowledge model to obtain the action with the highest predicted reward:
a_{MPC,t} = argmax_a r̂_t(s_t, a)
The regenerated real executed action can therefore be expressed as:
a_t = β a_{r,l,t} + (1 − β) a_{MPC,t}
wherein a_t here is the regenerated real executed action, a_{r,l,t} is the real executed action generated by the pre-trained agent from the real initial action, and β is the trial-and-error rate.
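The regeneration rule can be sketched as follows; because the patent's optimization step is reproduced only as an image, an argmax over a finite candidate grid stands in for whatever solver the patent uses, and the value of β is assumed:

```python
import numpy as np

beta = 0.3   # trial-and-error rate beta (value assumed)

def regenerate_action(s, a_rl, reward_hat, candidates):
    """a_MPC = argmax over candidate actions of the knowledge-model reward
    r_hat(s, a); the blended action is a_t = beta*a_rl + (1-beta)*a_MPC.
    `reward_hat` and the candidate grid are illustrative stand-ins."""
    a_mpc = max(candidates, key=lambda a: reward_hat(s, a))
    return beta * np.asarray(a_rl) + (1 - beta) * np.asarray(a_mpc)
```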
In the building HVAC system control method provided by the invention, a knowledge model is first established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Example 1
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the embodiment of the invention, the building HVAC knowledge model is based on a 25 m × 10 m building served by an HVAC system with variable air volume and constant supply-air temperature. This knowledge model is taken as the real environment model M_r; the specific parameters of M_r are listed in Table 1.
[Table 1: specific parameters of the real environment model M_r; the table is reproduced only as an image in the original document]
The building HVAC knowledge model M_k used for pre-training has the specific parameters listed in Table 2.
[Table 2: specific parameters of the pre-training knowledge model M_k; the table is reproduced only as an image in the original document]
M_r uses the day-ahead market electricity price data of the Australian Energy Market Operator (AEMO) and the outside temperature data of New South Wales, Australia, from 1 September 2018 to 30 September 2018 as the dynamic models of electricity price and outside temperature; their distributions are shown in FIG. 4.
The target agent was pre-trained in M_k for 2880 iterations (60 days) and then trained in M_r for 480 iterations (10 days) in the real environment.
Different strategies were used to control the HVAC system in M_r; the comparison of the different strategies is shown in Table 3.
[Table 3: comparison of the different control strategies in M_r; the table is reproduced only as an image in the original document]
In summary, in the control method of the building HVAC system provided by the invention, a knowledge model is established for an HVAC system with variable air volume and constant supply-air temperature, the agent interacts with the knowledge model, and the agent is pre-trained based on the deep deterministic policy gradient; the pre-training reduces the learning cost incurred when the agent later interacts with the real environment. The pre-trained agent then performs online iterative learning in the real environment based on the hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, and the data-driven environment model is updated in real time during the iterative learning, which makes the online training more stable while further reducing the learning cost.
Herein, "first" and "second" are used only to distinguish names and do not indicate any difference in importance or position.
Here, upper, lower, left, right, front and rear merely denote relative positions and not absolute positions.
The above description is only an embodiment of the present invention and does not limit the patent scope of the invention; all equivalent structural or process transformations made using the contents of the specification and drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the invention.

Claims (9)

1. A control method of a building heating, ventilation and air conditioning (HVAC) system, characterized by comprising the following steps:
establishing a knowledge model for an HVAC system with variable air volume and constant supply-air temperature, and pre-training an agent based on the deep deterministic policy gradient according to the knowledge model;
performing online iterative learning with the pre-trained agent in the real environment based on a hybrid reinforcement learning algorithm combining the knowledge model and the deep deterministic policy gradient, while updating a data-driven environment model in real time;
the knowledge model comprises a building HVAC electricity cost model, a building HVAC power model, a building zone temperature change model, an electricity price dynamic model, an outside temperature dynamic model and an HVAC control model;
the building HVAC electricity cost model is:
c_{1,t} = p_{f,t} λ_t Δt
wherein c_{1,t} is the electricity cost of the building HVAC, p_{f,t} is the power of the building HVAC system at time t, λ_t is the electricity price at time t, and Δt is the discrete time interval of the model;
the building HVAC power model expresses the fan power as a function of the fan efficiency and the total air flow [the power equation is reproduced only as an image in the original document];
wherein p_{f,t} (kW) is the power of the HVAC system at time t, k_f is the fan efficiency of the HVAC system, N is the number of zones into which the building is divided, f_{z,t} is the air flow in zone z at time t, and Σ_{z=1}^{N} f_{z,t} is the total air flow over all zones of the building;
the building zone temperature change model is:
H_{w,z,t} = U_w A_w (T_{a,t} − T_{z,t})
H_{h,z,t} = f_{z,t} c_a (T_s − T_{z,t})
T_{z,t+1} − T_{z,t} = Δt (H_{w,z,t} + H_{h,z,t}) / (V_z ρ_a c_a)
wherein H_{w,z,t} (kW) is the heat gained through the wall w of zone z at time t, U_w (kW/m²·°C) is the heat-transfer coefficient of the wall w, A_w (m²) is the area of the wall w, T_{a,t} (°C) is the outside temperature at time t, T_{z,t} (°C) is the temperature of zone z at time t, H_{h,z,t} (kW) is the heat zone z gains from the HVAC system at time t, c_a (kJ/kg·°C) is the specific heat of air, T_s is the supply-air temperature of the HVAC system, T_{z,t+1} − T_{z,t} (°C) is the temperature change of zone z, V_z (m³) is the total air volume of zone z, and ρ_a (kg/m³) is the air density in the building;
the electricity price dynamic model draws the price from real historical data [the price equation is reproduced only as an image in the original document];
wherein (·)^D denotes a value taken from real data, λ^D_{t,d} is the real electricity price at time t of day d, and d is a day randomly selected from all real historical data;
the outside temperature dynamic model is:
T_{a,t+1} = T_{a,t} + ΔT^D_{a,t,d}
wherein ΔT^D_{a,t,d} is the real outside-temperature change at time t of day d, and d is a day randomly selected from all real historical data;
the HVAC control model is:
f_{z,t} = a_{z,t} f_z^{max}
wherein a_{z,t} is the continuous control variable of the HVAC in zone z at time t, and f_z^{max} is the maximum air flow achievable in zone z.
2. The building HVAC system control method of claim 1, wherein the pre-training and the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient are both Markov decision processes;
the environment state s_t at any time t is:
s_t = (T_{1,t}, …, T_{N,t}, T_{a,t}, λ_t, t′)^T
wherein T_{1,t}, …, T_{N,t} are the temperatures of all zones in the building, T_{a,t} is the outside temperature, λ_t is the electricity price, and t′ is the time index;
the action a_t at any time t is:
a_t = (a_{1,t}, a_{2,t}, …, a_{N,t})^T, 0 ≤ a_{z,t} ≤ 1
wherein a_{z,t} is the continuous control variable of the HVAC in zone z;
the reward r_t at any time t is:
r_t = −C_{1,t} − β_1 C_{2,t} − β_2 C_{3,t}
C_{2,t} = Σ_{z=1}^{N} ([T_{z,t} − T_z^{max}]^+ + [T_z^{min} − T_{z,t}]^+)
C_{3,t} = Σ_{z=1}^{N} ([a_{z,t} − 1]^+ + [−a_{z,t}]^+)
wherein [x]^+ denotes the greater of 0 and x, β_1 and β_2 are penalty weights, C_{1,t} is the electricity cost of the building HVAC, C_{2,t} is the penalty for the building HVAC violating the comfort constraint, T_z^{max} and T_z^{min} denote the maximum and minimum acceptable temperatures in zone z, and C_{3,t} is the penalty for the building HVAC violating the control-variable constraint.
3. The building HVAC system control method of claim 1, wherein pre-training the agent based on the deep deterministic policy gradient according to the knowledge model comprises the following steps:
S11: randomly initializing the neural network unit of the agent;
S12: the agent receives and records a first environment state sent by the knowledge model, and the neural network unit generates an initial action from the first environment state;
S13: the agent generates and records an executed action from the initial action, inputs the executed action into the knowledge model, and receives and records the reward of the executed action and the second environment state at the next time step;
S14: updating the neural network unit from the data recorded by the agent, and increasing the iteration counter by 1;
S15: judging whether the iteration counter has reached a first preset value; if the counter is smaller than the first preset value, returning to step S12; if the counter is greater than or equal to the first preset value, ending the pre-training.
4. The building HVAC system control method of claim 3, wherein the neural network unit comprises an online policy neural network, a target policy neural network, an online Q neural network and a target Q neural network;
the online policy neural network is expressed as:
μ(s|θ^μ)
the target policy neural network is expressed as:
μ′(s|θ^μ′)
the online Q neural network is expressed as:
Q(a, s|θ^Q)
the target Q neural network is expressed as:
Q′(a, s|θ^Q′)
wherein a denotes an executed action, s denotes an environment state, and θ^(·) denotes the network parameters;
the executed action a_t generated by the agent at time t is:
a_t = μ(s_t|θ^μ) + N_t
wherein s_t is the first environment state received at time t and N_t is Gaussian exploration noise.
5. The building HVAC system control method of claim 4, wherein updating the neural network unit comprises synchronously updating the online policy neural network, the target policy neural network, the online Q neural network and the target Q neural network.
6. The building HVAC system control method of claim 5, wherein the hybrid reinforcement learning algorithm based on the knowledge model and the deep deterministic policy gradient comprises the following steps:
S21: pre-training the data-driven environment model according to the knowledge model;
S22: the pre-trained agent receives and records a first real environment state in the real environment, and its neural network unit generates a real initial action from the first real environment state;
S23: the pre-trained agent generates and records a real executed action from the real initial action, inputs the real executed action into the pre-trained data-driven environment model, and receives and records the predicted reward of the real executed action;
S24: judging whether the predicted reward is smaller than a second preset value;
if the predicted reward is smaller than the second preset value, executing S25;
if the predicted reward is greater than or equal to the second preset value, executing S23;
S25: executing the real executed action in the real environment, and observing and recording the reward of the real executed action and the second real environment state at the next time step;
S26: updating the neural network unit from the data recorded by the pre-trained agent, and increasing the iteration counter by 1;
S27: judging whether the iteration counter has reached a third preset value; if the counter is smaller than the third preset value, returning to step S22; if the counter is greater than or equal to the third preset value, ending the training.
7. The building HVAC system control method of claim 6, wherein the data-driven environment model is a multilayer artificial neural network;
the inputs of the data-driven environment model are the environment state s_t and the executed action a_t, and its outputs are the predicted reward r_t and the environment state s_{t+1} at the next time step;
the data-driven environment model M_d is expressed as:
(r_t, s_{t+1}) = M_d(s_t, a_t).
8. The building HVAC system control method of claim 7, wherein the knowledge model M_k is expressed as:
(r̂_t, ŝ_{t+1}) = M_k(s_t, a_t)
wherein r̂_t and ŝ_{t+1} denote, respectively, the predicted reward and the predicted next environment state obtained by M_k from the input (s, a).
9. The building HVAC system control method of claim 8, wherein pre-training the data-driven environment model according to the knowledge model comprises the following steps:
S31: randomly selecting M groups of data (s_i, a_i), i ∈ {1, 2, 3, …, M};
S32: inputting the data (s_i, a_i) into the data-driven environment model M_d to obtain the output (r̂^d_i, ŝ^d_{i+1});
S33: inputting the data (s_i, a_i) into the knowledge model M_k to obtain the output (r̂^k_i, ŝ^k_{i+1});
S34: updating the data-driven environment model M_d by stochastic gradient descent so as to minimize the loss function;
the loss function to be minimized is:
L = E[ ‖M_d(s_i, a_i) − M_k(s_i, a_i)‖₂² ]
wherein E[·] denotes the expectation and ‖·‖₂ is the L2 norm.
CN202011319558.8A 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system Active CN112460741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319558.8A CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319558.8A CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Publications (2)

Publication Number Publication Date
CN112460741A CN112460741A (en) 2021-03-09
CN112460741B true CN112460741B (en) 2021-11-26

Family

ID=74799562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319558.8A Active CN112460741B (en) 2020-11-23 2020-11-23 Control method of building heating, ventilation and air conditioning system

Country Status (1)

Country Link
CN (1) CN112460741B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112077B (en) * 2021-04-14 2022-06-10 太原理工大学 HVAC control system based on multi-step prediction deep reinforcement learning algorithm
CN113536660B (en) * 2021-06-12 2023-05-23 武汉所为科技有限公司 Intelligent system training method, model and storage medium for heating, ventilation and cloud edge cooperation
CN113435042B (en) * 2021-06-28 2022-05-17 天津大学 Reinforced learning modeling method for demand response of building air conditioning system
CN114017904B (en) * 2021-11-04 2023-01-20 广东电网有限责任公司 Operation control method and device for building HVAC system
CN114909707B (en) * 2022-04-24 2023-10-10 浙江英集动力科技有限公司 Heat supply secondary network regulation and control method based on intelligent balance device and reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106322656B (en) * 2016-08-23 2019-05-14 海信(山东)空调有限公司 A kind of air conditioning control method and server and air-conditioning system
EP3736646A1 (en) * 2019-05-09 2020-11-11 Siemens Schweiz AG Method and controller for controlling a chiller plant for a building and chiller plant
CN111124916B (en) * 2019-12-23 2023-04-07 北京云聚智慧科技有限公司 Model training method based on motion semantic vector and electronic equipment
KR102131414B1 (en) * 2019-12-31 2020-07-08 한국산업기술시험원 System for the energy saving pre-cooling/heating training of an air conditioner using deep reinforcement learning algorithm based on the user location, living climate condition and method thereof
CN111144793B (en) * 2020-01-03 2022-06-14 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN111310384B (en) * 2020-01-16 2024-05-21 香港中文大学(深圳) Wind field cooperative control method, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN112460741A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112460741B (en) Control method of building heating, ventilation and air conditioning system
Li et al. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning
Lissa et al. Deep reinforcement learning for home energy management system control
Zhang et al. Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning
Huang et al. A neural network-based multi-zone modelling approach for predictive control system design in commercial buildings
Zhang et al. A deep reinforcement learning approach to using whole building energy model for hvac optimal control
Yu et al. Control strategies for integration of thermal energy storage into buildings: State-of-the-art review
Homod Analysis and optimization of HVAC control systems based on energy and performance considerations for smart buildings
Yang et al. Reinforcement learning for optimal control of low exergy buildings
Yu et al. Online tuning of a supervisory fuzzy controller for low-energy building system using reinforcement learning
Zhang et al. Building energy management with reinforcement learning and model predictive control: A survey
Homod et al. Dynamics analysis of a novel hybrid deep clustering for unsupervised learning by reinforcement of multi-agent to energy saving in intelligent buildings
Saletti et al. Enabling smart control by optimally managing the State of Charge of district heating networks
Li et al. Modeling and energy dynamic control for a ZEH via hybrid model-based deep reinforcement learning
Wang et al. A chance-constrained stochastic model predictive control for building integrated with renewable resources
Marantos et al. Towards plug&play smart thermostats inspired by reinforcement learning
Shi et al. Optimization of electricity consumption in office buildings based on adaptive dynamic programming
Fu et al. Research and application of predictive control method based on deep reinforcement learning for HVAC systems
Lee et al. Artificial intelligence enabled energy-efficient heating, ventilation and air conditioning system: Design, analysis and necessary hardware upgrades
Zhang et al. Diversity for transfer in learning-based control of buildings
CN114909706A (en) Secondary network balance regulation and control method based on reinforcement learning algorithm and pressure difference control
Reynolds et al. A smart heating set point scheduler using an artificial neural network and genetic algorithm
Kim et al. Optimization of supply air flow and temperature for VAV terminal unit by artificial neural network
Zhang et al. Data-driven model predictive and reinforcement learning based control for building energy management: A survey
Masburah et al. Co-designing Intelligent Control of Building HVACs and Microgrids

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant