CN113435042B

CN113435042B - Reinforcement learning modeling method for demand response of building air conditioning system

Info

Publication number: CN113435042B
Application number: CN202110716683.0A
Authority: CN
Inventors: 丁研; 黄宸; 廉翔超; 吕亚聪
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2022-05-17
Anticipated expiration: 2041-06-28
Also published as: CN113435042A

Abstract

The invention discloses a reinforcement learning modeling method for demand response of a building air conditioning system, comprising the following steps: developing an RC gray box model from known indoor temperature, cooling capacity, weather, and occupancy data of a building, thereby establishing the relationship between the indoor temperature and the cooling capacity of the air conditioning system; establishing a reinforcement learning model based on linear value function approximation for controlling the air conditioning system and the storage battery; training the reinforcement learning agent using meteorological and occupancy data; and applying the fully trained agent to a target time period to obtain a control strategy for the air conditioning system and the storage battery. The invention reflects the thermal characteristics of the building and reduces the need for reference data and prior knowledge. While ensuring the thermal comfort of indoor occupants, it provides an operation strategy for the air conditioning system and the storage battery with low energy consumption and low electricity cost.

Description

Reinforcement learning modeling method for demand response of building air conditioning system

Technical Field

The invention belongs to the interdisciplinary field of building energy management and artificial intelligence, and particularly relates to a reinforcement learning modeling method for demand response of a building air conditioning system.

Background

Building operation is a major component of energy consumption in China, and air conditioning accounts for a large proportion of building operating energy. However, the building system responds to external weather conditions with delay and attenuation, which complicates control of the air conditioning system. In practice, the air conditioner operation strategy is therefore set from the operator's experience: the operator adjusts the strategy according to the current weather, the weather forecast, past experience, operating economy, and other factors. Occupant comfort and energy savings are judged only subjectively, so neither the comfort of indoor occupants nor a reduction in energy consumption can be guaranteed.

At present there are many methods for automatically controlling a building air conditioning system. With feedback control, the inherent delay means that the indoor temperature cannot be controlled in time and efficient system operation cannot be guaranteed. With control based on a supervised machine learning algorithm, reference data and prior knowledge are often required, and the conditions available in actual operation rarely meet the algorithm's requirements. Moreover, most existing control methods control only the air conditioning system and do not involve storage battery control, so they cannot go further and shift load on the power grid.

Disclosure of Invention

In view of the above, the present invention provides a reinforcement learning modeling method for demand response of a building air conditioning system, which automatically controls a building air conditioner and a storage battery system with reduced dependence on reference data and prior knowledge, so as to ensure the comfort of indoor occupants and reduce the energy consumption and electricity consumption of the air conditioning system.

In order to achieve the aim, the invention provides a reinforcement learning modeling method for demand response of a building air conditioning system, which comprises the following steps:

Step 1): first, establishing, for a target building, an RC gray box model reflecting the relationship between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and the output variable of the model is the hourly indoor temperature T_in;

Step 2): training the gray box model using known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and identifying the thermal resistance of the building wall R_w, the thermal resistance of the window R_win, the wall heat capacity C_w, the indoor air heat capacity C_in, the window heat gain coefficient c₁, and the wall heat gain coefficient c₂;

Step 3): establishing a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agent interacts is the gray box model of step 1), the parameters of the gray box model are the identification results of step 2), the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery control system;

Step 4): training the reinforcement learning model using 60 consecutive days of measured data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and saving the trained model;

Step 5): inputting the data for the period to be simulated, namely the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, into the trained reinforcement learning model to obtain a control strategy for the air conditioning system and the storage battery.

Further, in step 3), the gray box model established in step 1) is combined with a reinforcement learning algorithm and serves as the environment with which the reinforcement learning agent interacts; the RC gray box model expression is as follows:

Further, the reinforcement learning model in step 3) adopts a reinforcement learning algorithm based on value function approximation; the approximation method is linear approximation, and the basis function is a Gaussian kernel function.

Further, the reward function used to guide the training of the reinforcement learning model in step 4) is as follows:

In formula (3), r_AC,t is the reward obtained at time t by the agent controlling the air conditioning system, and r_ES,t is the reward obtained at time t by the battery control agent; α₁ and α₂ are the temperature importance and electricity cost importance parameters, respectively; ω₁ and ω₂ are time parameters representing, respectively, the time during which occupants are indoors and the time during which the air conditioner is switched on; P′_t is the normalized energy consumption of the equipment, and P_max is the maximum energy consumption of the equipment; λ_t is the electricity price at the current time t; and ΔE_t is the change in the electric energy stored in the battery. The air conditioning reward contains a temperature offset penalty function, whose term represents the penalty for a control result that violates the temperature setting interval, and the term −α₂×P′_t×λ_t×ω₂, which represents the penalty for electricity cost. The battery reward contains a penalty for the electricity cost of the battery and a penalty for a discharge amount greater than the remaining charge.
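The reward formula itself does not survive in this text, so the following Python sketch is illustrative only. The single term taken from the description is the electricity cost penalty −α₂×P′_t×λ_t×ω₂. Everything else is an assumption made for the example: the normalization P′_t = P_t / P_max, the linear temperature offset penalty, the treatment of ω₁ and ω₂ as 0/1 flags, the sign convention for ΔE_t, and all default values and function names.

```python
def temperature_penalty(T_in, T_low, T_high):
    """Hypothetical offset penalty: distance (in K) outside the set interval."""
    if T_in < T_low:
        return T_low - T_in
    if T_in > T_high:
        return T_in - T_high
    return 0.0


def reward_ac(T_in, P_t, price_t, occupied, ac_on,
              alpha1=1.0, alpha2=1.0, P_max=10.0, T_low=24.0, T_high=26.0):
    """Air conditioning reward r_AC,t (assumed overall form)."""
    omega1 = 1.0 if occupied else 0.0   # assumed: 1 while occupants are indoors
    omega2 = 1.0 if ac_on else 0.0      # assumed: 1 while the air conditioner is on
    P_norm = P_t / P_max                # assumed normalization giving P'_t
    # Assumed comfort term: weighted temperature offset penalty.
    comfort_term = -alpha1 * temperature_penalty(T_in, T_low, T_high) * omega1
    # Cost term as stated in the description: -alpha2 * P'_t * lambda_t * omega2.
    cost_term = -alpha2 * P_norm * price_t * omega2
    return comfort_term + cost_term


def reward_battery(delta_E, remaining, price_t, overdraw_weight=10.0):
    """Battery reward r_ES,t (assumed form).

    delta_E > 0 means charging, delta_E < 0 means discharging (assumed convention).
    """
    cost_term = -delta_E * price_t                 # charging costs money, discharging saves it
    overdraw = max(0.0, -delta_E - remaining)      # energy requested beyond the remaining charge
    return cost_term - overdraw_weight * overdraw


print(reward_ac(T_in=25.0, P_t=5.0, price_t=0.8, occupied=True, ac_on=True))
print(reward_battery(delta_E=-3.0, remaining=2.0, price_t=1.0))
```

Splitting the reward into a comfort term and a cost term makes the trade-off set by α₁ and α₂ explicit: raising α₁ favors staying inside the temperature interval, while raising α₂ favors lower electricity cost.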

Advantageous effects

1. The invention models the building with an RC gray box model, which effectively expresses the thermodynamic characteristics of the building and accurately simulates the response of the indoor temperature to the air conditioner control strategy.

2. By establishing a reinforcement learning model, the invention makes effective use of the information from the interaction between the algorithm and the environment, reducing the dependence on reference data and prior knowledge.

3. The invention uses the reinforcement learning model to propose and simulate control strategies for the air conditioning system and the storage battery, reducing the operating load of the air conditioning system and the operating cost of the system while ensuring the thermal comfort of indoor occupants.

Drawings

FIG. 1 is a technical route diagram of a reinforcement learning modeling method for demand response of a building air conditioning system according to the present invention;

FIG. 2 is a graph comparing the room temperature calculated by the RC gray box model with the room temperature simulated by DeST in one embodiment of the present invention;

FIG. 3 is a diagram illustrating the variation of the accumulated reward function with the number of iterations in the training process according to an embodiment of the present invention;

FIG. 4 is a graph illustrating simulation results of indoor temperature and air conditioner energy consumption based on reinforcement learning control according to an embodiment of the present invention;

FIG. 5 is a graph showing simulation results of indoor temperature, air conditioning, and charge/discharge energy consumption based on reinforcement learning control according to an embodiment of the present invention.

Detailed Description

In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.

The invention provides a reinforcement learning modeling method for demand response of a building air conditioning system; its flow chart is shown in FIG. 1, and the method comprises the following steps:

Step 1: first, establishing, for a building, an RC gray box model that reflects the relationship between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and the output variable of the model is the hourly indoor temperature T_in.

The gray box model expression is as follows:
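The expression itself does not survive in this text. As a rough illustration only, a generic two-node RC network using the parameter names identified in step 2 could be simulated as below. The network topology, the outdoor temperature input `T_out`, the per-occupant heat gain `q_person`, the units, the sub-hourly Euler step, and all numeric values are assumptions for this sketch, not the patented formula.

```python
from dataclasses import dataclass


@dataclass
class RCParams:
    R_w: float    # wall thermal resistance, K/W
    R_win: float  # window thermal resistance, K/W
    C_w: float    # wall heat capacity, J/K
    C_in: float   # indoor air heat capacity, J/K
    c1: float     # window heat gain coefficient
    c2: float     # wall heat gain coefficient


def step_one_hour(p, T_in, T_w, T_out, n, Q_L, I_sol, F_win, F_wall,
                  q_person=100.0, dt=60.0):
    """Advance indoor and wall node temperatures by one hour (explicit Euler sub-steps).

    Assumed units: temperatures in degC, Q_L and q_person in W,
    I_sol in W/m2, areas in m2.
    """
    half_R = p.R_w / 2.0  # assumed: wall node sits midway through the wall
    for _ in range(round(3600.0 / dt)):
        q_out_to_wall = (T_out - T_w) / half_R   # outdoor air -> wall node
        q_wall_to_in = (T_w - T_in) / half_R     # wall node -> indoor air
        q_window = (T_out - T_in) / p.R_win      # conduction through the window
        dT_w = (q_out_to_wall - q_wall_to_in + p.c2 * I_sol * F_wall) / p.C_w
        dT_in = (q_wall_to_in + q_window + p.c1 * I_sol * F_win
                 + q_person * n - Q_L) / p.C_in
        T_w += dT_w * dt
        T_in += dT_in * dt
    return T_in, T_w


# Hypothetical parameter values, chosen only to make the example run.
params = RCParams(R_w=0.02, R_win=0.05, C_w=5.0e6, C_in=5.0e5, c1=0.5, c2=0.05)
T_in, T_w = 26.0, 27.0
for hour in range(3):
    T_in, T_w = step_one_hour(params, T_in, T_w, T_out=32.0, n=4,
                              Q_L=3000.0, I_sol=400.0, F_win=6.0, F_wall=30.0)
    print(f"hour {hour + 1}: T_in = {T_in:.2f} degC")
```

Whatever the exact network, a function of this shape (state and hourly inputs in, next indoor temperature out) is what lets the gray box model serve as the environment in step 3.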

Step 2: training the gray box model using known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall. A particle swarm optimization algorithm is used to identify R_w, R_win, C_w, C_in, c₁, and c₂.
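The identification step can be pictured with a minimal particle swarm optimizer. The sketch below is a textbook global-best PSO, not the patent's specific configuration: the inertia weight, acceleration coefficients, swarm size, and the two-parameter `toy_error` objective are all hypothetical. In the method described here the objective would instead be the error between measured T_in and the gray box model output, searched over the six parameters R_w, R_win, C_w, C_in, c₁, and c₂.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7, c_cog=1.5, c_soc=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(x):
    """Stand-in for the model-vs-measurement error; minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, err = pso(toy_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, err)
```

Because PSO needs only objective values, not gradients, it suits this kind of fit, where each evaluation means running the thermal model over the whole measured period.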

Step 3: establishing a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agent interacts is the gray box model of step 1, the parameters of the gray box model are the identification results of step 2, the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery. The reinforcement learning model adopts a reinforcement learning algorithm based on value function approximation; the approximation method is linear approximation, and a Gaussian kernel function is used as the basis function.
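To make "linear approximation with a Gaussian kernel basis" concrete, the sketch below represents the action value as Q(s, a) = θ_a · φ(s), where each feature φ_i(s) is a Gaussian bump centered at c_i. The text does not name the update rule, so the semi-gradient Q-learning update used here is an assumption, as are the class name `LinearQ`, the kernel centers, the width `sigma`, the learning rate, and the discount factor.

```python
import math


def gaussian_features(state, centers, sigma):
    """phi_i(s) = exp(-||s - c_i||^2 / (2 * sigma^2)) for each kernel center c_i."""
    return [math.exp(-sum((s - c) ** 2 for s, c in zip(state, ctr)) / (2.0 * sigma ** 2))
            for ctr in centers]


class LinearQ:
    """Action values that are linear in Gaussian kernel features."""

    def __init__(self, centers, sigma, n_actions, lr=0.1, gamma=0.9):
        self.centers, self.sigma = centers, sigma
        self.lr, self.gamma = lr, gamma
        # One weight vector theta_a per discrete action.
        self.theta = [[0.0] * len(centers) for _ in range(n_actions)]

    def q_values(self, state):
        phi = gaussian_features(state, self.centers, self.sigma)
        return [sum(w * f for w, f in zip(theta_a, phi)) for theta_a in self.theta]

    def update(self, state, action, reward, next_state):
        """One semi-gradient Q-learning step (assumed update rule)."""
        phi = gaussian_features(state, self.centers, self.sigma)
        q_sa = sum(w * f for w, f in zip(self.theta[action], phi))
        target = reward + self.gamma * max(self.q_values(next_state))
        td_error = target - q_sa
        # For a linear model the gradient of Q(s, a) w.r.t. theta_a is phi(s).
        self.theta[action] = [w + self.lr * td_error * f
                              for w, f in zip(self.theta[action], phi)]
        return td_error


# One-dimensional state (indoor temperature) with three hypothetical kernel centers.
agent = LinearQ(centers=[[24.0], [26.0], [28.0]], sigma=1.0, n_actions=2)
agent.update(state=[26.0], action=0, reward=1.0, next_state=[26.0])
print(agent.q_values([26.0]))
```

A linear model keeps the number of learned weights small (actions × kernels), which fits the stated aim of training from a limited amount of building data.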

Step 4: training the reinforcement learning model using measured data from a continuous period for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, and saving the trained model.

Step 5: inputting the data for the period to be simulated, namely the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall, into the reinforcement learning model trained in step 4 to obtain a control strategy for the air conditioning system and the storage battery.

Example:

Step 1: first, establishing, for a target building, an RC gray box model reflecting the relationship between indoor temperature and indoor cooling capacity;

Specifically, FIG. 2 compares the room temperature calculated by the RC gray box model in this embodiment with the room temperature simulated by DeST. The calculated RRMSE of the RC gray box model is 2.26%, which shows that the RC gray box model accurately reflects the relationship between the indoor temperature and changes in the indoor load.
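The text does not define RRMSE. One common definition, assumed here, is the root mean square error divided by the mean of the reference series. The function name and the sample temperatures below are hypothetical and are not the embodiment's data.

```python
import math


def rrmse(simulated, reference):
    """Relative RMSE: RMSE of (simulated - reference) divided by mean(reference).

    This normalization is an assumption; other conventions divide by the
    range or by the root mean square of the reference values.
    """
    if len(simulated) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((s - r) ** 2 for s, r in zip(simulated, reference)) / len(reference)
    return math.sqrt(mse) / (sum(reference) / len(reference))


# Hypothetical hourly room temperatures in degC.
reference_temps = [25.0, 25.0, 25.0, 25.0]   # e.g. a reference simulation
model_temps = [26.0, 24.0, 26.0, 24.0]       # e.g. gray box model output
print(f"RRMSE = {rrmse(model_temps, reference_temps):.2%}")
```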

Step 3: establishing a reinforcement learning model for the air conditioner control system and the storage battery control system. The environment with which the agent interacts is the gray box model of step 1, the parameters of the gray box model are the identification results of step 2, and the input variables are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall (excluding window) area F_wall.

In this embodiment, 60 days of hourly meteorological and occupancy data (1440 hours in total) are used for training, and the number of iterations is set to 1000. During training the cumulative reward eventually stabilizes; its variation with the number of iterations is shown in FIG. 3.

Specifically, in this embodiment, hourly weather and occupancy data are selected for 3 days (72 hours in total) that differ from the days used in step 4, and the number of iterations is set to 1000. The indoor temperature variation and the air conditioner energy consumption are shown in FIG. 4, and the indoor temperature variation and the storage battery charge/discharge control strategy are shown in FIG. 5.

Compared with a rule-based control strategy with the same control requirements, the strategy generated by the reinforcement learning model reduces the energy consumption of the air conditioning system by 9.8% and the electricity consumption by 14.8%; with the storage battery in use, the electricity cost can be further reduced by 29.7% compared with the rule-based control strategy.

Claims

1. A reinforcement learning modeling method for demand response of a building air conditioning system is characterized by comprising the following steps:

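step 1): first, establishing, for a target building, an RC gray box model reflecting the relationship between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and the output variable of the model is the hourly indoor temperature T_in;

step 2): training the gray box model using known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and identifying the thermal resistance of the building wall R_w, the thermal resistance of the window R_win, the wall heat capacity C_w, the indoor air heat capacity C_in, the window heat gain coefficient c₁, and the wall heat gain coefficient c₂;

step 3): establishing a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agent interacts is the gray box model of step 1), the parameters of the gray box model are the identification results of step 2), the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery control system;

step 4): training the reinforcement learning model using 60 consecutive days of measured data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and saving the trained model;

step 5): inputting the data for the period to be simulated, namely the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, into the trained reinforcement learning model to obtain a control strategy for the air conditioning system and the storage battery.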

2. The reinforcement learning modeling method for demand response of building air conditioning system according to claim 1, wherein in step 3), the gray box model established in step 1) is combined with a reinforcement learning algorithm and serves as the environment with which the reinforcement learning agent interacts, and the RC gray box model expression is as follows:

3. The reinforcement learning modeling method for demand response of building air conditioning system according to claim 1, wherein the reinforcement learning model in step 3) adopts a reinforcement learning algorithm based on value function approximation, the approximation method is linear approximation, and the basis function is a Gaussian kernel function.

4. The method as claimed in claim 1, wherein the reward function used to guide the training of the reinforcement learning model in step 4) is as follows: