CN113435042B - Reinforced learning modeling method for demand response of building air conditioning system - Google Patents

Publication number
CN113435042B
Authority
CN
China
Prior art keywords
indoor
wall
reinforcement learning
model
air conditioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110716683.0A
Other languages
Chinese (zh)
Other versions
CN113435042A (en)
Inventor
丁研
黄宸
廉翔超
吕亚聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202110716683.0A
Publication of CN113435042A
Application granted
Publication of CN113435042B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/08 Thermal analysis or thermal optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention discloses a reinforcement learning modeling method for demand response of a building air conditioning system, comprising the following steps: developing an RC gray-box model from known indoor temperature, cooling capacity, weather, and occupancy data of a building, thereby establishing the relation between the indoor temperature and the cooling capacity of the air conditioning system; establishing a reinforcement learning model based on linear value function approximation for controlling the air conditioning system and the storage battery; training the reinforcement learning agent with meteorological and occupancy data; and applying the fully trained agent to a target time period to obtain a control strategy for the air conditioning system and the storage battery. The method reflects the thermal characteristics of the building and reduces the dependence on reference data and prior knowledge. While ensuring the thermal comfort of indoor occupants, it provides an operating strategy for the air conditioning system and the storage battery with low energy consumption and low electricity cost.

Description

Reinforcement learning modeling method for demand response of building air conditioning system
Technical Field
The invention belongs to the intersection of building energy management and artificial intelligence, and particularly relates to a reinforcement learning modeling method for demand response of a building air conditioning system.
Background
Building operation accounts for a significant share of energy consumption in China, and air conditioning accounts for a large proportion of building operating energy. However, the response of a building to outdoor weather conditions is delayed and attenuated, which complicates the control of the air conditioning system. In practice, the air conditioner operating strategy is therefore set from operator experience: the operator adjusts it according to the current weather, the weather forecast, past experience, operating economy, and other factors. Occupant comfort and energy savings are judged only subjectively, so neither the comfort of indoor occupants nor a reduction in energy consumption can be guaranteed.
Many methods for automatically controlling a building air conditioning system already exist. Feedback control responds with a delay, so it cannot regulate the indoor temperature in time or keep the system running efficiently. Control based on supervised machine learning often requires reference data and prior knowledge, and the conditions available in actual operation rarely meet these requirements. In addition, most existing control methods control only the air conditioning system and do not involve storage battery control, so they cannot further shift the grid load.
Disclosure of Invention
In view of the above, the present invention provides a reinforcement learning modeling method for demand response of a building air conditioning system, which realizes automatic control of the building air conditioner and storage battery system with reduced dependence on reference data and prior knowledge, so as to ensure the comfort of indoor occupants while reducing the energy consumption and electricity cost of the air conditioning system.
In order to achieve the aim, the invention provides a reinforcement learning modeling method for demand response of a building air conditioning system, which comprises the following steps:
Step 1): establish, for a target building, an RC gray-box model that reflects the relation between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and the output variable of the model is the hourly indoor temperature T_in;
Step 2): train the gray-box model with known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and identify the thermal resistance of the building wall R_w, the thermal resistance of the window R_win, the wall heat capacity C_w, the indoor air heat capacity C_in, the window heat gain coefficient c_1, and the wall heat gain coefficient c_2;
Step 3): establish a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agents interact is the gray-box model of step 1), the parameters of the gray-box model are the identification results of step 2), the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery control system;
Step 4): train the reinforcement learning model with 60 consecutive days of measured data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and save the trained model;
Step 5): input the data for the period to be simulated (the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall) into the trained reinforcement learning model to obtain a control strategy for the air conditioning system and the storage battery.
Further, in step 3), the gray-box model established in step 1) is combined with a reinforcement learning algorithm and serves as the interaction environment of the reinforcement learning agent. The RC gray-box model is expressed as follows:
[Formula image not reproduced in this text]
[Formula image not reproduced in this text]
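Because the formula images for the gray-box model are not reproduced in this text, the following is only a minimal sketch of what a two-node RC network using the named parameters (R_w, R_win, C_w, C_in, c_1, c_2) could look like. The network structure, the outdoor temperature input `t_out`, the per-person heat gain `q_person`, the even split of R_w around the wall node, and all numeric values are assumptions for illustration, not the patent's equations.

```python
# Hypothetical 2R2C gray-box sketch. The structure is an assumption, not the
# patent's equations (those are formula images absent from this text).
# Assumed units: resistances in K/W, capacities in J/K, powers in W, areas in m^2.

def rc_step(t_in, t_w, t_out, n, q_l, i_sol, f_win, f_wall, p, dt=3600.0):
    """Advance indoor air (t_in) and wall (t_w) temperatures by dt seconds.

    Backward Euler is used so that an hourly step remains numerically stable.
    """
    r_half = p["R_w"] / 2.0  # assumed: wall resistance split evenly around the wall node
    # Continuous-time model dx/dt = A x + b with x = [t_in, t_w]
    a11 = -(1.0 / r_half + 1.0 / p["R_win"]) / p["C_in"]
    a12 = 1.0 / (r_half * p["C_in"])
    a21 = 1.0 / (r_half * p["C_w"])
    a22 = -2.0 / (r_half * p["C_w"])
    b1 = (t_out / p["R_win"] + p["c1"] * i_sol * f_win
          + n * p["q_person"] - q_l) / p["C_in"]
    b2 = (t_out / r_half + p["c2"] * i_sol * f_wall) / p["C_w"]
    # Solve (I - dt*A) x_new = x + dt*b with Cramer's rule (2x2 system)
    m11, m12 = 1.0 - dt * a11, -dt * a12
    m21, m22 = -dt * a21, 1.0 - dt * a22
    rhs1, rhs2 = t_in + dt * b1, t_w + dt * b2
    det = m11 * m22 - m12 * m21
    return (rhs1 * m22 - m12 * rhs2) / det, (m11 * rhs2 - m21 * rhs1) / det


# Illustrative parameter values only (not identified from any real building)
example_params = {"R_w": 0.005, "R_win": 0.01, "C_w": 5.0e7, "C_in": 5.0e5,
                  "c1": 0.5, "c2": 0.05, "q_person": 100.0}
```

In this sketch a call such as `rc_step(26.0, 26.0, 26.0, 0, 2000.0, 0.0, 10.0, 40.0, example_params)` returns the indoor and wall temperatures one hour later under 2 kW of cooling; a reinforcement learning agent would call such a function once per hourly control step.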
Further, the reinforcement learning model in step 3) adopts a reinforcement learning algorithm based on value function approximation; the approximation method is linear approximation, and the basis functions are Gaussian kernel functions.
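As a hedged illustration of linear value function approximation with Gaussian kernel basis functions, the sketch below computes features phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2)) and represents each action value as a weighted sum of those features. The text does not say which temporal-difference rule is used, so the Q-learning update, the kernel centers, `sigma`, the learning rate `lr`, and the discount factor `gamma` are all assumptions.

```python
import math


def gaussian_features(state, centers, sigma):
    """Gaussian kernel basis: one feature per (assumed) kernel center."""
    return [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, ctr)) / (2.0 * sigma ** 2))
        for ctr in centers
    ]


def q_value(weights, action, phi):
    """Linear approximation: Q(s, a) = w_a . phi(s)."""
    return sum(w * f for w, f in zip(weights[action], phi))


def q_update(weights, phi, action, reward, phi_next, lr=0.1, gamma=0.95):
    """One Q-learning step (assumed update rule); returns the TD error."""
    best_next = max(q_value(weights, a, phi_next) for a in range(len(weights)))
    td_error = reward + gamma * best_next - q_value(weights, action, phi)
    # Gradient of a linear Q with respect to w_a is simply phi(s)
    weights[action] = [w + lr * td_error * f for w, f in zip(weights[action], phi)]
    return td_error


# Hypothetical setup: 1-D state (indoor temperature), three kernel centers, two actions
centers = [(24.0,), (26.0,), (28.0,)]
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

Because the approximation is linear in the weights, each update touches only the weight vector of the action taken, which keeps training cheap compared with a neural network approximator.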
Further, the reward function that governs the training of the reinforcement learning model in step 4) is as follows:
[Formula image not reproduced in this text]
[Formula image not reproduced in this text]
In equation (3), r_AC,t is the reward obtained at time t by the agent controlling the air conditioning system, and r_ES,t is the reward obtained at time t by the storage battery control agent. α_1 and α_2 are the temperature importance parameter and the electricity cost importance parameter, respectively; ω_1 and ω_2 are time parameters representing, respectively, the time during which occupants are indoors and the time during which the air conditioner is on; P'_t is the normalized energy consumption of the equipment, and P_max is the maximum energy consumption of the equipment; λ_t is the electricity price at the current time t; ΔE_t is the change in the stored energy of the storage battery. The temperature offset penalty function is given by a formula image that is not reproduced in this text. The temperature term (formula image not reproduced) is the penalty for a control result that violates the temperature setting interval; -α_2 × P'_t × λ_t × ω_2 is the penalty for electricity cost; a further term (formula image not reproduced) is the penalty for the electricity cost of the storage battery; and a final term, whose symbol is lost in this text, is the penalty for a discharge amount greater than the remaining charge.
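The reward terms described above can be sketched as follows. Only the electricity cost term -α_2 × P'_t × λ_t × ω_2 is taken directly from the text; the shape of the temperature offset penalty (distance outside a comfort band), the comfort band limits, the normalization P'_t = P_t / P_max, the battery cost term, and the fixed `overdraw_penalty` are assumptions standing in for the formula images that are not reproduced here.

```python
def temp_penalty(t_in, t_low, t_high):
    """Assumed temperature offset penalty: distance outside the comfort band."""
    if t_in < t_low:
        return t_low - t_in
    if t_in > t_high:
        return t_in - t_high
    return 0.0


def reward_ac(t_in, p_t, p_max, price, occupied, ac_on,
              alpha1=1.0, alpha2=1.0, t_low=24.0, t_high=26.0):
    """Air-conditioning agent reward at one time step (sketch)."""
    omega1 = 1.0 if occupied else 0.0   # occupants indoors
    omega2 = 1.0 if ac_on else 0.0      # air conditioner running
    p_norm = p_t / p_max                # assumed normalization P'_t = P_t / P_max
    comfort_term = -alpha1 * temp_penalty(t_in, t_low, t_high) * omega1  # assumed form
    cost_term = -alpha2 * p_norm * price * omega2                        # from the text
    return comfort_term + cost_term


def reward_es(delta_e, stored, price, overdraw_penalty=10.0):
    """Battery agent reward (sketch). delta_e > 0 charges, delta_e < 0 discharges."""
    reward = -price * delta_e           # assumed: pay to charge, save when discharging
    if -delta_e > stored:               # discharge exceeds the remaining charge
        reward -= overdraw_penalty
    return reward
```

For example, with an indoor temperature of 27.0 against a 24.0 to 26.0 band, half of maximum power, and a price of 0.8, `reward_ac(27.0, 5.0, 10.0, 0.8, True, True)` evaluates to -1.0 for comfort plus -0.4 for cost.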
Advantageous effects
1. The method models the building with an RC gray-box model, which effectively expresses the thermodynamic characteristics of the building and accurately simulates the response of the indoor temperature to the air conditioner control strategy.
2. By establishing a reinforcement learning model, the invention makes effective use of the information from the interaction between the algorithm and the environment and reduces the dependence on reference data and prior knowledge.
3. The invention uses the reinforcement learning model to propose and simulate control strategies for the air conditioning system and the storage battery, reducing the operating load and the operating cost of the air conditioning system while ensuring the thermal comfort of indoor occupants.
Drawings
FIG. 1 is a technical route diagram of a reinforcement learning modeling method for demand response of a building air conditioning system according to the present invention;
FIG. 2 is a comparison of the room temperature calculated by the RC gray-box model with the room temperature simulated by DeST in one embodiment of the present invention;
FIG. 3 is a diagram illustrating the variation of the accumulated reward function with the number of iterations in the training process according to an embodiment of the present invention;
FIG. 4 is a graph illustrating simulation results of indoor temperature and air conditioner energy consumption based on reinforcement learning control according to an embodiment of the present invention;
FIG. 5 is a graph showing simulation results of indoor temperature, air conditioning, and charge/discharge energy consumption based on reinforcement learning control according to an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention provides a reinforcement learning modeling method for demand response of a building air conditioning system. Its flow chart is shown in FIG. 1, and the method comprises the following steps:
Step 1: establish, for the building, an RC gray-box model that reflects the relation between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and the output variable of the model is the hourly indoor temperature T_in.
The gray-box model is expressed as follows:
[Formula image not reproduced in this text]
[Formula image not reproduced in this text]
Step 2: train the gray-box model with known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall. The parameters R_w, R_win, C_w, C_in, c_1, and c_2 are identified with a particle swarm optimization algorithm.
Step 3: establish a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agents interact is the gray-box model of step 1, the parameters of the gray-box model are the identification results of step 2, the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery. The reinforcement learning model adopts a reinforcement learning algorithm based on value function approximation; the approximation method is linear approximation, with Gaussian kernel functions as the basis functions.
Step 4: train the reinforcement learning model with data measured over a continuous period for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and save the trained model.
Step 5: input the data for the period to be simulated (the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall) into the reinforcement learning model trained in step 4 to obtain a control strategy for the air conditioning system and the storage battery.
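The particle swarm identification mentioned in step 2 can be sketched as a small global-best optimizer. In the method the loss would presumably be the error between the simulated and measured hourly indoor temperature over the parameter vector (R_w, R_win, C_w, C_in, c_1, c_2); here a toy quadratic stands in for that loss. The swarm size, iteration count, inertia weight `w`, acceleration coefficients `c_cog` and `c_soc`, and the bounds are assumptions, since the text gives no particle swarm settings.

```python
import random


def pso(loss, bounds, n_particles=30, n_iter=200, w=0.7,
        c_cog=1.5, c_soc=1.5, seed=0):
    """Minimal global-best particle swarm optimizer (hyperparameters assumed)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [loss(p) for p in pos]         # personal best losses
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c_cog * r1 * (best_pos[i][d] - pos[i][d])
                             + c_soc * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each parameter inside its search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_loss(p):
    """Stand-in loss with its minimum at (2, -1); not the gray-box fitting error."""
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2


best, best_loss = pso(toy_loss, [(-5.0, 5.0), (-5.0, 5.0)])
```

A derivative-free search of this kind suits gray-box identification because the fitting error is obtained by simulating the model, so gradients with respect to the thermal parameters are not readily available.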
Example:
Step 1: establish, for the target building, an RC gray-box model reflecting the relation between indoor temperature and indoor cooling capacity.
Step 2: train the gray-box model with known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall. The parameters R_w, R_win, C_w, C_in, c_1, and c_2 are identified with a particle swarm optimization algorithm.
Specifically, FIG. 2 compares the room temperature calculated by the RC gray-box model in this embodiment with the room temperature simulated by DeST. The RRMSE of the RC gray-box model is calculated to be 2.26%, which shows that the model accurately reflects the relation between the indoor temperature and the indoor load change.
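The text does not define RRMSE, and several definitions are in use. The sketch below assumes the common form of root-mean-square error divided by the mean of the reference series, expressed in percent; the function name and argument order are hypothetical.

```python
import math


def rrmse_percent(predicted, reference):
    """Relative RMSE in percent, assuming normalization by the reference mean."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    mean_ref = sum(reference) / len(reference)
    return 100.0 * math.sqrt(mse) / mean_ref
```

With this definition, `predicted` would be the gray-box room temperatures and `reference` the DeST simulation results for the same hours.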
Step 3: establish a reinforcement learning model for the air conditioner control system and the storage battery control system. The environment with which the agents interact is the gray-box model of step 1, the parameters of the gray-box model are the identification results of step 2, and the input variables are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall.
Step 4: train the reinforcement learning model with data measured over a continuous period for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall, and save the trained model.
In this embodiment, 60 days of hourly meteorological and occupancy data (1440 hours in total) are used, and the number of iterations is set to 1000. During training, the cumulative reward eventually stabilizes; its variation with the number of iterations is shown in FIG. 3.
Step 5: input the data for the period to be simulated (the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area (excluding windows) F_wall) into the reinforcement learning model trained in step 4 to obtain a control strategy for the air conditioning system and the storage battery.
Specifically, in this embodiment, 72 hours of hourly meteorological and occupancy data from 3 days not used in step 4 are selected, and the number of iterations is set to 1000. The indoor temperature change and the air conditioner energy consumption are shown in FIG. 4, and the indoor temperature change and the storage battery charge/discharge control strategy are shown in FIG. 5.
Compared with a rule-based control strategy with the same control requirements, the strategy generated by the reinforcement learning model reduces the energy consumption of the air conditioning system by 9.8% and the electricity cost by 14.8%; with the storage battery in use, the electricity cost is reduced further, by 29.7% relative to the rule-based control strategy.

Claims (4)

1. A reinforcement learning modeling method for demand response of a building air conditioning system is characterized by comprising the following steps:
Step 1): establish, for a target building, an RC gray-box model that reflects the relation between indoor temperature and cooling capacity, wherein the input variables of the model are the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and the output variable of the model is the hourly indoor temperature T_in;
Step 2): train the gray-box model with known data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and identify the thermal resistance of the building wall R_w, the thermal resistance of the window R_win, the wall heat capacity C_w, the indoor air heat capacity C_in, the window heat gain coefficient c_1, and the wall heat gain coefficient c_2;
Step 3): establish a reinforcement learning model for the air conditioner control system and the storage battery control system, wherein the environment with which the agents interact is the gray-box model of step 1), the parameters of the gray-box model are the identification results of step 2), the input variables are the hourly number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and the output variables are the hourly cooling capacity Q_L of the air conditioning system and the charging/discharging action ΔE_t of the storage battery control system;
Step 4): train the reinforcement learning model with 60 consecutive days of measured data for the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall, and save the trained model;
Step 5): input the data for the period to be simulated (the hourly indoor temperature T_in, the number of indoor occupants n, the air-conditioning cooling capacity Q_L, the total solar radiation, the window area F_win, and the wall area F_wall) into the trained reinforcement learning model to obtain a control strategy for the air conditioning system and the storage battery.
2. The reinforcement learning modeling method for demand response of a building air conditioning system according to claim 1, wherein in step 3), the gray-box model established in step 1) is combined with a reinforcement learning algorithm and serves as the interaction environment of the reinforcement learning agent, and the RC gray-box model is expressed as follows:
[Formula image not reproduced in this text]
[Formula image not reproduced in this text]
3. The reinforcement learning modeling method for demand response of a building air conditioning system according to claim 1, wherein the reinforcement learning model in step 3) adopts a reinforcement learning algorithm based on value function approximation, the approximation method is linear approximation, and the basis functions are Gaussian kernel functions.
4. The reinforcement learning modeling method for demand response of a building air conditioning system according to claim 1, wherein the reward function that governs the training of the reinforcement learning model in step 4) is as follows:
[Formula image not reproduced in this text]
[Formula image not reproduced in this text]
In equation (3), r_AC,t is the reward obtained at time t by the agent controlling the air conditioning system, and r_ES,t is the reward obtained at time t by the storage battery control agent. α_1 and α_2 are the temperature importance parameter and the electricity cost importance parameter, respectively; ω_1 and ω_2 are time parameters representing, respectively, the time during which occupants are indoors and the time during which the air conditioner is on; P'_t is the normalized energy consumption of the equipment, and P_max is the maximum energy consumption of the equipment; λ_t is the electricity price at the current time t; ΔE_t is the change in the stored energy of the storage battery. The temperature offset penalty function is given by a formula image that is not reproduced in this text. The temperature term (formula image not reproduced) is the penalty for a control result that violates the temperature setting interval; -α_2 × P'_t × λ_t × ω_2 is the penalty for electricity cost; a further term (formula image not reproduced) is the penalty for the electricity cost of the storage battery; and a final term, whose symbol is lost in this text, is the penalty for a discharge amount greater than the remaining charge.
CN202110716683.0A 2021-06-28 2021-06-28 Reinforced learning modeling method for demand response of building air conditioning system Active CN113435042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110716683.0A CN113435042B (en) 2021-06-28 2021-06-28 Reinforced learning modeling method for demand response of building air conditioning system

Publications (2)

Publication Number Publication Date
CN113435042A CN113435042A (en) 2021-09-24
CN113435042B true CN113435042B (en) 2022-05-17

Family

ID=77754821

Country Status (1)

Country Link
CN (1) CN113435042B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018184B (en) * 2022-06-28 2024-04-05 天津大学 Double-layer optimal scheduling method for air conditioning system based on demand response

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347149A (en) * 2018-09-20 2019-02-15 国网河南省电力公司电力科学研究院 Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN109617205A (en) * 2018-11-28 2019-04-12 江苏理工学院 The cooperative control method of electric car composite power source power distribution
CN112460741A (en) * 2020-11-23 2021-03-09 香港中文大学(深圳) Control method of building heating, ventilation and air conditioning system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108895633A (en) * 2018-05-08 2018-11-27 林兴斌 Using building structure as the central air conditioner system control method of cool storage medium
US11002202B2 (en) * 2018-08-21 2021-05-11 Cummins Inc. Deep reinforcement learning for air handling control
CN110781551A (en) * 2019-11-14 2020-02-11 长安大学 Simulation method for thermal process of cold supply room with embedded pipe type enclosure structure
US11573540B2 (en) * 2019-12-23 2023-02-07 Johnson Controls Tyco IP Holdings LLP Methods and systems for training HVAC control using surrogate model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"RC热网络建筑能耗预测模型综述";石欣等;《仪器仪表学报》;20141231;第35卷(第12期);全文 *
"Reinforcement learning for building controls: The opportunities and challenges";Zhe Wang.etc.;《Applied Energy》;20200701;全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant