CN113112077A - HVAC control system based on multi-step prediction deep reinforcement learning algorithm - Google Patents
- Publication number
- CN113112077A (application CN202110403130.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- output
- value
- current
- environment temperature
- Prior art date
- Legal status: Granted (status assumed by Google Patents; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
Abstract
The invention relates to the intelligent control of heating, ventilation and air conditioning (HVAC) systems, which regulate temperature, humidity, air cleanliness and ventilation, and in particular to an HVAC control system based on a long short-term memory (LSTM) neural network with a generalized correntropy (GC) loss function and a deep reinforcement learning (DRL) algorithm. The method comprises the following steps: collecting the outdoor ambient temperature, the indoor ambient temperature and the grid electricity price; preprocessing the collected data; predicting the outdoor ambient temperature multiple steps into the future from its historical data; and controlling the power output of the HVAC system with the Deep Deterministic Policy Gradient (DDPG) algorithm of DRL, based on the predicted future outdoor temperatures, the indoor ambient temperature and the grid electricity price. The invention can intelligently control the HVAC system in real time to reduce the user's cost while ensuring the user's comfort, and has high practical engineering application value.
Description
Technical Field
The invention relates to the intelligent optimal control of HVAC systems, in particular to a method for intelligently controlling an HVAC system based on a GC-LSTM neural network and a DRL algorithm.
Background
Household users are the end users of the power grid; their electricity-use habits, together with the addition of distributed renewable energy sources, directly create peaks and troughs in grid load, which can severely impact and seriously threaten the grid. With the development of the smart grid and the implementation of demand-response strategies in recent years, residential users have changed from passively consuming power to actively participating in the grid; in the smart-grid environment, the grid's electricity-price and generation information is communicated bidirectionally with the users' demand information. Within a household, the air conditioning system accounts for about 35% of total electricity consumption, so intelligently controlling the output power of the HVAC system according to the grid electricity price and the ambient temperature, while maintaining a given level of user comfort, is of great significance for reducing electricity use, lowering user cost and mitigating the greenhouse effect.
At present, HVAC systems mainly adopt traditional closed-loop control or model predictive control. In closed-loop control, an indoor temperature sensor is installed and the HVAC system stops working when the indoor temperature reaches a set value; such systems are simple to operate and easy to implement, but in a smart-grid environment with demand response they can hardly adjust power according to dynamic electricity prices, and thus fail to meet energy-saving and emission-reduction standards. Model predictive control drives the HVAC system from an accurate model of the indoor temperature variation, but the complexity of indoor temperature dynamics limits the accuracy of such modeling. With the development of intelligent algorithms, researchers have also proposed optimizing HVAC control with particle swarm optimization and genetic algorithms, which optimize the HVAC power output under a real-time electricity-price mechanism to reduce user cost; however, these algorithms are difficult to tune, and they do not account for the time delay between the HVAC power output and the resulting change in indoor temperature, so user comfort is not truly guaranteed. It is therefore necessary to first predict the future outdoor ambient temperature.
Disclosure of Invention
The invention provides an HVAC control method based on a multi-step prediction deep reinforcement learning algorithm, addressing the nonlinearity and randomness of the outdoor ambient temperature and the smart-grid electricity price, and the time delay between the HVAC system's output power and the resulting indoor temperature change.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm adopts the following technical scheme; its model structure is shown in figure 1. The system comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the prediction stage of the outdoor ambient temperature comprises the following steps:
Step one: according to the actual data points of the outdoor environment, select the outdoor ambient temperatures X = [T_1, …, T_i] at i consecutive moments as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of prediction steps;
Step two: preprocess the acquired data, correct abnormal data, and convert the time-series data into supervised-sequence data;
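Step two's conversion of a time series into supervised sequences can be sketched as a sliding-window transform, with window length i and horizon n as in step one (the function name here is illustrative, not from the patent):

```python
import numpy as np

def series_to_supervised(series, n_in, n_out):
    """Slide a window over a 1-D temperature series: each row pairs
    n_in past readings (input X) with the n_out following readings
    (target h), matching X = [T_1..T_i], h = [h_{i+1}..h_{i+n}]."""
    series = np.asarray(series, dtype=float)
    n = len(series) - n_in - n_out + 1
    X = np.stack([series[k:k + n_in] for k in range(n)])
    y = np.stack([series[k + n_in:k + n_in + n_out] for k in range(n)])
    return X, y

temps = [20.0, 21.0, 22.5, 23.0, 22.0, 21.5, 20.5, 19.0]
X, y = series_to_supervised(temps, n_in=3, n_out=2)
# X.shape == (4, 3), y.shape == (4, 2)
# X[0] = [20.0, 21.0, 22.5], y[0] = [23.0, 22.0]
```

Each row of X and y then forms one input/output pair for training the prediction model.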
Step three: input the input quantity into a long short-term memory neural network with a generalized correntropy loss function, which forgets, memorizes and learns from the input through its forgetting gate, input gate and output gate; the nonlinear regression model of this GC-LSTM network is described as follows:
1) Input X = [T_1, …, T_i] into the first block of the LSTM network. The forgetting gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the output of the forgetting gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines which information to update: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained from the σ function; next a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated by the tanh function; finally the cell state c_t of the current block is determined jointly by the output of the forgetting gate, the output of the input gate, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained from the σ function; then the cell state c_t from 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t);
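The gate equations of 1) to 3) can be sketched as a single LSTM block forward pass in NumPy (the random weights below are placeholders; a trained model learns them):

```python
import numpy as np

def lstm_cell(x_t, h_prev, c_prev, w, b):
    """One LSTM block: forget gate f_t, input gate i_t, candidate c~_t,
    cell state c_t = f_t*c_prev + i_t*c~_t, output h_t = o_t*tanh(c_t)."""
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
    concat = np.concatenate([h_prev, x_t])       # [h_{t-1}, X_t]
    f_t = sigma(w["f"] @ concat + b["f"])        # forget gate output
    i_t = sigma(w["i"] @ concat + b["i"])        # input gate (update info)
    c_tilde = np.tanh(w["c"] @ concat + b["c"])  # new candidate value
    c_t = f_t * c_prev + i_t * c_tilde           # updated cell state
    o_t = sigma(w["o"] @ concat + b["o"])        # initial output
    h_t = o_t * np.tanh(c_t)                     # block output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, n_in = 4, 1
w = {k: rng.standard_normal((hidden, hidden + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = lstm_cell(np.array([0.5]), np.zeros(hidden), np.zeros(hidden), w, b)
```

Since o_t lies in (0, 1) and tanh(c_t) in (-1, 1), every component of the block output h_t stays within (-1, 1).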
4) Calculate the error between the true value Y_t and the predicted value h_t with the generalized correntropy (GC) loss function, as in the following equation:

L_GC = (1/N) Σ_{t=1}^{N} [1 − (α / (2βΓ(1/α))) exp(−|Y_t − h_t|^α / β^α)]

where N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Over multiple training iterations, update the weights w and biases b of the neural network by mini-batch gradient descent so as to minimize the error between the true and predicted values;
Step four: finally, the LSTM network with the generalized correntropy loss function yields a nonlinear mapping model from the outdoor ambient temperatures at the previous i moments to the outdoor ambient temperatures at the next n moments;
the real-time control of the indoor temperature comprises the following steps:
Step one: collect the outdoor ambient temperatures X = [T_1, …, T_i] at i consecutive moments and, with the GC-LSTM network, obtain the predicted outdoor temperatures h = [h_{i+1}, …, h_{i+n}] at the next n moments; obtain the current grid electricity price ρ_t and the indoor temperature T^in_t; take h, ρ_t and T^in_t as the environment information, that is: S_t = {h, ρ_t, T^in_t};
Step two: the current state information StInputting to the Actor current neural network of the deep reinforcement learning DDPG algorithm, and based on the current strategy and Gaussian noiseTo select an actionat∈[Pmin,Pmax]Gaussian noise (Gaussian)Is to increase the exploration rate of the action, and is reduced along with the increase of the number of iterative cycles, thetaμIs the Actor's current neural network parameter, PminAnd PmaxMinimum and maximum output power of the HVAC system, respectively;
Step three: perform action a_t to control the output power of the air conditioner; the power output of the HVAC system changes the indoor ambient temperature according to the room's thermal dynamics; then obtain an immediate reward r_t and reach the next state S_{t+1};
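The indoor-temperature response to the action a_t is given in the patent only as a figure; a common choice in HVAC reinforcement-learning work is a first-order equivalent-thermal-parameters model, sketched here purely as an illustrative assumption (the inertia factor eps and the cooling gain are hypothetical parameters, not from the patent):

```python
def indoor_temp_step(t_in, t_out, a_t, eps=0.7, gain=2.5):
    """Assumed first-order thermal model: the next indoor temperature is
    a blend of the current indoor temperature and the outdoor temperature,
    lowered by the cooling effect of the HVAC power a_t (kW)."""
    return eps * t_in + (1.0 - eps) * (t_out - gain * a_t)

# Cooling example: with the HVAC off the room drifts toward the outdoor 30 C;
# with 2 kW of cooling power the next temperature is lower.
off = indoor_temp_step(26.0, 30.0, 0.0)
on = indoor_temp_step(26.0, 30.0, 2.0)
assert on < off
```

The time-delay problem noted in the background shows up here directly: a_t affects the temperature only through the next-step state, which is why the controller benefits from the multi-step outdoor forecast h.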
Step four: will (S)t,at,rt,St+1) Storing the data into an experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C exceeds the capacity threshold C_M, randomly take M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from buff-C and perform the following steps; otherwise go directly to step eleven;
Step six: calculate the target expected value y_i = r_i + γQ'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q'), where μ'(S_{i+1}|θ^μ') is the optimal action obtained from the Actor target network, Q'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q') is the future target value output by the Critic target network Q' based on the state and optimal-action information at the next moment, γ is the discount factor, and θ^μ' and θ^Q' are the parameters of the Actor target network and the Critic target network respectively;
Step seven: the Critic current neural network Q of the DDPG algorithm evaluates the action a_t taken, calculating the evaluation value Q(S_t, a_t|θ^Q), where θ^Q are the parameters of the Critic current network;
Step eight: calculate the error between the target expected values and the evaluation values of the samples with the mean squared error L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i|θ^Q))², and update the Critic current network parameters θ^Q by mini-batch gradient descent;
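Steps six and eight can be sketched numerically: given a mini-batch of rewards, the Critic target network's values for the next states, and the Critic current network's evaluations (all hypothetical numbers below), the target y_i = r_i + γQ' and the mean-squared error between target and evaluation are:

```python
import numpy as np

def critic_targets_and_loss(r, q_next, q_eval, gamma=0.99):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1}))   (target expected value)
    L = (1/M) * sum_i (y_i - Q(S_i, a_i))^2            (critic error)"""
    y = np.asarray(r, float) + gamma * np.asarray(q_next, float)
    loss = float(np.mean((y - np.asarray(q_eval, float)) ** 2))
    return y, loss

# Hypothetical mini-batch of M = 2 transitions:
y, loss = critic_targets_and_loss(r=[1.0, 0.5], q_next=[2.0, 1.0],
                                  q_eval=[2.9, 1.5], gamma=0.9)
# y = [1.0 + 0.9*2.0, 0.5 + 0.9*1.0] = [2.8, 1.4]
```

In training, the gradient of this loss with respect to θ^Q drives the mini-batch gradient-descent update of the Critic current network.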
Step nine: updating Actor current neural network parameter theta by using sample strategy gradientμThe following equation:
Step ten: soft-copy the parameters of the Critic and Actor current networks to the Critic and Actor target networks respectively, namely:
θ^Q' ← τθ^Q + (1−τ)θ^Q'
θ^μ' ← τθ^μ + (1−τ)θ^μ'
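The soft copy of step ten is a convex blend of parameter vectors, moving each target parameter a small fraction τ toward its current-network counterpart:

```python
import numpy as np

def soft_update(theta_target, theta_current, tau=0.01):
    """theta' <- tau*theta + (1 - tau)*theta', as in step ten; small tau
    makes the target networks track the current networks slowly, which
    stabilizes the DDPG targets."""
    return (tau * np.asarray(theta_current, float)
            + (1.0 - tau) * np.asarray(theta_target, float))

target = np.array([0.0, 1.0])
current = np.array([1.0, 0.0])
updated = soft_update(target, current, tau=0.1)
# updated = [0.1, 0.9]
```

With τ = 1 this would be a hard copy; typical DDPG values are on the order of 0.001 to 0.01.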
Step eleven: take the state at the next moment as the current state, that is: S_t ← S_{t+1}; loop from step one to step eleven until a converged Actor current network is obtained; output the network parameters θ^μ to obtain the final HVAC control system model, then perform step twelve;
Step twelve: input the current state S_t into the Actor current network of the deep reinforcement learning DDPG algorithm, select the action a_t from the optimal policy, and perform a_t to control the power output of the HVAC system.
Drawings
FIG. 1 is a schematic diagram of the establishment of an HVAC intelligent control system.
Fig. 2 is a graph of loss functions of the outdoor environment temperature training set and the test set in the debugging stage, where 1 represents a loss function curve of the outdoor environment temperature training set, and 2 represents a loss function curve of the outdoor environment temperature test set.
Fig. 3 is a graph showing a real value and a predicted value of the outdoor environment temperature test set at the debugging stage, where 3 represents the predicted value of the outdoor environment temperature test set, and 4 represents the real value of the outdoor environment temperature test set.
Detailed Description
The collected real ambient temperature data are taken as the experimental object to train and test the HVAC control system based on the multi-step prediction deep reinforcement learning algorithm.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm comprises two stages of multi-step prediction of outdoor environment temperature and real-time control of indoor temperature, wherein the prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to the actual data points of the outdoor environment, select the outdoor ambient temperatures X = [T_1, …, T_i] at i = 6 consecutive moments as the model input, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model; the sampling interval is 30 minutes.
Step two: preprocessing the acquired data, correcting abnormal data, converting the data of the time sequence into data of a supervision sequence, and dividing the data into 2500 training sets and 1000 testing sets.
Step three: setting the number of cells of the long-short term memory neural network as 100, the training times as 500, the learning rate as 0.001 and the batch of the minimum batch gradient descent method as 32;
Step four: input the training-set inputs into the LSTM network with the generalized correntropy loss function, which forgets, memorizes and learns from the input through its forgetting gate, input gate and output gate; the nonlinear regression process of this GC-LSTM network is described as follows:
1) Input X = [T_1, …, T_i] into the first block of the LSTM network. The forgetting gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the output of the forgetting gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines which information to update: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained from the σ function; next a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated by the tanh function; finally the cell state c_t of the current block is determined jointly by the output of the forgetting gate, the output of the input gate, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained from the σ function; then the cell state c_t from 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t);
4) Calculate the error between the true value Y_t and the predicted value h_t with the generalized correntropy (GC) loss function, as in the following equation:

L_GC = (1/N) Σ_{t=1}^{N} [1 − (α / (2βΓ(1/α))) exp(−|Y_t − h_t|^α / β^α)]

where N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Over multiple training iterations, update the weights w and biases b of the neural network by mini-batch gradient descent so as to minimize the error between the true and predicted values;
Step five: finally, the LSTM network with the generalized correntropy loss function yields a nonlinear mapping model from the outdoor ambient temperatures at the previous i moments to those at the next n moments, whose accuracy is then tested with the test set;
Step six: test the accuracy of the model on the test set, using the root mean square error (RMSE) between the true and predicted values, the probability density distribution of the error, and R² as evaluation indexes of the model, defined as:

RMSE = sqrt( (1/m) Σ_{i=1}^{m} (y_i − h_i)² )

R² = 1 − Σ_{i=1}^{m} (y_i − h_i)² / Σ_{i=1}^{m} (y_i − ȳ)²

p(e) = (1/m) Σ_{i=1}^{m} k(e − e_i)

where y_i and h_i are the true and predicted values at each step, ȳ is the mean of the true samples at each step, m is the number of test samples, and k(·) is a Gaussian kernel function; the probability density function of the error is estimated with a sliding-window approach.
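The RMSE and R² indexes of step six can be computed directly from the paired true and predicted test values (the sample numbers below are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over the m test samples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = [20.0, 22.0, 24.0, 26.0]   # true temperatures
h = [20.5, 21.5, 24.5, 25.5]   # predicted temperatures
# rmse(y, h) == 0.5; r2(y, h) == 1 - 1.0/20.0 == 0.95
```

An R² near 1 and a small RMSE together indicate that the multi-step forecasts track the real outdoor temperature closely.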
The real-time control of the indoor temperature comprises the following steps:
Step one: acquire the outdoor ambient temperatures X = [T_1, …, T_i] at i = 6 consecutive moments and, with the GC-LSTM neural network model, obtain the predicted outdoor temperatures h = [h_{i+1}, …, h_{i+n}] at the next n = 3 moments; obtain the current grid electricity price ρ_t and indoor temperature T^in_t, and divide the data into a training set of 2500 samples and a test set of 1000 samples. Take h, ρ_t and T^in_t as the environment information, that is: S_t = {h, ρ_t, T^in_t};
Step two: setting a DDPG algorithm of deep reinforcement learning as four neural networks, wherein a current neural network of an Actor and a target neural network of the Actor have three layers of neural networks with the same structure, a hidden layer activation function is tanh, a current neural network of Critic and the target neural network of Critic have the same neural network structure, and the hidden layer activation function is relu;
Step three: input the current state S_t from the training set into the Actor current network and select an action a_t = μ(S_t|θ^μ) + N_t, a_t ∈ [P_min, P_max], based on the current policy and Gaussian noise N_t, where P_min and P_max are the minimum and maximum output power of the HVAC system;
Step four: perform action a_t to control the output power of the air conditioner, then obtain an immediate reward r_t and reach the next state S_{t+1}. The reward r_t reflects the user's electricity cost and comfort: it penalizes the electricity cost ρ_t·a_t and the deviation of the indoor temperature from the comfort range [T_min, T_max], where T_min and T_max are the minimum and maximum comfortable temperatures, and λ_1 and λ_2 are weighting factors balancing the two terms of the reward;
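A reward consistent with step four, penalizing electricity cost via λ_1 and comfort-range violation via λ_2, might look like the following; the exact functional form is an assumption, since the patent's reward equation is given only in a figure:

```python
def reward(price, power, t_in, t_min=20.0, t_max=24.0, lam1=1.0, lam2=10.0):
    """Assumed reward: negative electricity cost minus a penalty for the
    indoor temperature leaving the comfort band [t_min, t_max]; lam1 and
    lam2 weight cost versus comfort as in step four."""
    cost = price * power
    discomfort = max(t_in - t_max, 0.0) + max(t_min - t_in, 0.0)
    return -lam1 * cost - lam2 * discomfort

inside = reward(price=0.5, power=2.0, t_in=22.0)   # comfortable: only cost counts
outside = reward(price=0.5, power=2.0, t_in=26.0)  # 2 C above t_max: extra penalty
```

Raising λ_2 relative to λ_1 makes the learned policy favor comfort over cost savings, and vice versa.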
Step five: store (S_t, a_t, r_t, S_{t+1}) into the experience pool buff-C;
Step six: then randomly take M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from the experience pool buff-C;
Step seven: calculating the expectation y of the target based on the status of the next time and the action obtained by the target network of the Actori=ri+γQ'(Si+1,μ'(Si+1|θμ')|θQ');
Step eight: action a taken by Critic current neural network Q pair of DDPG algorithmtEvaluation is performed to calculate an evaluation value Q (S)t,ai|θQ);
Step nine: calculating an error value between a target desired value and an evaluation value of a sample using a root mean square errorUpdating the parameters of the Critic current neural network by using a minimum batch gradient descent method;
Step ten: update the Actor current network parameters θ^μ with the sampled policy gradient;
Step eleven: soft copying the parameters of the current neural networks of the Ctric and the Actor to the target neural network parameters of the Ctric and the Actor respectively;
θ^Q' ← τθ^Q + (1−τ)θ^Q'
θ^μ' ← τθ^μ + (1−τ)θ^μ'
Step twelve: obtain a converged Actor current network by training on the training set and output the network parameters θ^μ, taking the reward obtained in each training iteration and the per-step error L as the indicators of network convergence;
Step thirteen: input the current state S_t of the test set into the Actor current network of the DDPG algorithm, select the action a_t from the optimal policy, and perform a_t to control the power output of the HVAC system, taking the electricity cost of the HVAC system and the user's comfort as the performance indexes of the system.
The advantages of the invention are: the long short-term memory neural network predicts the future outdoor ambient temperature, improving user comfort, and the generalized correntropy loss function is used as the LSTM loss function to improve prediction accuracy; then, based on the DDPG algorithm, the power output of the HVAC system is intelligently adjusted according to the changes in the grid electricity price, the indoor temperature and the future outdoor temperature, saving the user's electricity cost while ensuring the user's comfort.
The above description is only one embodiment of the present invention; the structural features of the invention are not limited thereto, and any changes or modifications made by those skilled in the art within the scope of the invention are covered by it.
Claims (1)
1. The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm, characterized in that it comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the prediction stage of the outdoor ambient temperature comprises the following steps:
Step one: according to the actual data points of the outdoor environment, select the outdoor ambient temperatures X = [T_1, …, T_i] at i consecutive moments as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the real output of the model, where n is the number of prediction steps;
Step two: preprocess the acquired data, correct abnormal data, and convert the time-series data into supervised-sequence data;
Step three: input the input quantity into a long short-term memory neural network with a generalized correntropy loss function, which forgets, memorizes and learns from the input through its forgetting gate, input gate and output gate; the nonlinear regression model of this GC-LSTM network is described as follows:
1) Input X = [T_1, …, T_i] into the first block of the LSTM network. The forgetting gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the output of the forgetting gate is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines which information to update: first the update information i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained from the σ function; next a new candidate value c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c) is generated by the tanh function; finally the cell state c_t of the current block is determined jointly by the output of the forgetting gate, the output of the input gate, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained from the σ function; then the cell state c_t from step 2) is scaled to a value between -1 and 1 by the activation function tanh, finally giving the model output h_t = o_t * tanh(c_t);
4) Calculate the error between the true value Y_t and the predicted value h_t with the generalized correntropy loss function, as in the following equation:

L_GC = (1/N) Σ_{t=1}^{N} [1 − (α / (2βΓ(1/α))) exp(−|Y_t − h_t|^α / β^α)]

where N is the number of samples, Γ is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Over multiple training iterations, update the weights w and biases b of the neural network by mini-batch gradient descent so as to minimize the error between the true and predicted values;
Step four: finally, the long short-term memory neural network with the generalized correntropy loss function yields a nonlinear mapping model from the outdoor ambient temperatures at the previous i moments to the outdoor ambient temperatures at the next n moments;
the real-time control of the indoor temperature comprises the following steps:
Step one: collect the outdoor ambient temperatures X = [T_1, …, T_i] at i consecutive moments and, with the GC-LSTM network, obtain the predicted outdoor temperatures h = [h_{i+1}, …, h_{i+n}] at the next n moments; obtain the current grid electricity price ρ_t and the indoor temperature T^in_t; take h, ρ_t and T^in_t as the environment information, that is: S_t = {h, ρ_t, T^in_t};
Step two: input the current state S_t into the Actor current neural network of the deep reinforcement learning DDPG algorithm and select an action a_t = μ(S_t|θ^μ) + N_t, a_t ∈ [P_min, P_max], based on the current policy plus Gaussian noise N_t. The Gaussian noise increases the exploration rate of the actions and is reduced as the number of iteration cycles grows; θ^μ denotes the parameters of the Actor current neural network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
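The action-selection rule of step two (policy output plus decaying Gaussian noise, clipped to the HVAC power range) can be sketched as below; the decay schedule and noise scale are illustrative assumptions, as the patent only states that the noise shrinks as iterations increase.

```python
import random

def select_action(mu_value, episode, p_min=0.0, p_max=5.0,
                  sigma0=0.5, decay=0.99):
    """a_t = mu(S_t|theta_mu) + N_t, clipped to [P_min, P_max].

    mu_value is the Actor network's deterministic output; the Gaussian
    noise standard deviation decays with the iteration count (episode).
    """
    sigma = sigma0 * (decay ** episode)   # exploration shrinks over training
    a = mu_value + random.gauss(0.0, sigma)
    return min(max(a, p_min), p_max)
```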
Step three: execute action a_t to control the output power of the air conditioner; the power output of the HVAC system changes the indoor ambient temperature according to the room's thermal dynamics. A timely reward r_t is then obtained and the system reaches the next state S_{t+1};
Step four: store the transition (S_t, a_t, r_t, S_{t+1}) into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than C_M, randomly draw M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from buff-C and perform the following steps; otherwise, go directly to step eleven;
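Steps four and five describe a standard experience-replay buffer; a minimal sketch, with capacity and method names chosen for illustration:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool buff-C: stores (S_t, a_t, r_t, S_{t+1}) transitions
    and samples mini-batches once enough data has accumulated."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def ready(self, c_m):
        """True once the stored data volume exceeds the threshold C_M."""
        return len(self.buf) > c_m

    def sample(self, m):
        """Randomly draw M transitions without replacement."""
        return random.sample(self.buf, m)
```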
Step six: calculate the target expected value y_i = r_i + γ·Q′(S_{i+1}, μ′(S_{i+1}|θ^μ′)|θ^Q′), where μ′(S_{i+1}|θ^μ′) is the optimal action obtained from the Actor target neural network, Q′(S_{i+1}, μ′(S_{i+1}|θ^μ′)|θ^Q′) is the future target value output by the Critic target network Q′ based on the state information and the optimal action at the next moment, γ is the discount factor, and θ^μ′ and θ^Q′ are the parameters of the Actor target neural network and of the Critic target network, respectively;
Step seven: the Critic current neural network Q of the DDPG algorithm evaluates the adopted action a_t and calculates the evaluation value Q(S_t, a_t|θ^Q), where θ^Q denotes the parameters of the Critic current neural network;
Step eight: calculate the error between the target expected value and the evaluation value over the samples with the mean squared error L = (1/M)·Σ_{i=1..M} (y_i − Q(S_i, a_i|θ^Q))², and update the parameters θ^Q of the Critic current neural network by mini-batch gradient descent;
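Steps six to eight (target value computation and the Critic error) can be sketched as follows, assuming the Q-values for the batch have already been computed by the Critic networks; function names are illustrative.

```python
import numpy as np

def critic_targets(rewards, q_next, gamma=0.99):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1}|theta_mu')|theta_Q')."""
    return np.asarray(rewards) + gamma * np.asarray(q_next)

def critic_loss(y, q_current):
    """Mean squared error between target values y_i and Critic evaluations."""
    return float(np.mean((np.asarray(y) - np.asarray(q_current)) ** 2))
```

In a full implementation, the gradient of this loss with respect to θ^Q drives the mini-batch gradient descent update of step eight.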
Step nine: update the Actor current neural network parameters θ^μ using the sampled policy gradient, as in the following equation:

∇_{θ^μ} J ≈ (1/M)·Σ_{i=1..M} ∇_a Q(S, a|θ^Q)|_{S=S_i, a=μ(S_i)} · ∇_{θ^μ} μ(S|θ^μ)|_{S=S_i};
Step ten: soft-copy the parameters of the Critic and Actor current neural networks to the Critic and Actor target neural network parameters, respectively, that is:
θ^Q′ ← τ·θ^Q + (1 − τ)·θ^Q′
θ^μ′ ← τ·θ^μ + (1 − τ)·θ^μ′
where τ is the soft-update coefficient;
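The soft-update rule of step ten can be sketched element-wise over parameter lists; in practice it is applied to every weight tensor of the networks, and the value of τ (small, e.g. 0.005) is an assumption, as the patent does not specify it.

```python
def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', element-wise.

    target_params: current target-network parameters (theta').
    online_params: current online-network parameters (theta).
    Returns the new target parameters.
    """
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

With τ = 1 this degenerates to a hard copy; a small τ makes the target networks track the online networks slowly, which stabilizes the TD targets of step six.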
Step eleven: take the state at the next moment as the state at the current moment, that is: S_t ← S_{t+1}; loop from step one to step eleven until a converged Actor current neural network is obtained; output its parameters θ^μ to obtain the final HVAC control system model, then perform step twelve;
Step twelve: input the current state S_t into the trained Actor current neural network of the deep reinforcement learning DDPG algorithm, select the action a_t based on the optimal policy, execute a_t, and control the power output of the HVAC system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110403130.XA CN113112077B (en) | 2021-04-14 | 2021-04-14 | HVAC control system based on multi-step prediction deep reinforcement learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110403130.XA CN113112077B (en) | 2021-04-14 | 2021-04-14 | HVAC control system based on multi-step prediction deep reinforcement learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113112077A true CN113112077A (en) | 2021-07-13 |
CN113112077B CN113112077B (en) | 2022-06-10 |
Family
ID=76716975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110403130.XA Active CN113112077B (en) | 2021-04-14 | 2021-04-14 | HVAC control system based on multi-step prediction deep reinforcement learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113112077B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113485498A (en) * | 2021-07-19 | 2021-10-08 | 北京工业大学 | Indoor environment comfort level adjusting method and system based on deep learning |
CN113659246A (en) * | 2021-10-20 | 2021-11-16 | 中国气象科学研究院 | Battery system suitable for polar region ultralow temperature environment and temperature control method thereof |
CN113741449A (en) * | 2021-08-30 | 2021-12-03 | 南京信息工程大学 | Multi-agent control method for air-sea cooperative observation task |
CN113940218A (en) * | 2021-09-30 | 2022-01-18 | 上海易航海芯农业科技有限公司 | Intelligent heat supply method and system for greenhouse |
CN114488811A (en) * | 2022-01-25 | 2022-05-13 | 同济大学 | Greenhouse environment energy-saving control method based on second-order Voltalla model prediction |
CN115412923A (en) * | 2022-10-28 | 2022-11-29 | 河北省科学院应用数学研究所 | Multi-source sensor data credible fusion method, system, equipment and storage medium |
TWI795283B (en) * | 2022-05-04 | 2023-03-01 | 台灣松下電器股份有限公司 | Control method of air conditioning system |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102353119A (en) * | 2011-08-09 | 2012-02-15 | 北京建筑工程学院 | Control method of VAV (variable air volume) air-conditioning system |
CN105805822A (en) * | 2016-03-24 | 2016-07-27 | 常州英集动力科技有限公司 | Heat supply energy saving control method and system based on neural network prediction |
CN105870483A (en) * | 2016-03-31 | 2016-08-17 | 华中科技大学 | Thermoelectric synergic control method of solid oxide fuel cell during power tracking process |
JP2016205739A (en) * | 2015-04-24 | 2016-12-08 | 京セラ株式会社 | Power control method, power control device and power control system |
US20190102668A1 (en) * | 2017-10-04 | 2019-04-04 | Hengshuai Yao | Method of prediction of a state of an object in the environment using an action model of a neural network |
CN110458443A (en) * | 2019-08-07 | 2019-11-15 | 南京邮电大学 | A kind of wisdom home energy management method and system based on deeply study |
US20190354071A1 (en) * | 2018-05-18 | 2019-11-21 | Johnson Controls Technology Company | Hvac control system with model driven deep learning |
CN111080002A (en) * | 2019-12-10 | 2020-04-28 | 华南理工大学 | Deep learning-based multi-step prediction method and system for building electrical load |
CN111365828A (en) * | 2020-03-06 | 2020-07-03 | 上海外高桥万国数据科技发展有限公司 | Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning |
US20210049460A1 (en) * | 2019-08-15 | 2021-02-18 | Noodle Analytics, Inc. | Deep probabilistic decision machines |
CN112460741A (en) * | 2020-11-23 | 2021-03-09 | 香港中文大学(深圳) | Control method of building heating, ventilation and air conditioning system |
CN112561728A (en) * | 2020-10-28 | 2021-03-26 | 西安交通大学 | Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102353119A (en) * | 2011-08-09 | 2012-02-15 | 北京建筑工程学院 | Control method of VAV (variable air volume) air-conditioning system |
JP2016205739A (en) * | 2015-04-24 | 2016-12-08 | 京セラ株式会社 | Power control method, power control device and power control system |
CN105805822A (en) * | 2016-03-24 | 2016-07-27 | 常州英集动力科技有限公司 | Heat supply energy saving control method and system based on neural network prediction |
CN105870483A (en) * | 2016-03-31 | 2016-08-17 | 华中科技大学 | Thermoelectric synergic control method of solid oxide fuel cell during power tracking process |
US20190102668A1 (en) * | 2017-10-04 | 2019-04-04 | Hengshuai Yao | Method of prediction of a state of an object in the environment using an action model of a neural network |
US20190354071A1 (en) * | 2018-05-18 | 2019-11-21 | Johnson Controls Technology Company | Hvac control system with model driven deep learning |
CN110458443A (en) * | 2019-08-07 | 2019-11-15 | 南京邮电大学 | A kind of wisdom home energy management method and system based on deeply study |
US20210049460A1 (en) * | 2019-08-15 | 2021-02-18 | Noodle Analytics, Inc. | Deep probabilistic decision machines |
CN111080002A (en) * | 2019-12-10 | 2020-04-28 | 华南理工大学 | Deep learning-based multi-step prediction method and system for building electrical load |
CN111365828A (en) * | 2020-03-06 | 2020-07-03 | 上海外高桥万国数据科技发展有限公司 | Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning |
CN112561728A (en) * | 2020-10-28 | 2021-03-26 | 西安交通大学 | Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment |
CN112460741A (en) * | 2020-11-23 | 2021-03-09 | 香港中文大学(深圳) | Control method of building heating, ventilation and air conditioning system |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113485498A (en) * | 2021-07-19 | 2021-10-08 | 北京工业大学 | Indoor environment comfort level adjusting method and system based on deep learning |
CN113741449A (en) * | 2021-08-30 | 2021-12-03 | 南京信息工程大学 | Multi-agent control method for air-sea cooperative observation task |
CN113741449B (en) * | 2021-08-30 | 2023-07-14 | 南京信息工程大学 | Multi-agent control method for sea-air collaborative observation task |
CN113940218A (en) * | 2021-09-30 | 2022-01-18 | 上海易航海芯农业科技有限公司 | Intelligent heat supply method and system for greenhouse |
CN113659246A (en) * | 2021-10-20 | 2021-11-16 | 中国气象科学研究院 | Battery system suitable for polar region ultralow temperature environment and temperature control method thereof |
CN113659246B (en) * | 2021-10-20 | 2022-01-25 | 中国气象科学研究院 | Battery system suitable for polar region ultralow temperature environment and temperature control method thereof |
CN114488811A (en) * | 2022-01-25 | 2022-05-13 | 同济大学 | Greenhouse environment energy-saving control method based on second-order Voltalla model prediction |
CN114488811B (en) * | 2022-01-25 | 2023-08-29 | 同济大学 | Greenhouse environment energy-saving control method based on second-order Woltai model prediction |
TWI795283B (en) * | 2022-05-04 | 2023-03-01 | 台灣松下電器股份有限公司 | Control method of air conditioning system |
CN115412923A (en) * | 2022-10-28 | 2022-11-29 | 河北省科学院应用数学研究所 | Multi-source sensor data credible fusion method, system, equipment and storage medium |
CN115412923B (en) * | 2022-10-28 | 2023-02-03 | 河北省科学院应用数学研究所 | Multi-source sensor data credible fusion method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113112077B (en) | 2022-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113112077B (en) | HVAC control system based on multi-step prediction deep reinforcement learning algorithm | |
CN112614009B (en) | Power grid energy management method and system based on deep expectation Q-learning | |
CN110705743B (en) | New energy consumption electric quantity prediction method based on long-term and short-term memory neural network | |
CN110084367B (en) | Soil moisture content prediction method based on LSTM deep learning model | |
CN109659933B (en) | Electric energy quality prediction method for power distribution network with distributed power supply based on deep learning model | |
CN114370698B (en) | Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning | |
CN113572157B (en) | User real-time autonomous energy management optimization method based on near-end policy optimization | |
CN107704875A (en) | Based on the building load Forecasting Methodology and device for improving IHCMAC neutral nets | |
CN116187601B (en) | Comprehensive energy system operation optimization method based on load prediction | |
CN112070262B (en) | Air conditioner load prediction method based on support vector machine | |
CN112926795A (en) | SBO (statistical analysis) -based CNN (continuous casting) optimization-based high-rise residential building group heat load prediction method and system | |
CN114239991A (en) | Building heat supply load prediction method, device and equipment based on data driving | |
CN111898856B (en) | Analysis method of physical-data fusion building based on extreme learning machine | |
CN114119273A (en) | Park comprehensive energy system non-invasive load decomposition method and system | |
Dong et al. | Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building | |
Godahewa et al. | Simulation and optimisation of air conditioning systems using machine learning | |
CN116880169A (en) | Peak power demand prediction control method based on deep reinforcement learning | |
Zhang et al. | Data-driven model predictive and reinforcement learning based control for building energy management: A survey | |
CN115169839A (en) | Heating load scheduling method based on data-physics-knowledge combined drive | |
CN113962454A (en) | LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization | |
CN114200839A (en) | Office building energy consumption intelligent control model for dynamic monitoring of coupled environmental behaviors | |
Yu et al. | Research on Intelligent Air Conditioning Optimization Control Algorithms Based on Neural Networks and Heuristic Algorithms | |
CN117973644B (en) | Distributed photovoltaic power virtual acquisition method considering optimization of reference power station | |
CN115840986B (en) | Energy management method based on stochastic model predictive control | |
Saranya et al. | AI buildings: design of artificially intelligent buildings in the energy sector with an autonomous federated learning approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||