CN113112077A - HVAC control system based on multi-step prediction deep reinforcement learning algorithm - Google Patents


Info

Publication number
CN113112077A
CN113112077A (application number CN202110403130.XA)
Authority
CN
China
Prior art keywords
neural network
output
value
current
environment temperature
Prior art date
Legal status
Granted
Application number
CN202110403130.XA
Other languages
Chinese (zh)
Other versions
CN113112077B (en)
Inventor
任密蜂 (Ren Mifeng)
刘祥飞 (Liu Xiangfei)
杨之乐 (Yang Zhile)
张建华 (Zhang Jianhua)
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date
Filing date
Publication date
Application filed by Taiyuan University of Technology filed Critical Taiyuan University of Technology
Priority to CN202110403130.XA priority Critical patent/CN113112077B/en
Publication of CN113112077A publication Critical patent/CN113112077A/en
Application granted granted Critical
Publication of CN113112077B publication Critical patent/CN113112077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/08: Learning methods
    • G06Q 50/06: Energy or water supply
    • H02J 3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J 2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y02E 40/70: Smart grids as climate change mitigation technology in the energy generation sector


Abstract

The invention relates to an intelligent control method for a Heating, Ventilation and Air Conditioning (HVAC) system, in particular to an HVAC control system based on a Long Short-Term Memory (LSTM) neural network with a generalized correntropy (GC) loss function and a Deep Reinforcement Learning (DRL) algorithm. The method comprises the following steps: collecting outdoor ambient temperature, indoor ambient temperature and grid electricity price information; preprocessing the collected data; predicting the outdoor ambient temperature several steps into the future from historical outdoor temperature data; and controlling the power output of the HVAC system with the Deep Deterministic Policy Gradient (DDPG) algorithm of DRL, based on the predicted future outdoor temperature, the indoor ambient temperature and the grid electricity price. The invention can intelligently control the HVAC system in real time to reduce the user's electricity cost while maintaining the user's comfort, and has considerable practical engineering application value.

Description

HVAC control system based on multi-step prediction deep reinforcement learning algorithm
Technical Field
The invention relates to a method for intelligent optimal control of an HVAC system, in particular to a method for intelligently controlling the HVAC system based on a GC-LSTM neural network (an LSTM with a generalized correntropy loss function) and a DRL algorithm.
Background
Household users are the terminal users of the power grid; users' electricity consumption habits and the integration of distributed renewable energy sources directly produce peaks and troughs in grid load, which can severely impact and threaten the grid. With the development of the smart grid and the implementation of demand-response strategies in recent years, residential users have changed from passively drawing power to actively participating in the grid; in the smart-grid environment, grid electricity price and generation capacity information is communicated bidirectionally with user demand information. Within a household, the air-conditioning system accounts for about 35% of total electricity consumption, so intelligently controlling the output power of the HVAC system according to the grid electricity price and the ambient temperature, while satisfying a given level of user comfort, is of great significance for reducing electricity use, lowering user cost and mitigating the greenhouse effect.
At present, HVAC systems mainly adopt two traditional control approaches: closed-loop control and model predictive control. In closed-loop control, a temperature sensor is installed and the HVAC system stops working once the indoor temperature reaches a set value; such systems are simple to operate and easy to implement, but in a smart-grid environment with demand-response strategies it is difficult for them to shift power according to dynamic electricity prices and so meet energy-saving and emission-reduction targets. Model predictive control drives the HVAC system with an accurate model of indoor temperature variation, but the complexity of indoor temperature dynamics limits the modeling accuracy. With the development of intelligent algorithms, researchers have also proposed optimizing HVAC control with particle swarm optimization and genetic algorithms, which optimize the HVAC power output under a real-time electricity price mechanism to reduce user cost; however, these algorithms are difficult to tune and do not consider the delay between the HVAC power output and the resulting indoor temperature change, so user comfort is not truly guaranteed. It is therefore necessary to first predict future outdoor ambient temperature values.
Disclosure of Invention
The invention provides an HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, addressing the nonlinearity and randomness of the outdoor ambient temperature and the smart-grid electricity price, and the delay with which the HVAC output power affects the indoor ambient temperature.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm is realized with the following technical scheme; its model structure is shown in FIG. 1. The system comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the outdoor-temperature prediction stage comprises the following steps:
Step one: according to actual data points of the outdoor environment, select the outdoor ambient temperature X = [T_1, …, T_i] at i consecutive moments as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the true output of the model, where n is the number of prediction steps;
Step two: preprocess the acquired data, correct abnormal data, and convert the time-series data into supervised-sequence data;
Step three: input the data into a long short-term memory neural network with a generalized correntropy loss function, which forgets, memorizes and learns from the input through its forget gate, input gate and output gate; the nonlinear regression model of this GC-LSTM network is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the LSTM network. The forget gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the forget-gate output is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines the information to be updated: first the update signal i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained with the σ function, then a new candidate value is generated with the tanh function: c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c);
The cell state c_t of the current block is determined jointly by the forget-gate output, the input-gate output, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained with the σ function; then the cell state c_t from 2) is scaled to a value between −1 and 1 by the activation function tanh, giving the model output h_t = o_t * tanh(c_t);
4) The error between the true value Y_t and the predicted value h_t is computed with the GC loss function:

L_GC = (1/N) Σ_{t=1}^{N} [G_{α,β}(0) − G_{α,β}(Y_t − h_t)],  with  G_{α,β}(e) = (α / (2β Γ(1/α))) exp(−|e/β|^α)
where N is the number of samples, Γ(·) is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through repeated iterative training, the weights w and biases b of the neural network are updated by mini-batch gradient descent so as to minimize the error between true and predicted values;
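As a concrete illustration, the loss above can be sketched in NumPy; this is a sketch under the assumption that the GC loss uses the standard generalized-Gaussian kernel with shape parameter α and bandwidth β (the function name and default values are ours, not the patent's):

```python
import math
import numpy as np

def gc_loss(y_true, y_pred, alpha=2.0, beta=1.0):
    """Generalized correntropy (GC) loss between targets and predictions.

    Kernel: G(e) = alpha / (2*beta*Gamma(1/alpha)) * exp(-|e/beta|**alpha).
    The loss is the mean of G(0) - G(e): zero for a perfect fit, and it
    grows boundedly with the error, which makes it robust to outliers.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))  # G(0)
    e = y_true - y_pred
    g = norm * np.exp(-np.abs(e / beta) ** alpha)
    return float(np.mean(norm - g))
```

Because the kernel is bounded, a single large outlier cannot dominate the loss the way it would with a squared-error criterion.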
Step four: finally, the GC-LSTM network yields a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments;
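The gate computations of 1)–3) can be sketched as a single NumPy forward pass through one LSTM block; the weight shapes, random initialisation and toy temperature sequence below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM block: forget, input and output gates."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hx = np.concatenate([h_prev, x_t])                    # [h_{t-1}, X_t]
    f_t = sigmoid(params["wf"] @ hx + params["bf"])       # forget gate
    i_t = sigmoid(params["wi"] @ hx + params["bi"])       # input gate
    c_tilde = np.tanh(params["wc"] @ hx + params["bc"])   # candidate value
    c_t = f_t * c_prev + i_t * c_tilde                    # new cell state
    o_t = sigmoid(params["wo"] @ hx + params["bo"])       # output gate
    h_t = o_t * np.tanh(c_t)                              # block output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inp = 4, 1
params = {f"w{g}": rng.normal(size=(hidden, hidden + inp)) * 0.1 for g in "fico"}
params.update({f"b{g}": np.zeros(hidden) for g in "fico"})
h, c = np.zeros(hidden), np.zeros(hidden)
for temp in [18.0, 18.5, 19.0]:      # a toy outdoor-temperature sequence
    h, c = lstm_cell_step(np.array([temp]), h, c, params)
```

The final output h would be passed to a dense layer to produce the n-step temperature forecast.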
the real-time control of the indoor temperature comprises the following steps:
Step one: collect the outdoor ambient temperature X = [T_1, …, T_i] at i consecutive moments and, with the GC-LSTM network, obtain the outdoor ambient temperature h = [h_{i+1}, …, h_{i+n}] at the next n moments; obtain the current grid electricity price ρ_t and the indoor temperature T_t^in, and take h, ρ_t and T_t^in as the environment state, that is: S_t = {h, ρ_t, T_t^in};
Step two: input the current state S_t to the Actor current neural network of the deep reinforcement learning DDPG algorithm and, based on the current policy and Gaussian noise N_t, select an action a_t = μ(S_t | θ^μ) + N_t, with a_t ∈ [P_min, P_max]. The Gaussian noise N_t increases the exploration of actions and is reduced as the number of training iterations grows; θ^μ are the parameters of the Actor current network, and P_min and P_max are the minimum and maximum output power of the HVAC system, respectively;
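A minimal sketch of the noisy action selection in step two; the power limits, initial noise scale and decay schedule are assumed values for illustration:

```python
import random

P_MIN, P_MAX = 0.0, 3.0   # assumed HVAC power limits in kW

def select_action(mu_action, episode, sigma0=0.5, decay=0.99):
    """Add Gaussian exploration noise that shrinks as training proceeds,
    then clip the action into the feasible power range [P_MIN, P_MAX]."""
    sigma = sigma0 * (decay ** episode)        # noise decays with iterations
    noisy = mu_action + random.gauss(0.0, sigma)
    return min(max(noisy, P_MIN), P_MAX)

random.seed(1)
a_early = select_action(1.5, episode=0)        # exploratory early action
a_late = select_action(1.5, episode=10_000)    # noise essentially gone
```

Decaying the noise trades exploration early in training for exploitation of the learned policy later.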
Step three: execute the action a_t to control the output power of the air conditioner; the power output of the HVAC system changes the indoor ambient temperature, e.g. through a first-order thermal model of the form T_{t+1}^in = ε T_t^in + (1 − ε)(T_t^out − η a_t / A), where ε, η and A are thermal parameters of the room; then a timely reward r_t is obtained and the next state S_{t+1} is reached;
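The way HVAC power drives the indoor temperature can be illustrated with a simple first-order thermal model; the model form and every parameter value below are assumptions for illustration, since the patent does not fix a particular building model here:

```python
def next_indoor_temp(t_in, t_out, power, eps=0.7, cop=2.5, ua=0.5):
    """One step of an assumed first-order thermal model: the room relaxes
    toward the outdoor temperature minus the cooling effect of the HVAC
    power (cop: coefficient of performance, ua: heat-loss coefficient, kW/C)."""
    return eps * t_in + (1.0 - eps) * (t_out - cop * power / ua)

t = 28.0
for _ in range(30):                   # cool a 28 C room against 32 C outdoors
    t = next_indoor_temp(t, t_out=32.0, power=2.0)   # 2 kW of cooling
```

With these toy parameters the room settles near 32 − 2.5·2/0.5 = 22 C; with zero power it instead drifts toward the outdoor temperature, which is the delay effect the DDPG controller must anticipate.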
Step four: store (S_t, a_t, r_t, S_{t+1}) into the experience pool buff-C;
Step five: if the amount of data in the experience pool buff-C is larger than C_M, randomly take M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from buff-C and perform the following steps; otherwise, go directly to step eleven;
Step six: compute the target expected value y_i = r_i + γ Q'(S_{i+1}, μ'(S_{i+1} | θ^{μ'}) | θ^{Q'}), where μ'(S_{i+1} | θ^{μ'}) is the optimal action obtained from the Actor target network, Q'(S_{i+1}, μ'(S_{i+1} | θ^{μ'}) | θ^{Q'}) is the future target value output by the Critic target network Q' from the state and optimal action at the next moment, γ is the discount factor, and θ^{μ'} and θ^{Q'} are the parameters of the Actor target network and the Critic target network, respectively;
step seven: critic current neural network Q pair action a taken based on DDPG algorithmtPerforming evaluation to calculate an evaluation value of θQParameters of the Critic current neural network;
Step eight: compute the error between the target expected values and the evaluation values of the samples with the mean squared error L = (1/M) Σ_{i=1}^{M} (y_i − Q(S_i, a_i | θ^Q))², and update the parameters θ^Q of the Critic current neural network by mini-batch gradient descent;
Step nine: update the Actor current network parameters θ^μ with the sampled policy gradient:

∇_{θ^μ} J ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(S_i, a | θ^Q)|_{a=μ(S_i|θ^μ)} ∇_{θ^μ} μ(S_i | θ^μ);
Step ten: softly copy the parameters of the Critic and Actor current networks to the Critic and Actor target networks, respectively, that is:

θ^{Q'} ← τ θ^Q + (1 − τ) θ^{Q'}

θ^{μ'} ← τ θ^μ + (1 − τ) θ^{μ'}
Step eleven: take the state at the next moment as the current state, that is: S_t ← S_{t+1}; loop over steps one to eleven until a converged Actor current neural network is obtained, output its parameters θ^μ to obtain the final HVAC control system model, and then perform step twelve;
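Steps six to ten can be sketched numerically; the tiny linear stand-in for the Critic below is purely illustrative (it is not the three-layer network of the embodiment) and only shows the target computation, the TD error and the soft update:

```python
import numpy as np

GAMMA, TAU = 0.9, 0.01

def soft_update(target, current, tau=TAU):
    """theta' <- tau*theta + (1 - tau)*theta' for every parameter array."""
    return {k: tau * current[k] + (1 - tau) * target[k] for k in target}

def q_value(w, s, a):
    """Stand-in linear critic: Q(s, a) = w . [s, a]."""
    return float(np.dot(w, np.append(s, a)))

rng = np.random.default_rng(0)
s, s_next = rng.normal(size=3), rng.normal(size=3)
w_q = rng.normal(size=4)            # Critic current parameters theta^Q
w_q_target = w_q.copy()             # Critic target parameters theta^Q'
a, r = 1.2, -0.5                    # sampled action and reward
a_next = 0.8                        # action from the Actor target network

# Step six: target expected value y = r + gamma * Q'(s', mu'(s'))
y = r + GAMMA * q_value(w_q_target, s_next, a_next)
# Steps seven/eight: critic evaluation and (single-sample) squared error
td_error = y - q_value(w_q, s, a)
loss = td_error ** 2
# Step ten: soft-copy current parameters toward the target network
new_target = soft_update({"w": w_q_target}, {"w": w_q})
```

The small τ makes the target networks track the current networks slowly, which is what stabilises the bootstrapped target y.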
Step twelve: input the current state S_t to the Actor current neural network of the deep reinforcement learning DDPG algorithm, select an action a_t with the optimal policy, and execute a_t to control the power output of the HVAC system.
Drawings
FIG. 1 is a schematic diagram of the establishment of an HVAC intelligent control system.
FIG. 2 is a graph of the loss functions of the outdoor-temperature training set and test set in the debugging stage, where 1 denotes the loss curve of the training set and 2 denotes the loss curve of the test set.
FIG. 3 is a graph of the true and predicted values of the outdoor-temperature test set in the debugging stage, where 3 denotes the predicted values and 4 denotes the true values.
Detailed Description
The collected real ambient-temperature data are taken as the experimental object to train and test the HVAC control system based on the multi-step prediction deep reinforcement learning algorithm.
The HVAC control system based on the multi-step prediction deep reinforcement learning algorithm comprises two stages of multi-step prediction of outdoor environment temperature and real-time control of indoor temperature, wherein the prediction stage of the outdoor environment temperature comprises the following steps:
Step one: according to actual data points of the outdoor environment, select the outdoor ambient temperature X = [T_1, …, T_6] at i = 6 consecutive moments as the input of the model, with h = [h_{i+1}, …, h_{i+n}] as the true output of the model; the sampling interval is 30 minutes.
Step two: preprocessing the acquired data, correcting abnormal data, converting the data of the time sequence into data of a supervision sequence, and dividing the data into 2500 training sets and 1000 testing sets.
Step three: setting the number of cells of the long-short term memory neural network as 100, the training times as 500, the learning rate as 0.001 and the batch of the minimum batch gradient descent method as 32;
Step four: input the training-set inputs into the long short-term memory neural network with the generalized correntropy loss function, which forgets, memorizes and learns from the input through its forget gate, input gate and output gate; the nonlinear regression process of this GC-LSTM network is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the LSTM network. The forget gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the forget-gate output is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines the information to be updated: first the update signal i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained with the σ function, then a new candidate value is generated with the tanh function: c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c);
The cell state c_t of the current block is determined jointly by the forget-gate output, the input-gate output, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained with the σ function; then the cell state c_t from 2) is scaled to a value between −1 and 1 by the activation function tanh, giving the model output h_t = o_t * tanh(c_t);
4) The error between the true value Y_t and the predicted value h_t is computed with the GC loss function:

L_GC = (1/N) Σ_{t=1}^{N} [G_{α,β}(0) − G_{α,β}(Y_t − h_t)],  with  G_{α,β}(e) = (α / (2β Γ(1/α))) exp(−|e/β|^α)
where N is the number of samples, Γ(·) is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through repeated iterative training, the weights w and biases b of the neural network are updated by mini-batch gradient descent so as to minimize the error between true and predicted values;
Step five: finally, the GC-LSTM network yields a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments, whose accuracy is then tested with the test set;
Step six: test the accuracy of the model with the test set, using the root mean square error (RMSE) between true and predicted values, the probability density distribution of the error, and R² as the evaluation indexes of the model. They are defined as:

RMSE = sqrt( (1/m) Σ_{i=1}^{m} (y_i − h_i)² )

R² = 1 − Σ_{i=1}^{m} (y_i − h_i)² / Σ_{i=1}^{m} (y_i − ȳ)²

p(e) = (1/m) Σ_{i=1}^{m} k(e − e_i),  e_i = y_i − h_i

where y_i and h_i are the true and predicted values of each step, ȳ is the mean of the true samples of each step, m is the number of samples in the test set, and k(·) is a Gaussian kernel function; the probability density function of the error is implemented with a sliding-window approach.
The real-time control of the indoor temperature comprises the following steps:
Step one: acquire the outdoor ambient temperature X = [T_1, …, T_6] at 6 consecutive moments and, based on the GC-LSTM neural network model, obtain the outdoor ambient temperature h = [h_7, h_8, h_9] at the next 3 moments; obtain the current grid electricity price ρ_t and the indoor temperature T_t^in, and divide the data into a training set of 2500 samples and a test set of 1000 samples. Take h, ρ_t and T_t^in as the environment state, that is: S_t = {h, ρ_t, T_t^in};
Step two: setting a DDPG algorithm of deep reinforcement learning as four neural networks, wherein a current neural network of an Actor and a target neural network of the Actor have three layers of neural networks with the same structure, a hidden layer activation function is tanh, a current neural network of Critic and the target neural network of Critic have the same neural network structure, and the hidden layer activation function is relu;
step three: current state information S in training settThe current neural network input to the Actor based on the current strategy and Gaussian noise
Figure BDA0003020421850000057
To select an action
Figure BDA0003020421850000058
at∈[Pmin,Pmax],PminAnd PmaxMinimum and maximum output power of the HVAC system, respectively;
Step four: execute the action a_t to control the output power of the air conditioner, then obtain a timely reward r_t and reach the next state S_{t+1}. The reward r_t is related to the electricity cost and the comfort of the user, e.g.:

r_t = −λ_1 ρ_t a_t Δt − λ_2 [max(T_t^in − T_max, 0) + max(T_min − T_t^in, 0)]

where T_min and T_max are the minimum and maximum comfort temperatures, respectively, and λ_1 and λ_2 are weighting factors for balancing the reward;
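A sketch of a cost-plus-comfort reward consistent with this description; the functional form, the weights λ_1 and λ_2 and the comfort band are illustrative assumptions (the patent only states that the reward balances electricity cost against the user's comfort):

```python
def reward(price, power, t_in, t_min=20.0, t_max=24.0,
           lambda1=1.0, lambda2=10.0, dt=0.5):
    """Negative electricity cost plus a penalty for leaving the comfort band."""
    cost = price * power * dt                           # energy cost this interval
    discomfort = max(t_in - t_max, 0.0) + max(t_min - t_in, 0.0)
    return -lambda1 * cost - lambda2 * discomfort

inside = reward(price=0.1, power=1.0, t_in=22.0)    # comfortable: only the cost
outside = reward(price=0.1, power=1.0, t_in=27.0)   # too hot: extra penalty
```

The relative size of λ_1 and λ_2 sets how aggressively the controller trades comfort violations for cheaper electricity.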
Step five: store (S_t, a_t, r_t, S_{t+1}) into the experience pool buff-C;
Step six: then randomly take M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from the experience pool buff-C;
Step seven: compute the target expectation y_i = r_i + γ Q'(S_{i+1}, μ'(S_{i+1} | θ^{μ'}) | θ^{Q'}) based on the state at the next moment and the action obtained from the Actor target network;
Step eight: the Critic current neural network Q of the DDPG algorithm evaluates the action taken, computing the evaluation value Q(S_i, a_i | θ^Q);
Step nine: calculating an error value between a target desired value and an evaluation value of a sample using a root mean square error
Figure BDA00030204218500000510
Updating the parameters of the Critic current neural network by using a minimum batch gradient descent method;
step ten: updating Actor current neural network parameter theta by using sample strategy gradientμ
Step eleven: soft copying the parameters of the current neural networks of the Ctric and the Actor to the target neural network parameters of the Ctric and the Actor respectively;
θQ'←τθQ+(1-τ)θQ'
θμ'←τθμ+(1-τ)θμ'
Step twelve: obtain a converged Actor current neural network by training on the training set and output its parameters θ^μ; the reward value obtained in each training iteration and the per-step error value L are used as the indicators of network convergence;
Step thirteen: input the current state S_t of the test set to the Actor current neural network of the DDPG algorithm, select an action a_t with the optimal policy, and execute a_t to control the power output of the HVAC system; the electricity cost of the HVAC system and the comfort of the user are used as the performance indexes of the system.
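The two performance indexes of step thirteen, electricity cost and user comfort, can be sketched over a short test episode; the half-hour interval and the comfort band are assumptions consistent with the embodiment:

```python
def episode_metrics(prices, powers, temps, t_min=20.0, t_max=24.0, dt=0.5):
    """Total electricity cost and fraction of time inside the comfort band."""
    cost = sum(p * a * dt for p, a in zip(prices, powers))
    in_band = sum(1 for t in temps if t_min <= t <= t_max)
    return cost, in_band / len(temps)

prices = [0.10, 0.30, 0.10]   # grid price per interval (illustrative)
powers = [2.0, 0.5, 2.0]      # HVAC output chosen by the Actor network
temps = [22.0, 23.5, 25.0]    # resulting indoor temperatures
cost, comfort = episode_metrics(prices, powers, temps)
```

A good policy lowers the cost term while keeping the comfort fraction near 1, e.g. by reducing power during the expensive middle interval only as far as comfort allows.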
The invention has the following advantages: the long short-term memory neural network predicts the future outdoor ambient temperature, improving the user's comfort, and the generalized correntropy loss function is used as the loss function of the LSTM network to improve prediction accuracy; then, based on the DDPG algorithm, the power output of the HVAC system is intelligently adjusted according to changes in the grid electricity price, the indoor temperature and the future outdoor temperature, saving the user's electricity cost while guaranteeing the user's comfort.
The above description is only one embodiment of the present invention, but the structural features of the present invention are not limited thereto; any changes or modifications made by those skilled in the art within the scope of the present invention are covered by the present invention.

Claims (1)

1. An HVAC control system based on a multi-step prediction deep reinforcement learning algorithm, characterized in that it comprises two stages, multi-step prediction of the outdoor ambient temperature and real-time control of the indoor temperature, wherein the outdoor-temperature prediction stage comprises the following steps:
Step one: according to actual data points of the outdoor environment, select the outdoor ambient temperature X = [T_1, …, T_i] at i consecutive moments as the input of the multi-step temperature prediction model, with h = [h_{i+1}, …, h_{i+n}] as the true output of the model, where n is the number of prediction steps;
Step two: preprocess the acquired data, correct abnormal data, and convert the time-series data into supervised-sequence data;
Step three: input the data into a long short-term memory neural network with a generalized correntropy loss function, which forgets, memorizes and learns from the input through its forget gate, input gate and output gate; the nonlinear regression model of this GC-LSTM network is described as follows:
1) The input X = [T_1, …, T_i] is fed into the first block of the LSTM network. The forget gate uses a sigmoid (σ) function to determine how much of the current input X_t and the previous output h_{t-1} is retained by the current block, i.e. the forget-gate output is f_t = σ(w_f[h_{t-1}, X_t] + b_f), where w_f and b_f are the weights and bias of the neural network;
2) The input gate determines the information to be updated: first the update signal i_t = σ(w_i[h_{t-1}, X_t] + b_i) is obtained with the σ function, then a new candidate value is generated with the tanh function: c̃_t = tanh(w_c[h_{t-1}, X_t] + b_c);
The cell state c_t of the current block is determined jointly by the forget-gate output, the input-gate output, the new candidate value and the cell state of the previous block, that is: c_t = f_t * c_{t-1} + i_t * c̃_t;
3) The output gate produces the model output: first an initial output o_t = σ(w_o[h_{t-1}, X_t] + b_o) is obtained with the σ function; then the cell state c_t from step 2) is scaled to a value between −1 and 1 by the activation function tanh, giving the model output h_t = o_t * tanh(c_t);
4) The error between the true value Y_t and the predicted value h_t is computed with the generalized correntropy loss function:

L_GC = (1/N) Σ_{t=1}^{N} [G_{α,β}(0) − G_{α,β}(Y_t − h_t)],  with  G_{α,β}(e) = (α / (2β Γ(1/α))) exp(−|e/β|^α)
where N is the number of samples, Γ(·) is the gamma function, α > 0 is the shape parameter and β > 0 is the bandwidth parameter. Through repeated iterative training, the weights w and biases b of the neural network are updated by mini-batch gradient descent so as to minimize the error between true and predicted values;
Step four: finally, the GC-LSTM network yields a nonlinear mapping model from the outdoor ambient temperature at the previous i moments to the outdoor ambient temperature at the next n moments;
the real-time control of the indoor temperature comprises the following steps:
the method comprises the following steps: collecting the outdoor environment temperature X ═ T at i continuous moments1,…,Ti]Obtaining the outdoor environment temperature h ═ h [ h ] at n continuous moments in the future based on the long-short term memory neural network of the generalized mutual entropy loss functioni+1,…,hi+n](ii) a Obtaining the power grid electricity price rho at the current momenttAnd indoor temperature Tt inH, rho are calculated according to the related informationtAnd Tt inAs the environment information, that is: st={h,ρt,Tt in};
Step two: inputting the current state information S_t into the Actor current neural network of the deep reinforcement learning DDPG algorithm, and selecting an action based on the current policy plus Gaussian noise N_t:
a_t = μ(S_t | θ^μ) + N_t,  a_t ∈ [P_min, P_max]
where the Gaussian noise N_t is added to increase the exploration rate of the actions and is reduced as the number of iteration cycles increases, θ^μ is the parameter of the Actor current neural network, and P_min and P_max are respectively the minimum and maximum output power of the HVAC system;
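Step two's noisy action selection can be sketched as follows; the power limits, initial noise scale and decay rate are assumed values, and `mu_out` stands in for the Actor network's output μ(S_t|θ^μ):

```python
import numpy as np

P_MIN, P_MAX = 0.0, 3.0          # assumed HVAC power limits (kW)

def select_action(mu_out, episode, sigma0=0.5, decay=0.995):
    """Add decaying Gaussian exploration noise and clip to the feasible power range."""
    sigma = sigma0 * (decay ** episode)      # noise shrinks as iterations increase
    noise = np.random.normal(0.0, sigma)
    return float(np.clip(mu_out + noise, P_MIN, P_MAX))

a_t = select_action(mu_out=1.2, episode=100)
print(P_MIN <= a_t <= P_MAX)  # True
```

Clipping enforces a_t ∈ [P_min, P_max] regardless of the sampled noise.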
step three: performing the action a_t to control the output power of the air conditioner; the power output by the HVAC system changes the indoor ambient temperature, for example according to a linear building thermal model such as:
T_{t+1}^in = ε T_t^in + (1 − ε)(T_t^out − η a_t / A)
where ε is the thermal inertia factor of the building, η the thermal conversion efficiency and A the overall thermal conductivity; a timely reward r_t is then obtained and the next state S_{t+1} is reached;
Step four: storing (S_t, a_t, r_t, S_{t+1}) into the experience pool buff-C;
step five: if the amount of data in the experience pool buff-C is larger than C_M, randomly taking M samples (S_i, a_i, r_i, S_{i+1}), i = 1, 2, …, M, from the experience pool buff-C and performing the following steps; otherwise, proceeding directly to step eleven;
step six: calculating the target expected value y_i = r_i + γ Q'(S_{i+1}, μ'(S_{i+1} | θ^μ') | θ^Q'), where μ'(S_{i+1} | θ^μ') is the optimal action obtained from the Actor target neural network, Q'(S_{i+1}, μ'(S_{i+1} | θ^μ') | θ^Q') is the future target value output by the Critic target network Q' based on the state information and the optimal action information at the next moment, γ is the discount factor, and θ^μ' and θ^Q' are respectively the parameters of the Actor target neural network and of the Critic target network;
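The target-value computation of step six reduces to one line per sample; here `q_target_next` stands in for Q'(S_{i+1}, μ'(S_{i+1}|θ^μ')|θ^Q'), and the `done` flag (terminal-state handling) is a common convention assumed here rather than spelled out in the claim:

```python
def target_value(r_i, q_target_next, gamma=0.99, done=False):
    """y_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1})) for non-terminal samples."""
    return r_i if done else r_i + gamma * q_target_next

print(target_value(-0.5, 10.0))  # 9.4
```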
step seven: critic current neural network Q pair adopted action based on DDPG algorithmAs atEvaluation is performed to calculate an evaluation value Q (S)t,atQ) Wherein thetaQParameters of the Critic current neural network;
step eight: calculating the mean squared error between the target expected values and the evaluation values of the samples,
L = (1/M) · Σ_{i=1}^{M} (y_i − Q(S_i, a_i | θ^Q))^2
and updating the parameter θ^Q of the Critic current neural network by the mini-batch gradient descent method;
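The critic loss of step eight is a mean squared error over the minibatch; a minimal sketch with illustrative values:

```python
import numpy as np

def critic_loss(y, q):
    """Mean squared error between target values y_i and critic outputs Q(S_i, a_i)."""
    y, q = np.asarray(y, float), np.asarray(q, float)
    return float(np.mean((y - q) ** 2))

print(critic_loss([1.0, 2.0], [1.0, 0.0]))  # 2.0
```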
Step nine: updating the Actor current neural network parameter θ^μ with the sampled policy gradient, as in the following equation:
∇_{θ^μ} J ≈ (1/M) · Σ_{i=1}^{M} ∇_a Q(S, a | θ^Q)|_{S=S_i, a=μ(S_i)} · ∇_{θ^μ} μ(S | θ^μ)|_{S=S_i}
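As a toy illustration of the sampled policy gradient in step nine, assume a scalar linear actor μ(S) = θ_μ·S and a critic whose action-derivative ∂Q/∂a is the constant q_a; both are hypothetical simplifications chosen so the two gradient factors exist in closed form:

```python
import numpy as np

theta_mu = 0.5   # actor parameter (its value does not affect the gradient of a linear actor)
q_a = 2.0        # dQ/da of a critic linear in the action (assumed constant)

def policy_gradient(states):
    """(1/M) * sum_i dQ/da|_{a=mu(S_i)} * dmu/dtheta|_{S_i}, with dmu/dtheta = S_i."""
    states = np.asarray(states, float)
    return float(np.mean(q_a * states))

print(policy_gradient([1.0, 2.0, 3.0]))  # 4.0
```

In practice both factors come from automatic differentiation through the critic and actor networks; the toy shows only the averaging structure of the estimator.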
step ten: soft-copying the parameters of the Critic and Actor current neural networks to the Critic and Actor target neural network parameters respectively, namely:
θQ'←τθQ+(1-τ)θQ'
θμ'←τθμ+(1-τ)θμ'
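The soft copy of step ten, sketched over plain parameter lists; `tau` is the soft-update coefficient, assumed small so the target networks track the current networks slowly:

```python
def soft_update(target, current, tau=0.01):
    """theta' <- tau*theta + (1-tau)*theta', element-wise over parameter lists."""
    return [tau * c + (1.0 - tau) * t for t, c in zip(target, current)]

print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1))  # [0.1, 0.2]
```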
step eleven: taking the state at the next moment as the state at the current moment, that is: S_t ← S_{t+1}; looping through steps one to eleven until a converged Actor current neural network is obtained; outputting the neural network parameter θ^μ to obtain the final HVAC control system model, and then performing step twelve;
step twelve: inputting the current state information S_t into the Actor current neural network of the deep reinforcement learning DDPG algorithm, selecting an action a_t based on the optimal policy, and executing a_t to control the power output of the HVAC system.
CN202110403130.XA 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm Active CN113112077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110403130.XA CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110403130.XA CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Publications (2)

Publication Number Publication Date
CN113112077A true CN113112077A (en) 2021-07-13
CN113112077B CN113112077B (en) 2022-06-10

Family

ID=76716975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110403130.XA Active CN113112077B (en) 2021-04-14 2021-04-14 HVAC control system based on multi-step prediction deep reinforcement learning algorithm

Country Status (1)

Country Link
CN (1) CN113112077B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113485498A (en) * 2021-07-19 2021-10-08 北京工业大学 Indoor environment comfort level adjusting method and system based on deep learning
CN113659246A (en) * 2021-10-20 2021-11-16 中国气象科学研究院 Battery system suitable for polar region ultralow temperature environment and temperature control method thereof
CN113741449A (en) * 2021-08-30 2021-12-03 南京信息工程大学 Multi-agent control method for air-sea cooperative observation task
CN113940218A (en) * 2021-09-30 2022-01-18 上海易航海芯农业科技有限公司 Intelligent heat supply method and system for greenhouse
CN114488811A (en) * 2022-01-25 2022-05-13 同济大学 Greenhouse environment energy-saving control method based on second-order Voltalla model prediction
CN115412923A (en) * 2022-10-28 2022-11-29 河北省科学院应用数学研究所 Multi-source sensor data credible fusion method, system, equipment and storage medium
TWI795283B (en) * 2022-05-04 2023-03-01 台灣松下電器股份有限公司 Control method of air conditioning system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102353119A (en) * 2011-08-09 2012-02-15 北京建筑工程学院 Control method of VAV (variable air volume) air-conditioning system
CN105805822A (en) * 2016-03-24 2016-07-27 常州英集动力科技有限公司 Heat supply energy saving control method and system based on neural network prediction
CN105870483A (en) * 2016-03-31 2016-08-17 华中科技大学 Thermoelectric synergic control method of solid oxide fuel cell during power tracking process
JP2016205739A (en) * 2015-04-24 2016-12-08 京セラ株式会社 Power control method, power control device and power control system
US20190102668A1 (en) * 2017-10-04 2019-04-04 Hengshuai Yao Method of prediction of a state of an object in the environment using an action model of a neural network
CN110458443A (en) * 2019-08-07 2019-11-15 南京邮电大学 A kind of wisdom home energy management method and system based on deeply study
US20190354071A1 (en) * 2018-05-18 2019-11-21 Johnson Controls Technology Company Hvac control system with model driven deep learning
CN111080002A (en) * 2019-12-10 2020-04-28 华南理工大学 Deep learning-based multi-step prediction method and system for building electrical load
CN111365828A (en) * 2020-03-06 2020-07-03 上海外高桥万国数据科技发展有限公司 Model prediction control method for realizing energy-saving temperature control of data center by combining machine learning
US20210049460A1 (en) * 2019-08-15 2021-02-18 Noodle Analytics, Inc. Deep probabilistic decision machines
CN112460741A (en) * 2020-11-23 2021-03-09 香港中文大学(深圳) Control method of building heating, ventilation and air conditioning system
CN112561728A (en) * 2020-10-28 2021-03-26 西安交通大学 Attention mechanism LSTM-based comprehensive energy consumption cost optimization method, medium and equipment



Also Published As

Publication number Publication date
CN113112077B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN113112077B (en) HVAC control system based on multi-step prediction deep reinforcement learning algorithm
CN112614009B (en) Power grid energy management method and system based on deep expectation Q-learning
CN110705743B (en) New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
CN110084367B (en) Soil moisture content prediction method based on LSTM deep learning model
CN109659933B (en) Electric energy quality prediction method for power distribution network with distributed power supply based on deep learning model
CN114370698B (en) Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN107704875A (en) Based on the building load Forecasting Methodology and device for improving IHCMAC neutral nets
CN116187601B (en) Comprehensive energy system operation optimization method based on load prediction
CN112070262B (en) Air conditioner load prediction method based on support vector machine
CN112926795A (en) SBO (statistical analysis) -based CNN (continuous casting) optimization-based high-rise residential building group heat load prediction method and system
CN114239991A (en) Building heat supply load prediction method, device and equipment based on data driving
CN111898856B (en) Analysis method of physical-data fusion building based on extreme learning machine
CN114119273A (en) Park comprehensive energy system non-invasive load decomposition method and system
Dong et al. Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building
Godahewa et al. Simulation and optimisation of air conditioning systems using machine learning
CN116880169A (en) Peak power demand prediction control method based on deep reinforcement learning
Zhang et al. Data-driven model predictive and reinforcement learning based control for building energy management: A survey
CN115169839A (en) Heating load scheduling method based on data-physics-knowledge combined drive
CN113962454A (en) LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization
CN114200839A (en) Office building energy consumption intelligent control model for dynamic monitoring of coupled environmental behaviors
Yu et al. Research on Intelligent Air Conditioning Optimization Control Algorithms Based on Neural Networks and Heuristic Algorithms
CN117973644B (en) Distributed photovoltaic power virtual acquisition method considering optimization of reference power station
CN115840986B (en) Energy management method based on stochastic model predictive control
Saranya et al. AI buildings: design of artificially intelligent buildings in the energy sector with an autonomous federated learning approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant