CN116227883A - Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning - Google Patents

Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning Download PDF

Info

Publication number
CN116227883A
Authority
CN
China
Prior art keywords
data
management system
energy management
network
intelligent household
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310260699.4A
Other languages
Chinese (zh)
Inventor
Cheng Jie
Yang Shengtian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202310260699.4A priority Critical patent/CN116227883A/en
Publication of CN116227883A publication Critical patent/CN116227883A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention discloses a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning. In the prediction-decision integrated scheduling mode, optimization control of the smart home energy management system based on the stochastic-policy Soft Actor-Critic algorithm consists of a learning link and an application link. Experimental results show that the proposed method not only greatly reduces smart home energy cost while guaranteeing the user's comfort requirement, but also effectively mitigates the scheduling performance degradation caused by environmental change.

Description

Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning
Technical Field
The invention relates to a prediction decision-making integrated scheduling method of an intelligent household energy management system based on deep reinforcement learning, and belongs to the technical field of intelligent household energy management.
Background
To relieve the energy crisis, electricity markets in many countries have undergone deep reform, giving rise to the smart grid. In the smart grid environment, various sustainable energy sources, including distributed photovoltaic panels, can be connected directly to the grid, so that users can save energy cost and even feed surplus power back to the grid, relieving supply-demand imbalance under certain conditions.
There has been much research on controlling the electricity-use strategy of smart home energy management systems, with some progress: methods such as conventional control techniques and mixed-integer linear programming have reduced the energy cost of some household users to a degree, but the overall effect remains unsatisfactory. Deep reinforcement learning combines the advantages of deep learning and reinforcement learning: deep learning can extract high-order features from a high-dimensional state space and can approximate arbitrary functions with neural networks, while reinforcement learning can solve sequential decision problems and achieve the set objective even without a model. Facing continuous-state environments, some works apply the deep deterministic policy gradient algorithm to home energy management and verify its effectiveness in simulations based on real data. To obtain a better control strategy, other works characterize future features of the environmental state through prediction; the accuracy of the predicted data relative to future actual data directly affects the decision process and thus the overall scheduling performance of the smart home energy management system. The prediction-decision integrated scheduling mode proposed by the invention extracts local features from the predicted data through discrete wavelet transform, and can dynamically and comprehensively schedule all devices in the smart home while meeting user comfort requirements, so as to minimize smart home energy cost.
Disclosure of Invention
The technical problem the invention aims to solve is to characterize future features of the environmental state through prediction and to have the smart home energy management system agent manage all equipment effectively in combination with a deep reinforcement learning algorithm. To this end, the invention provides a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning, which comprises the following steps:
the smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
the prediction module uses a long short-term memory (LSTM) network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
the predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain the local features of the predicted data;
the application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the real-time data at the current moment, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
the heating, ventilation and air conditioning (HVAC) system and the energy storage device are controlled according to the real-time actions of the controllable devices;
the environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to an experience pool for storage;
a number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
and when the policy reward stabilizes, the parameters of the learning-link controller are copied to the application-link controller.
Further, the prediction module is based on a long short-term memory network, with an input layer, a hidden layer and an output layer connected in sequence. The input layer receives the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the past 24 moments; the hidden layer is a fully connected network; the output layer outputs the data for 1 future moment, and sliding prediction yields the T future moments. The predicted data are then processed by discrete wavelet transform to obtain their local features.
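The patent specifies only the 24-point input window, the fully connected hidden part and the one-step sliding output; a minimal sketch of how such a recursive one-step-ahead LSTM predictor could look, assuming PyTorch and hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """LSTM mapping the past 24 points of (PV power, price, outdoor temp)
    to the next point; rolled forward T times for a T-step forecast."""
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)   # fully connected output

    def forward(self, window):                      # window: (batch, 24, 3)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])                # next-step values

def sliding_forecast(model, window, T=24):
    """Predict T future points by feeding each prediction back in."""
    preds = []
    for _ in range(T):
        nxt = model(window)                         # (batch, 3)
        preds.append(nxt)
        window = torch.cat([window[:, 1:], nxt.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)                # (batch, T, 3)
```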
Further, the discrete wavelet transform processing includes decomposition, denoising and reconstruction. Wavelet decomposition uses the Daubechies3 wavelet to decompose a predicted data segment into approximation (low-pass) and detail (high-pass) coefficients; the wavelet expansion coefficients are thresholded with a chosen threshold and threshold rule; and the thresholded coefficients are recombined with the unprocessed coefficients for reconstruction, yielding the local features of the predicted data.
Further, the thresholding includes selecting a corresponding threshold and threshold rule. The threshold is μ times (0 < μ < 1) the maximum value θ of the input data. The threshold rule is the soft-threshold function: every detail (high-pass) coefficient $C_d$ whose magnitude is smaller than μθ is set to 0, and μθ is subtracted from the magnitude of every detail coefficient whose magnitude is greater than or equal to μθ:

$$\hat{C}_d = \begin{cases} \operatorname{sgn}(C_d)\,(|C_d| - \mu\theta), & |C_d| \ge \mu\theta \\ 0, & |C_d| < \mu\theta \end{cases}$$
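A minimal sketch of this decompose, soft-threshold and reconstruct step, assuming the PyWavelets library and μ = 0.4 (the value used later in the embodiment):

```python
import numpy as np
import pywt

def wavelet_local_features(pred, mu=0.4, wavelet="db3"):
    """Decompose with Daubechies3, soft-threshold the detail (high-pass)
    coefficients at mu * max(|input|), keep the approximation (low-pass)
    coefficients untouched, and reconstruct."""
    theta = np.max(np.abs(pred))                 # maximum of the input data
    coeffs = pywt.wavedec(pred, wavelet)         # [cA_n, cD_n, ..., cD_1]
    cA, cDs = coeffs[0], coeffs[1:]
    cDs = [pywt.threshold(cd, mu * theta, mode="soft") for cd in cDs]
    return pywt.waverec([cA] + cDs, wavelet)     # local features of pred
```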
further, the learning-link controller comprises an Actor network, Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2, while the application-link controller comprises only the Actor network;
the number of neurons of the input layer of the application-link controller corresponds to the dimension of the environmental state; the hidden layers use the linear rectification (ReLU) activation function; the number of neurons of the output layer corresponds to the number of actions in the action space, with the hyperbolic tangent activation function;
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: their input layers take the environmental state and the action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into several hidden layers that use the ReLU activation function, and the output layer connected to the hidden layers uses a linear activation function.
Drawings
FIG. 1 is a schematic diagram of a scheduling mode of prediction and decision integration of a prediction and decision integration scheduling method of an intelligent home energy management system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of optimizing control of a predictive decision-making integrated scheduling method of an intelligent home energy management system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a workflow diagram of a predictive decision-making integrated scheduling method for a smart home energy management system based on deep reinforcement learning provided in an embodiment of the present invention;
fig. 4 is a graph comparing the performance of the inventive method based on a real data embodiment with other methods.
Detailed Description
In order to facilitate the understanding and implementation of the present invention by those skilled in the art, a technical solution of the present invention will be further described with reference to the accompanying drawings, and a specific embodiment of the present invention will be given.
The embodiment of the invention provides a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning. The smart home in this embodiment comprises a distributed photovoltaic generator, a smart meter, indoor and outdoor temperature sensors, a refrigerator, an HVAC system and an energy storage device. The smart meter and the temperature sensors are measuring instruments; the HVAC and the energy storage device are controllable devices. The smart home energy management system interacts with the distributed photovoltaic generation equipment, the HVAC, the energy storage device, the smart meter and the temperature sensors through information flows.
Facing environmental uncertainty, it is very difficult to design an energy management method that effectively schedules the HVAC and energy storage device in a smart home and minimizes energy cost while meeting user comfort requirements. To overcome this difficulty, the core design idea of the invention is as follows. First, without a building thermodynamic model and while keeping the indoor temperature within a certain range, the smart home energy cost minimization problem is modeled as a continuous, dynamic Markov decision process, and real-time environmental data are combined with predicted data to form the prediction-decision integrated scheduling mode shown in Fig. 1. The learning-link controller is then trained with the stochastic-policy Soft Actor-Critic algorithm, as shown in Fig. 2, where the controller parameters required for the application-link controller's action selection are periodically updated from the learning-link controller. By the Markov property, the home environmental state at the next moment should depend only on the current environmental state and the actions of all devices, independent of earlier states and actions. The Markov decision process is an approximate description of the smart home energy management problem, since certain components of the environmental state may not be Markov in practice, such as photovoltaic generation power, outdoor temperature and electricity price. According to existing research, even if some components of the environmental state are not Markov, the corresponding problems can still be solved by the strong decision-making capability of deep reinforcement learning; hence no prior information about any uncertain system parameter is needed, and the method is suitable for related problems in most fields.
Referring to fig. 3, a working flow chart of a prediction decision integrated scheduling method of an intelligent household energy management system based on deep reinforcement learning is provided in an embodiment of the present invention, and the method includes the following design steps:
step one, modeling the smart home energy cost minimization problem as a markov decision process without building a thermodynamic model and guaranteeing user comfort requirements, and then designing key components of the markov decision process, including environmental states, actions, and rewarding functions. The prediction module is based on long-short-term memory network design, the input layer inputs the output power, electricity price and outdoor temperature data of the distributed photovoltaic generators at the past 24 moments, and the output layer outputs the output power, electricity price and outdoor temperature data of the distributed photovoltaic generators at the future 24 moments in a sliding mode. Only 4 predicted data at the 24 predicted moments are reserved, the predicted data from the 5 th moment to the 24 th moment are decomposed into approximate (low-pass) and detail (high-pass) coefficients through discrete wavelet transformation with a wavelet base of Daubechies3, 0.4 times of the maximum value in input data is selected as a threshold value for the wavelet expansion coefficients, a soft threshold function is selected for thresholding, the wavelet expansion coefficients after thresholding are obtained, and reconstruction is carried out according to the wavelet expansion coefficients after thresholding and the unprocessed wavelet expansion coefficients, so that local features of the predicted data are obtained.
In the above smart home energy cost minimization problem, the objective function is the smart home energy cost, comprising the electricity charge from energy trading between the smart home and the grid, $C_t^{\mathrm{grid}}$, and the depreciation cost from charging and discharging the energy storage device, $C_t^{\mathrm{ess}}$:

$$C_t^{\mathrm{grid}} = \lambda_t P_t^{\mathrm{buy}} - \lambda_t^{\mathrm{s}} P_t^{\mathrm{sell}}$$

$$C_t^{\mathrm{ess}} = \psi\left(P_t^{\mathrm{ch}} + P_t^{\mathrm{dis}}\right)$$

where $C_t^{\mathrm{grid}}$ is the electricity trading cost between the smart home and the smart grid at time $t$; $\lambda_t$ and $\lambda_t^{\mathrm{s}}$ are the prices at which the user buys and sells electric power at time $t$, respectively; $P_t^{\mathrm{buy}}$ and $P_t^{\mathrm{sell}}$ are the electric power the smart home buys from and sells to the grid at time $t$; $C_t^{\mathrm{ess}}$ is the device depreciation cost of charging and discharging the energy storage device at time $t$; $\psi$ is the depreciation coefficient of the energy storage device; and $P_t^{\mathrm{ch}}$ and $P_t^{\mathrm{dis}}$ are the charging and discharging power of the energy storage device at time $t$.
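A minimal sketch of these two cost terms in Python, treating each time slot as 1 hour (as the embodiment below does), so power values map directly to energy:

```python
def grid_cost(p_buy, p_sell, price_buy, price_sell):
    """Electricity charge C_grid: pay for purchases, earn from sales."""
    return price_buy * p_buy - price_sell * p_sell

def ess_depreciation(p_ch, p_dis, psi):
    """Depreciation cost C_ess: proportional to total throughput."""
    return psi * (p_ch + p_dis)
```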
Because the user comfort requirement must be guaranteed, i.e., the indoor temperature must stay within a certain range, the decision variables of the Markov decision process comprise the HVAC input power and the charging/discharging power of the energy storage device. The constraints to be considered are as follows (for convenience, the invention treats each time step as 1 hour):
(1) The HVAC can continuously adjust its input power to meet the user comfort requirement:

$$0 \le P_{j,t}^{\mathrm{hvac}} \le P_j^{\mathrm{hvac,max}}$$

where $P_j^{\mathrm{hvac,max}}$ is the rated power of HVAC $j$. Since user comfort depends on many factors (e.g., air temperature, relative humidity, air flow rate), the invention for simplicity uses the comfortable temperature range as the indicator of the user comfort requirement:

$$T^{\min} \le T_t^{\mathrm{in}} \le T^{\max}$$

where $T^{\min}$ and $T^{\max}$ are the minimum and maximum of the indoor comfortable temperature range, respectively.
(2) The energy storage device can charge and discharge; the dynamic model of its stored energy is:

$$E_{t+1} = E_t + \eta^{\mathrm{c}} P_t^{\mathrm{ch}} - P_t^{\mathrm{dis}} / \eta^{\mathrm{dis}}$$

where $E_{t+1}$ is the stored energy of the device at time $t+1$, and $\eta^{\mathrm{c}}$ and $\eta^{\mathrm{dis}}$ are its charging and discharging efficiency, respectively. The capacity of the energy storage device is limited, so its stored energy $E_t$ must stay between the minimum energy $E^{\min}$ and the maximum energy $E^{\max}$:

$$E^{\min} \le E_t \le E^{\max}$$
(3) The charging and discharging power of the energy storage device are bounded by their rated values:

$$0 \le P_t^{\mathrm{ch}} \le a_t P^{\mathrm{ch,max}}$$

$$0 \le P_t^{\mathrm{dis}} \le (1 - a_t) P^{\mathrm{dis,max}}$$

where $P^{\mathrm{ch,max}}$ and $P^{\mathrm{dis,max}}$ are the maximum charging and discharging power of the energy storage device, and $a_t \in \{0, 1\}$ is a binary variable used to prevent the device from charging and discharging simultaneously.
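A minimal sketch of this storage model and its limits, assuming Python and a 1-hour step (symbols as defined above; the sign of a single commanded power stands in for the binary variable $a_t$, so charging and discharging can never happen at once):

```python
def ess_step(E, p_ess, eta_c, eta_dis, E_min, E_max, p_ch_max, p_dis_max):
    """Advance the stored energy by one hour under constraints (2)-(3).
    p_ess > 0 charges, p_ess < 0 discharges (never both at once)."""
    p_ch = min(max(p_ess, 0.0), p_ch_max)       # rated charging limit
    p_dis = min(max(-p_ess, 0.0), p_dis_max)    # rated discharging limit
    E_next = E + eta_c * p_ch - p_dis / eta_dis
    return min(max(E_next, E_min), E_max)       # keep within capacity
```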
(4) To maintain the power balance within the smart home, the power supplied to the home must equal the total power demand:

$$P_t^{\mathrm{g}} + P_t^{\mathrm{pv}} = P_t^{\mathrm{demand}}$$

where $P_t^{\mathrm{g}}$ and $P_t^{\mathrm{pv}}$ denote the power traded between the grid and the home and the distributed photovoltaic generator output power, respectively, and $P_t^{\mathrm{demand}}$ is the total household power demand. If $P_t^{\mathrm{g}} < 0$, the smart home sells power into the smart grid at time $t$; if $P_t^{\mathrm{g}} > 0$, the smart home must draw power from the smart grid at time $t$.
In a smart home, the environmental state at the next moment depends only on the current environmental state and the actions of the HVAC and the energy storage device, independent of earlier states and actions, so the joint control of the HVAC and the energy storage device can be regarded as a Markov decision process. In the following design, we describe the sequential decision problem of smart home energy management as a Markov decision process. Note that the Markov decision process is only an approximate description of the problem, since certain components of the environmental state may not be Markov in practice, such as the distributed photovoltaic generator output power, outdoor temperature and electricity price. Deep reinforcement learning can handle such non-strict Markov decision processes effectively, and the verification results of the invention also demonstrate its effectiveness on problems of this kind.
In this embodiment, the main components of the Markov decision process (environmental state, action and reward function) are designed as follows:
(1) Environmental state. The environmental state at time $t$ is denoted $s_t$ and comprises: the distributed photovoltaic generator output power at time $t$, $P_t^{\mathrm{pv}}$; the predicted output power from time $t+1$ to $t+T$, $P_{t+1:t+T}^{\mathrm{pv}}$; the outdoor temperature at time $t$, $T_t^{\mathrm{out}}$; the predicted outdoor temperature from $t+1$ to $t+T$, $T_{t+1:t+T}^{\mathrm{out}}$; the indoor temperature at time $t$, $T_t^{\mathrm{in}}$; the grid electricity price at time $t$, $\lambda_t$; the predicted grid electricity price from $t+1$ to $t+T$, $\lambda_{t+1:t+T}$; and the stored energy of the energy storage device at time $t$, $E_t$. The environmental state can thus be designed as

$$s_t = \left(P_t^{\mathrm{pv}},\, P_{t+1:t+T}^{\mathrm{pv}},\, T_t^{\mathrm{out}},\, T_{t+1:t+T}^{\mathrm{out}},\, T_t^{\mathrm{in}},\, \lambda_t,\, \lambda_{t+1:t+T},\, E_t\right)$$

Since the price at which the user sells power to the smart grid, $\lambda_t^{\mathrm{s}}$, is usually tied to the purchase price $\lambda_t$ (e.g., $\lambda_t^{\mathrm{s}} = \delta \lambda_t$ with $\delta$ a constant), $\lambda_t^{\mathrm{s}}$ need not be part of the environmental state.
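A minimal sketch of flattening these components into the state vector, assuming NumPy; the forecast vectors here stand for whichever combination of raw predictions and wavelet local features the embodiment feeds the controller:

```python
import numpy as np

def build_state(pv_now, pv_pred, t_out_now, t_out_pred,
                t_in_now, price_now, price_pred, ess_energy):
    """Concatenate s_t = (PV, PV forecast, T_out, T_out forecast,
    T_in, price, price forecast, ESS energy) into one flat vector."""
    return np.concatenate([
        [pv_now], pv_pred,
        [t_out_now], t_out_pred,
        [t_in_now],
        [price_now], price_pred,
        [ess_energy],
    ]).astype(np.float32)
```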
(2) Action. The action space of the HVAC and the energy storage device at time $t$ is denoted $a_t$, comprising the input power of HVAC $j$ at time $t$, $P_{j,t}^{\mathrm{hvac}}$, and the charging/discharging power of the energy storage device at time $t$, $P_t^{\mathrm{ess}}$. If $P_t^{\mathrm{ess}} > 0$, the energy storage device performs a charging action at time $t$; if $P_t^{\mathrm{ess}} < 0$, it performs a discharging action. The action space can thus be designed as

$$a_t = \left(P_{j,t}^{\mathrm{hvac}},\, P_t^{\mathrm{ess}}\right)$$

To ensure the energy storage device never exceeds its capacity limits while charging and discharging, the action must satisfy

$$\eta^{\mathrm{c}} P_t^{\mathrm{ch}} \le E^{\max} - E_t, \qquad P_t^{\mathrm{dis}} / \eta^{\mathrm{dis}} \le E_t - E^{\min}$$
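A minimal sketch of mapping the Actor's tanh-bounded output into feasible device actions while respecting these limits; the scaling choices are assumptions, not specified by the patent:

```python
def decode_action(raw, hvac_max, E, E_min, E_max, eta_c, eta_dis,
                  p_ch_max, p_dis_max):
    """raw: actor output in [-1, 1]^2 -> (HVAC power, signed ESS power)."""
    p_hvac = (raw[0] + 1.0) / 2.0 * hvac_max          # rescale to [0, max]
    p_ess = raw[1] * (p_ch_max if raw[1] > 0 else p_dis_max)
    if p_ess > 0:                                     # charging headroom
        p_ess = min(p_ess, (E_max - E) / eta_c)
    else:                                             # discharging headroom
        p_ess = max(p_ess, -(E - E_min) * eta_dis)
    return p_hvac, p_ess
```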
(3) Reward function. The total reward at time $t$, $R_t$, comprises three components: the electricity cost of the smart home at time $t$, $C_t^{\mathrm{grid}}$; the device depreciation cost from the charging/discharging actions of the energy storage system at time $t$, $C_t^{\mathrm{ess}}$; and the temperature-deviation cost $C_t^{\mathrm{tem}}$ incurred when improper input power of HVAC $j$ at time $t$ drives the indoor temperature outside the set range. Since the electricity cost and the depreciation cost both belong to the energy cost, the reward function can be designed as

$$R_t = -\left(C_t^{\mathrm{grid}} + C_t^{\mathrm{ess}}\right) - \rho\, C_t^{\mathrm{tem}}$$

where $\rho$ is the coefficient weighing the smart home energy cost against user comfort.
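A minimal sketch of this reward; the quadratic form of the temperature-deviation penalty is an assumption, since the patent does not spell out $C_t^{\mathrm{tem}}$:

```python
def reward(c_grid, c_ess, t_in, t_min, t_max, rho):
    """R_t = -(energy cost) - rho * (temperature deviation cost)."""
    deviation = max(t_min - t_in, 0.0) + max(t_in - t_max, 0.0)
    c_tem = deviation ** 2          # assumed penalty shape
    return -(c_grid + c_ess) - rho * c_tem
```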
Step two: with the goal of maximizing the total reward plus the entropy of each output action, the stochastic-policy Soft Actor-Critic algorithm is used to train the optimal control strategy for the HVAC and energy storage device under different environments.
At each moment, the smart home energy management system jointly controls the HVAC and the energy storage device to maximize the expected future cumulative reward with entropy:

$$J(\pi) = \mathbb{E}\left[\sum_{t} \gamma^{t}\left(R_t + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right]$$

where $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy $\pi$ at state $s_t$ and $\alpha$ is the entropy temperature coefficient; the larger $\alpha$, the more exploratory and stochastic the policy.
To achieve optimal control of the HVAC and energy storage device under different environments, the invention designs a prediction-decision integrated scheduling method for the smart home energy management system based on deep reinforcement learning. The actual operation process is as follows (a sketch of the resulting interaction loop follows this list):
(1) The smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
(2) The prediction module uses the LSTM network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
(3) The predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain their local features;
(4) The application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the current real-time data, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
(5) The HVAC and the energy storage device are controlled according to the real-time actions of the controllable devices;
(6) The environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to the experience pool for storage;
(7) A number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
(8) When the policy reward stabilizes, the trained parameters of the learning-link controller are copied to the application-link controller, which is then continuously applied to the control of the HVAC and energy storage device in the smart home.
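A minimal sketch of steps (1) to (6) as an interaction loop; the home.observe/actuate/step interface is hypothetical, and the training routine drawn against the replay buffer is sketched after the training-process description below:

```python
import collections

replay = collections.deque(maxlen=100_000)   # experience pool

def run_step(home, actor):
    """One pass of steps (1)-(6): observe, decide, actuate, store."""
    s = home.observe()        # hypothetical: real-time data + forecasts
    a = actor(s)              # application-link controller, real time
    home.actuate(a)           # drive the HVAC and energy storage device
    s_next, r = home.step()   # next state and next-moment reward
    replay.append((s, a, r, s_next))
    return s_next
```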
The learning link controller comprises an Actor network, a Critic network 1, a target Critic network 1, a Critic network 2 and a target Critic network 2, and the application link controller only comprises the Actor network.
The number of neurons of the input layer of the application-link controller corresponds to the environmental state dimension of 79; the hidden layers use the linear rectification (ReLU) activation function; the number of neurons of the output layer corresponds to the 2 actions in the action space, with the hyperbolic tangent activation function.
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: an input layer, several hidden layers and an output layer. The input layer takes the environmental state and action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into the hidden layers, which use the ReLU activation function, and the output layer connected to the hidden layers uses a linear activation function.
The training process of the controller in this method embodiment is as follows. First, a number of training samples are randomly drawn from the experience pool; the outputs of Critic network 1 and Critic network 2 are computed on these data, and the smaller of the two values is taken to avoid the overestimation caused by maximization. The Critic network parameters are then updated from the difference between each Critic network and its target Critic network. Next, the environmental states in the training data are fed to the Actor network, which outputs an action set; these actions, together with the environmental states, are fed to the Critic networks to obtain the action-value function, from which the policy gradient is computed. The Actor network parameters are then updated with the policy gradient, and the entropy regularization coefficient α is updated using the entropy and the target entropy. Finally, target Critic network 1 and target Critic network 2 are updated. The above process iterates until the trained policy reward is reasonably stable.
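A minimal sketch of one such training iteration, assuming PyTorch; actor.sample returning (action, log-prob), the optimizer objects and the Polyak rate tau are assumptions, but the min-over-twin-critics target, policy-gradient step, temperature update against a target entropy, and soft target copy follow the process described above:

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_tgt, q2_tgt,
               opt_actor, opt_q, log_alpha, opt_alpha,
               target_entropy, gamma=0.99, tau=0.005):
    """One Soft Actor-Critic step on a replay batch (s, a, r, s')."""
    s, a, r, s2 = batch
    alpha = log_alpha.exp()

    # Critic target: min of the two target critics plus entropy bonus.
    with torch.no_grad():
        a2, logp2 = actor.sample(s2)
        q_min = torch.min(q1_tgt(s2, a2), q2_tgt(s2, a2))
        y = r + gamma * (q_min - alpha * logp2)

    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # Actor: maximize min-Q plus entropy (policy gradient).
    a_new, logp = actor.sample(s)
    actor_loss = (alpha.detach() * logp
                  - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Temperature: drive the policy entropy toward the target entropy.
    alpha_loss = -(log_alpha * (logp.detach() + target_entropy)).mean()
    opt_alpha.zero_grad(); alpha_loss.backward(); opt_alpha.step()

    # Soft (Polyak) update of the target critics.
    for tgt, src in ((q1_tgt, q1), (q2_tgt, q2)):
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```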
Step three: the trained parameters of the learning-link controller are copied to the application-link controller; thereafter, the application-link controller outputs action instructions for the HVAC and the energy storage device according to the real-time environmental state information and controls them immediately.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
1) A prediction-decision integrated scheduling method for the smart home energy management system based on the stochastic-policy Soft Actor-Critic algorithm is proposed. The method needs no prior information about any uncertain system parameter and no building thermodynamic model, has the ability to predict and to plan multi-period system benefit comprehensively, and can effectively minimize energy cost;
2) In the prediction-decision integrated scheduling mode, the training of controller parameters in the learning link and the decision-making of the controller in the application link do not interfere with each other. The application-link controller's control of the devices is akin to evaluating an already-trained function, so the time to output a control result is negligible, and the smart home energy management system overcomes the loss of effectiveness caused by environmental change; it therefore has high robustness.
3) The method is efficient and forward-looking. The added prediction module can effectively predict the future trends of the distributed photovoltaic generator output power, electricity price and outdoor temperature, and performance simulation based on actual data shows that, compared with the prior art, the method reduces energy cost by 57.88% while maintaining the user's basic living requirements.
As shown in Fig. 4, the performance of the method embodiment is compared with other methods:
Comparison scheme 1: no energy storage device; the HVAC is controlled in the traditional on/off mode. Taking cooling as an example, the HVAC is switched on when the indoor temperature exceeds the set upper temperature limit and switched off when it falls below the set lower temperature limit.
Comparison scheme 2: with the energy storage device; the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 3: with the energy storage device; the prediction module forecasts the distributed photovoltaic generator output power, electricity price and outdoor temperature at 4 future moments, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 4: with the energy storage device; the prediction module forecasts the same quantities at 24 future moments, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 5: with the energy storage device; the prediction module forecasts the same quantities at 24 future moments, the 24 predicted points are processed by wavelet transform, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
The photovoltaic generation power, outdoor temperature and electricity price data fed to the environment come from the Pecan Street database in Texas, covering January 1 to September 30, 2020. Compared with schemes 1 and 2, the method reduces energy cost by 57.88% and 31.16%, respectively, while guaranteeing the user comfort requirement, demonstrating the effect of the energy storage device on reducing user energy cost and the superiority of the stochastic-policy Soft Actor-Critic control strategy for the HVAC and energy storage device. Compared with schemes 3, 4 and 5, the method reduces energy cost by 14.71%, 17.14% and 12.12%, respectively, while guaranteeing the user comfort requirement, demonstrating that the control strategy combining the LSTM network with the stochastic-policy Soft Actor-Critic algorithm is more efficient than using the stochastic-policy Soft Actor-Critic algorithm alone.
The foregoing description covers only the preferred embodiments of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in its protection scope.
While specific embodiments of the present disclosure have been described above with reference to the drawings, they do not limit the scope of the disclosure; various modifications and changes made by those skilled in the art without inventive effort on the basis of the technical solutions of the disclosure remain within its scope.

Claims (5)

1. A prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning, characterized by comprising the following steps:
the smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
the prediction module uses a long short-term memory (LSTM) network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
the predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain the local features of the predicted data;
the application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the real-time data at the current moment, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
the heating, ventilation and air conditioning (HVAC) system and the energy storage device are controlled according to the real-time actions of the controllable devices;
the environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to an experience pool for storage;
a number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
and when the policy reward stabilizes, the parameters of the learning-link controller are copied to the application-link controller.
2. The prediction-decision integrated scheduling method according to claim 1, wherein the prediction module is based on a long short-term memory network with an input layer, a hidden layer and an output layer connected in sequence; the input layer receives the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the past 24 moments; the hidden layer is a fully connected network; the output layer outputs the data for 1 future moment, and sliding prediction yields the T future moments; the predicted data are then processed by discrete wavelet transform to obtain their local features.
3. The prediction-decision integrated scheduling method according to claim 1, wherein the discrete wavelet transform processing comprises decomposition, denoising and reconstruction: the wavelet decomposition uses the Daubechies3 wavelet to decompose the predicted data into approximation (low-pass) and detail (high-pass) coefficients; the wavelet expansion coefficients are thresholded with a chosen threshold and threshold rule; and the thresholded coefficients are recombined with the unprocessed coefficients for reconstruction, yielding the local features of the predicted data.
4. The prediction-decision integrated scheduling method according to claim 3, wherein the thresholding comprises selecting a corresponding threshold and threshold rule: the threshold is μ times (0 < μ < 1) the maximum value θ of the input data; the threshold rule is the soft-threshold function, under which every detail (high-pass) coefficient $C_d$ with magnitude smaller than μθ is set to 0, and μθ is subtracted from the magnitude of every detail coefficient with magnitude greater than or equal to μθ:

$$\hat{C}_d = \begin{cases} \operatorname{sgn}(C_d)\,(|C_d| - \mu\theta), & |C_d| \ge \mu\theta \\ 0, & |C_d| < \mu\theta \end{cases}$$
5. The prediction-decision integrated scheduling method according to any one of claims 1 to 4, wherein the learning-link controller comprises an Actor network, Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2, and the application-link controller comprises only the Actor network;
the number of neurons of the input layer of the application-link controller corresponds to the dimension of the environmental state; the hidden layers use the linear rectification function; the number of neurons of the output layer corresponds to the number of actions in the action space, with the hyperbolic tangent activation function;
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: the input layer takes the environmental state and action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into several hidden layers that use the linear rectification function, and the output layer connected to the hidden layers uses a linear activation function.
CN202310260699.4A 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning Pending CN116227883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310260699.4A CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310260699.4A CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116227883A true CN116227883A (en) 2023-06-06

Family

ID=86575066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310260699.4A Pending CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116227883A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116734424A (en) * 2023-06-13 2023-09-12 Qingdao University of Technology Indoor thermal environment control method based on RC model and deep reinforcement learning
CN116734424B (en) * 2023-06-13 2023-12-22 Qingdao University of Technology Indoor thermal environment control method based on RC model and deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination