CN116227883A - Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning - Google Patents

Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning Download PDF

Info

Publication number
CN116227883A
Authority
CN
China
Prior art keywords
data
management system
energy management
network
intelligent household
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310260699.4A
Other languages
Chinese (zh)
Inventor
Cheng Jie
Yang Shengtian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202310260699.4A priority Critical patent/CN116227883A/en
Publication of CN116227883A publication Critical patent/CN116227883A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention discloses a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning. In the prediction-decision integrated scheduling mode, optimization control of the smart home energy management system based on the stochastic-policy Soft Actor-Critic algorithm consists of a learning link and an application link. Experimental results show that the proposed method not only greatly reduces smart home energy cost while guaranteeing the user's comfort requirement, but also effectively mitigates the scheduling performance degradation caused by environmental change.

Description

Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning
Technical Field
The invention relates to a prediction decision-making integrated scheduling method of an intelligent household energy management system based on deep reinforcement learning, and belongs to the technical field of intelligent household energy management.
Background
To relieve the energy crisis, electricity markets in many countries have undergone deep reform, giving rise to the smart grid. In the smart grid environment, various sustainable energy sources, including distributed photovoltaic panels, can be connected directly to the grid, so that users can save energy cost and even feed surplus power back to the grid, relieving supply-demand imbalance under certain conditions.
There has been much research on controlling the electricity-use strategy of smart home energy management systems, with some progress: methods such as conventional control techniques and mixed-integer linear programming have reduced the energy cost of some household users to a degree, but the overall effect remains unsatisfactory. Deep reinforcement learning combines the advantages of deep learning and reinforcement learning: deep learning can extract high-order features from a high-dimensional state space and can approximate arbitrary functions with neural networks, while reinforcement learning can solve sequential decision problems and achieve the set objective even without a model. Facing continuous-state environments, some works apply the deep deterministic policy gradient algorithm to home energy management and verify its effectiveness in simulations based on real data. To obtain a better control strategy, other works characterize future features of the environmental state through prediction; the accuracy of the predicted data relative to future actual data directly affects the decision process and thus the overall scheduling performance of the smart home energy management system. The prediction-decision integrated scheduling mode proposed by the invention extracts local features from the predicted data through discrete wavelet transform, and can dynamically and comprehensively schedule all devices in the smart home while meeting user comfort requirements, so as to minimize smart home energy cost.
Disclosure of Invention
The technical problem the invention aims to solve is to characterize future features of the environmental state through prediction and to have the smart home energy management system agent manage all equipment effectively in combination with a deep reinforcement learning algorithm. To this end, the invention provides a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning, which comprises the following steps:
the smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
the prediction module uses a long short-term memory (LSTM) network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
the predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain the local features of the predicted data;
the application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the real-time data at the current moment, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
the heating, ventilation and air conditioning (HVAC) system and the energy storage device are controlled according to the real-time actions of the controllable devices;
the environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to an experience pool for storage;
a number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
and when the policy reward stabilizes, the parameters of the learning-link controller are copied to the application-link controller.
Further, the prediction module is based on a long short-term memory network, with an input layer, a hidden layer and an output layer connected in sequence. The input layer receives the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the past 24 moments; the hidden layer is a fully connected network; the output layer outputs the data for 1 future moment, and sliding prediction yields the T future moments. The predicted data are then processed by discrete wavelet transform to obtain their local features.
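The patent specifies only the 24-point input window, the fully connected hidden part and the one-step sliding output; a minimal sketch of how such a recursive one-step-ahead LSTM predictor could look, assuming PyTorch and hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """LSTM mapping the past 24 points of (PV power, price, outdoor temp)
    to the next point; rolled forward T times for a T-step forecast."""
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)   # fully connected output

    def forward(self, window):                      # window: (batch, 24, 3)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])                # next-step values

def sliding_forecast(model, window, T=24):
    """Predict T future points by feeding each prediction back in."""
    preds = []
    for _ in range(T):
        nxt = model(window)                         # (batch, 3)
        preds.append(nxt)
        window = torch.cat([window[:, 1:], nxt.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)                # (batch, T, 3)
```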
Further, the discrete wavelet transform processing includes decomposition, denoising and reconstruction. Wavelet decomposition uses the Daubechies3 wavelet to decompose a predicted data segment into approximation (low-pass) and detail (high-pass) coefficients; the wavelet expansion coefficients are thresholded with a chosen threshold and threshold rule; and the thresholded coefficients are recombined with the unprocessed coefficients for reconstruction, yielding the local features of the predicted data.
Further, the thresholding includes selecting a corresponding threshold and threshold rule. The threshold is μ times (0 < μ < 1) the maximum value θ of the input data. The threshold rule is the soft-threshold function: every detail (high-pass) coefficient $C_d$ whose magnitude is smaller than μθ is set to 0, and μθ is subtracted from the magnitude of every detail coefficient whose magnitude is greater than or equal to μθ:

$$\hat{C}_d = \begin{cases} \operatorname{sgn}(C_d)\,(|C_d| - \mu\theta), & |C_d| \ge \mu\theta \\ 0, & |C_d| < \mu\theta \end{cases}$$
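A minimal sketch of this decompose, soft-threshold and reconstruct step, assuming the PyWavelets library and μ = 0.4 (the value used later in the embodiment):

```python
import numpy as np
import pywt

def wavelet_local_features(pred, mu=0.4, wavelet="db3"):
    """Decompose with Daubechies3, soft-threshold the detail (high-pass)
    coefficients at mu * max(|input|), keep the approximation (low-pass)
    coefficients untouched, and reconstruct."""
    theta = np.max(np.abs(pred))                 # maximum of the input data
    coeffs = pywt.wavedec(pred, wavelet)         # [cA_n, cD_n, ..., cD_1]
    cA, cDs = coeffs[0], coeffs[1:]
    cDs = [pywt.threshold(cd, mu * theta, mode="soft") for cd in cDs]
    return pywt.waverec([cA] + cDs, wavelet)     # local features of pred
```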
further, the learning-link controller comprises an Actor network, Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2, while the application-link controller comprises only the Actor network;
the number of neurons of the input layer of the application-link controller corresponds to the dimension of the environmental state; the hidden layers use the linear rectification (ReLU) activation function; the number of neurons of the output layer corresponds to the number of actions in the action space, with the hyperbolic tangent activation function;
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: their input layers take the environmental state and the action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into several hidden layers that use the ReLU activation function, and the output layer connected to the hidden layers uses a linear activation function.
Drawings
FIG. 1 is a schematic diagram of a scheduling mode of prediction and decision integration of a prediction and decision integration scheduling method of an intelligent home energy management system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of optimizing control of a predictive decision-making integrated scheduling method of an intelligent home energy management system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a workflow diagram of a predictive decision-making integrated scheduling method for a smart home energy management system based on deep reinforcement learning provided in an embodiment of the present invention;
fig. 4 is a graph comparing the performance of the inventive method based on a real data embodiment with other methods.
Detailed Description
In order to facilitate the understanding and implementation of the present invention by those skilled in the art, a technical solution of the present invention will be further described with reference to the accompanying drawings, and a specific embodiment of the present invention will be given.
The embodiment of the invention provides a prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning. The smart home in this embodiment comprises a distributed photovoltaic generator, a smart meter, indoor and outdoor temperature sensors, a refrigerator, an HVAC system and an energy storage device. The smart meter and the temperature sensors are measuring instruments; the HVAC and the energy storage device are controllable devices. The smart home energy management system interacts with the distributed photovoltaic generation equipment, the HVAC, the energy storage device, the smart meter and the temperature sensors through information flows.
Facing environmental uncertainty, it is very difficult to design an energy management method that effectively schedules the HVAC and energy storage device in a smart home and minimizes energy cost while meeting user comfort requirements. To overcome this difficulty, the core design idea of the invention is as follows. First, without a building thermodynamic model and while keeping the indoor temperature within a certain range, the smart home energy cost minimization problem is modeled as a continuous, dynamic Markov decision process, and real-time environmental data are combined with predicted data to form the prediction-decision integrated scheduling mode shown in Fig. 1. The learning-link controller is then trained with the stochastic-policy Soft Actor-Critic algorithm, as shown in Fig. 2, where the controller parameters required for the application-link controller's action selection are periodically updated from the learning-link controller. By the Markov property, the home environmental state at the next moment should depend only on the current environmental state and the actions of all devices, independent of earlier states and actions. The Markov decision process is an approximate description of the smart home energy management problem, since certain components of the environmental state may not be Markov in practice, such as photovoltaic generation power, outdoor temperature and electricity price. According to existing research, even if some components of the environmental state are not Markov, the corresponding problems can still be solved by the strong decision-making capability of deep reinforcement learning; hence no prior information about any uncertain system parameter is needed, and the method is suitable for related problems in most fields.
Referring to fig. 3, a working flow chart of a prediction decision integrated scheduling method of an intelligent household energy management system based on deep reinforcement learning is provided in an embodiment of the present invention, and the method includes the following design steps:
step one, modeling the smart home energy cost minimization problem as a markov decision process without building a thermodynamic model and guaranteeing user comfort requirements, and then designing key components of the markov decision process, including environmental states, actions, and rewarding functions. The prediction module is based on long-short-term memory network design, the input layer inputs the output power, electricity price and outdoor temperature data of the distributed photovoltaic generators at the past 24 moments, and the output layer outputs the output power, electricity price and outdoor temperature data of the distributed photovoltaic generators at the future 24 moments in a sliding mode. Only 4 predicted data at the 24 predicted moments are reserved, the predicted data from the 5 th moment to the 24 th moment are decomposed into approximate (low-pass) and detail (high-pass) coefficients through discrete wavelet transformation with a wavelet base of Daubechies3, 0.4 times of the maximum value in input data is selected as a threshold value for the wavelet expansion coefficients, a soft threshold function is selected for thresholding, the wavelet expansion coefficients after thresholding are obtained, and reconstruction is carried out according to the wavelet expansion coefficients after thresholding and the unprocessed wavelet expansion coefficients, so that local features of the predicted data are obtained.
In the above smart home energy cost minimization problem, the objective function is the smart home energy cost, comprising the electricity charge from energy trading between the smart home and the grid, $C_t^{\mathrm{grid}}$, and the depreciation cost from charging and discharging the energy storage device, $C_t^{\mathrm{ess}}$:

$$C_t^{\mathrm{grid}} = \lambda_t P_t^{\mathrm{buy}} - \lambda_t^{\mathrm{s}} P_t^{\mathrm{sell}}$$

$$C_t^{\mathrm{ess}} = \psi\left(P_t^{\mathrm{ch}} + P_t^{\mathrm{dis}}\right)$$

where $C_t^{\mathrm{grid}}$ is the electricity trading cost between the smart home and the smart grid at time $t$; $\lambda_t$ and $\lambda_t^{\mathrm{s}}$ are the prices at which the user buys and sells electric power at time $t$, respectively; $P_t^{\mathrm{buy}}$ and $P_t^{\mathrm{sell}}$ are the electric power the smart home buys from and sells to the grid at time $t$; $C_t^{\mathrm{ess}}$ is the device depreciation cost of charging and discharging the energy storage device at time $t$; $\psi$ is the depreciation coefficient of the energy storage device; and $P_t^{\mathrm{ch}}$ and $P_t^{\mathrm{dis}}$ are the charging and discharging power of the energy storage device at time $t$.
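A minimal sketch of these two cost terms in Python, treating each time slot as 1 hour (as the embodiment below does), so power values map directly to energy:

```python
def grid_cost(p_buy, p_sell, price_buy, price_sell):
    """Electricity charge C_grid: pay for purchases, earn from sales."""
    return price_buy * p_buy - price_sell * p_sell

def ess_depreciation(p_ch, p_dis, psi):
    """Depreciation cost C_ess: proportional to total throughput."""
    return psi * (p_ch + p_dis)
```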
Because the user comfort requirement must be guaranteed, i.e., the indoor temperature must stay within a certain range, the decision variables of the Markov decision process comprise the HVAC input power and the charging/discharging power of the energy storage device. The constraints to be considered are as follows (for convenience, the invention treats each time step as 1 hour):
(1) The HVAC can continuously adjust its input power to meet the user comfort requirement:

$$0 \le P_{j,t}^{\mathrm{hvac}} \le P_j^{\mathrm{hvac,max}}$$

where $P_j^{\mathrm{hvac,max}}$ is the rated power of HVAC $j$. Since user comfort depends on many factors (e.g., air temperature, relative humidity, air flow rate), the invention for simplicity uses the comfortable temperature range as the indicator of the user comfort requirement:

$$T^{\min} \le T_t^{\mathrm{in}} \le T^{\max}$$

where $T^{\min}$ and $T^{\max}$ are the minimum and maximum of the indoor comfortable temperature range, respectively.
(2) The energy storage device can charge and discharge; the dynamic model of its stored energy is:

$$E_{t+1} = E_t + \eta^{\mathrm{c}} P_t^{\mathrm{ch}} - P_t^{\mathrm{dis}} / \eta^{\mathrm{dis}}$$

where $E_{t+1}$ is the stored energy of the device at time $t+1$, and $\eta^{\mathrm{c}}$ and $\eta^{\mathrm{dis}}$ are its charging and discharging efficiency, respectively. The capacity of the energy storage device is limited, so its stored energy $E_t$ must stay between the minimum energy $E^{\min}$ and the maximum energy $E^{\max}$:

$$E^{\min} \le E_t \le E^{\max}$$
(3) The charging and discharging power of the energy storage device are bounded by their rated values:

$$0 \le P_t^{\mathrm{ch}} \le a_t P^{\mathrm{ch,max}}$$

$$0 \le P_t^{\mathrm{dis}} \le (1 - a_t) P^{\mathrm{dis,max}}$$

where $P^{\mathrm{ch,max}}$ and $P^{\mathrm{dis,max}}$ are the maximum charging and discharging power of the energy storage device, and $a_t \in \{0, 1\}$ is a binary variable used to prevent the device from charging and discharging simultaneously.
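A minimal sketch of this storage model and its limits, assuming Python and a 1-hour step (symbols as defined above; the sign of a single commanded power stands in for the binary variable $a_t$, so charging and discharging can never happen at once):

```python
def ess_step(E, p_ess, eta_c, eta_dis, E_min, E_max, p_ch_max, p_dis_max):
    """Advance the stored energy by one hour under constraints (2)-(3).
    p_ess > 0 charges, p_ess < 0 discharges (never both at once)."""
    p_ch = min(max(p_ess, 0.0), p_ch_max)       # rated charging limit
    p_dis = min(max(-p_ess, 0.0), p_dis_max)    # rated discharging limit
    E_next = E + eta_c * p_ch - p_dis / eta_dis
    return min(max(E_next, E_min), E_max)       # keep within capacity
```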
(4) To maintain the power balance within the smart home, the power supplied to the home must equal the total power demand:

$$P_t^{\mathrm{g}} + P_t^{\mathrm{pv}} = P_t^{\mathrm{demand}}$$

where $P_t^{\mathrm{g}}$ and $P_t^{\mathrm{pv}}$ denote the power traded between the grid and the home and the distributed photovoltaic generator output power, respectively, and $P_t^{\mathrm{demand}}$ is the total household power demand. If $P_t^{\mathrm{g}} < 0$, the smart home sells power into the smart grid at time $t$; if $P_t^{\mathrm{g}} > 0$, the smart home must draw power from the smart grid at time $t$.
In a smart home, the environmental state at the next moment depends only on the current environmental state and the actions of the HVAC and the energy storage device, independent of earlier states and actions, so the joint control of the HVAC and the energy storage device can be regarded as a Markov decision process. In the following design, we describe the sequential decision problem of smart home energy management as a Markov decision process. Note that the Markov decision process is only an approximate description of the problem, since certain components of the environmental state may not be Markov in practice, such as the distributed photovoltaic generator output power, outdoor temperature and electricity price. Deep reinforcement learning can handle such non-strict Markov decision processes effectively, and the verification results of the invention also demonstrate its effectiveness on problems of this kind.
In this embodiment, the main components of the Markov decision process (environmental state, action and reward function) are designed as follows:
(1) Environmental state. The environmental state at time $t$ is denoted $s_t$ and comprises: the distributed photovoltaic generator output power at time $t$, $P_t^{\mathrm{pv}}$; the predicted output power from time $t+1$ to $t+T$, $P_{t+1:t+T}^{\mathrm{pv}}$; the outdoor temperature at time $t$, $T_t^{\mathrm{out}}$; the predicted outdoor temperature from $t+1$ to $t+T$, $T_{t+1:t+T}^{\mathrm{out}}$; the indoor temperature at time $t$, $T_t^{\mathrm{in}}$; the grid electricity price at time $t$, $\lambda_t$; the predicted grid electricity price from $t+1$ to $t+T$, $\lambda_{t+1:t+T}$; and the stored energy of the energy storage device at time $t$, $E_t$. The environmental state can thus be designed as

$$s_t = \left(P_t^{\mathrm{pv}},\, P_{t+1:t+T}^{\mathrm{pv}},\, T_t^{\mathrm{out}},\, T_{t+1:t+T}^{\mathrm{out}},\, T_t^{\mathrm{in}},\, \lambda_t,\, \lambda_{t+1:t+T},\, E_t\right)$$

Since the price at which the user sells power to the smart grid, $\lambda_t^{\mathrm{s}}$, is usually tied to the purchase price $\lambda_t$ (e.g., $\lambda_t^{\mathrm{s}} = \delta \lambda_t$ with $\delta$ a constant), $\lambda_t^{\mathrm{s}}$ need not be part of the environmental state.
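A minimal sketch of flattening these components into the state vector, assuming NumPy; the forecast vectors here stand for whichever combination of raw predictions and wavelet local features the embodiment feeds the controller:

```python
import numpy as np

def build_state(pv_now, pv_pred, t_out_now, t_out_pred,
                t_in_now, price_now, price_pred, ess_energy):
    """Concatenate s_t = (PV, PV forecast, T_out, T_out forecast,
    T_in, price, price forecast, ESS energy) into one flat vector."""
    return np.concatenate([
        [pv_now], pv_pred,
        [t_out_now], t_out_pred,
        [t_in_now],
        [price_now], price_pred,
        [ess_energy],
    ]).astype(np.float32)
```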
(2) Action. The action space of the HVAC and the energy storage device at time $t$ is denoted $a_t$, comprising the input power of HVAC $j$ at time $t$, $P_{j,t}^{\mathrm{hvac}}$, and the charging/discharging power of the energy storage device at time $t$, $P_t^{\mathrm{ess}}$. If $P_t^{\mathrm{ess}} > 0$, the energy storage device performs a charging action at time $t$; if $P_t^{\mathrm{ess}} < 0$, it performs a discharging action. The action space can thus be designed as

$$a_t = \left(P_{j,t}^{\mathrm{hvac}},\, P_t^{\mathrm{ess}}\right)$$

To ensure the energy storage device never exceeds its capacity limits while charging and discharging, the action must satisfy

$$\eta^{\mathrm{c}} P_t^{\mathrm{ch}} \le E^{\max} - E_t, \qquad P_t^{\mathrm{dis}} / \eta^{\mathrm{dis}} \le E_t - E^{\min}$$
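A minimal sketch of mapping the Actor's tanh-bounded output into feasible device actions while respecting these limits; the scaling choices are assumptions, not specified by the patent:

```python
def decode_action(raw, hvac_max, E, E_min, E_max, eta_c, eta_dis,
                  p_ch_max, p_dis_max):
    """raw: actor output in [-1, 1]^2 -> (HVAC power, signed ESS power)."""
    p_hvac = (raw[0] + 1.0) / 2.0 * hvac_max          # rescale to [0, max]
    p_ess = raw[1] * (p_ch_max if raw[1] > 0 else p_dis_max)
    if p_ess > 0:                                     # charging headroom
        p_ess = min(p_ess, (E_max - E) / eta_c)
    else:                                             # discharging headroom
        p_ess = max(p_ess, -(E - E_min) * eta_dis)
    return p_hvac, p_ess
```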
(3) Reward function. The total reward at time $t$, $R_t$, comprises three components: the electricity cost of the smart home at time $t$, $C_t^{\mathrm{grid}}$; the device depreciation cost from the charging/discharging actions of the energy storage system at time $t$, $C_t^{\mathrm{ess}}$; and the temperature-deviation cost $C_t^{\mathrm{tem}}$ incurred when improper input power of HVAC $j$ at time $t$ drives the indoor temperature outside the set range. Since the electricity cost and the depreciation cost both belong to the energy cost, the reward function can be designed as

$$R_t = -\left(C_t^{\mathrm{grid}} + C_t^{\mathrm{ess}}\right) - \rho\, C_t^{\mathrm{tem}}$$

where $\rho$ is the coefficient weighing the smart home energy cost against user comfort.
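A minimal sketch of this reward; the quadratic form of the temperature-deviation penalty is an assumption, since the patent does not spell out $C_t^{\mathrm{tem}}$:

```python
def reward(c_grid, c_ess, t_in, t_min, t_max, rho):
    """R_t = -(energy cost) - rho * (temperature deviation cost)."""
    deviation = max(t_min - t_in, 0.0) + max(t_in - t_max, 0.0)
    c_tem = deviation ** 2          # assumed penalty shape
    return -(c_grid + c_ess) - rho * c_tem
```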
Step two: with the goal of maximizing the total reward plus the entropy of each output action, the stochastic-policy Soft Actor-Critic algorithm is used to train the optimal control strategy for the HVAC and energy storage device under different environments.
At each moment, the smart home energy management system jointly controls the HVAC and the energy storage device to maximize the expected future cumulative reward with entropy:

$$J(\pi) = \mathbb{E}\left[\sum_{t} \gamma^{t}\left(R_t + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right]$$

where $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy $\pi$ at state $s_t$ and $\alpha$ is the entropy temperature coefficient; the larger $\alpha$, the more exploratory and stochastic the policy.
To achieve optimal control of the HVAC and energy storage device under different environments, the invention designs a prediction-decision integrated scheduling method for the smart home energy management system based on deep reinforcement learning. The actual operation process is as follows (a sketch of the resulting interaction loop follows this list):
(1) The smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
(2) The prediction module uses the LSTM network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
(3) The predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain their local features;
(4) The application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the current real-time data, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
(5) The HVAC and the energy storage device are controlled according to the real-time actions of the controllable devices;
(6) The environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to the experience pool for storage;
(7) A number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
(8) When the policy reward stabilizes, the trained parameters of the learning-link controller are copied to the application-link controller, which is then continuously applied to the control of the HVAC and energy storage device in the smart home.
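A minimal sketch of steps (1) to (6) as an interaction loop; the home.observe/actuate/step interface is hypothetical, and the training routine drawn against the replay buffer is sketched after the training-process description below:

```python
import collections

replay = collections.deque(maxlen=100_000)   # experience pool

def run_step(home, actor):
    """One pass of steps (1)-(6): observe, decide, actuate, store."""
    s = home.observe()        # hypothetical: real-time data + forecasts
    a = actor(s)              # application-link controller, real time
    home.actuate(a)           # drive the HVAC and energy storage device
    s_next, r = home.step()   # next state and next-moment reward
    replay.append((s, a, r, s_next))
    return s_next
```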
The learning link controller comprises an Actor network, a Critic network 1, a target Critic network 1, a Critic network 2 and a target Critic network 2, and the application link controller only comprises the Actor network.
The number of neurons of the input layer of the application-link controller corresponds to the environmental state dimension of 79; the hidden layers use the linear rectification (ReLU) activation function; the number of neurons of the output layer corresponds to the 2 actions in the action space, with the hyperbolic tangent activation function.
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: an input layer, several hidden layers and an output layer. The input layer takes the environmental state and action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into the hidden layers, which use the ReLU activation function, and the output layer connected to the hidden layers uses a linear activation function.
The training process of the controller in this method embodiment is as follows. First, a number of training samples are randomly drawn from the experience pool; the outputs of Critic network 1 and Critic network 2 are computed on these data, and the smaller of the two values is taken to avoid the overestimation caused by maximization. The Critic network parameters are then updated from the difference between each Critic network and its target Critic network. Next, the environmental states in the training data are fed to the Actor network, which outputs an action set; these actions, together with the environmental states, are fed to the Critic networks to obtain the action-value function, from which the policy gradient is computed. The Actor network parameters are then updated with the policy gradient, and the entropy regularization coefficient α is updated using the entropy and the target entropy. Finally, target Critic network 1 and target Critic network 2 are updated. The above process iterates until the trained policy reward is reasonably stable.
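A minimal sketch of one such training iteration, assuming PyTorch; actor.sample returning (action, log-prob), the optimizer objects and the Polyak rate tau are assumptions, but the min-over-twin-critics target, policy-gradient step, temperature update against a target entropy, and soft target copy follow the process described above:

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_tgt, q2_tgt,
               opt_actor, opt_q, log_alpha, opt_alpha,
               target_entropy, gamma=0.99, tau=0.005):
    """One Soft Actor-Critic step on a replay batch (s, a, r, s')."""
    s, a, r, s2 = batch
    alpha = log_alpha.exp()

    # Critic target: min of the two target critics plus entropy bonus.
    with torch.no_grad():
        a2, logp2 = actor.sample(s2)
        q_min = torch.min(q1_tgt(s2, a2), q2_tgt(s2, a2))
        y = r + gamma * (q_min - alpha * logp2)

    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # Actor: maximize min-Q plus entropy (policy gradient).
    a_new, logp = actor.sample(s)
    actor_loss = (alpha.detach() * logp
                  - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Temperature: drive the policy entropy toward the target entropy.
    alpha_loss = -(log_alpha * (logp.detach() + target_entropy)).mean()
    opt_alpha.zero_grad(); alpha_loss.backward(); opt_alpha.step()

    # Soft (Polyak) update of the target critics.
    for tgt, src in ((q1_tgt, q1), (q2_tgt, q2)):
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```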
Step three: the trained parameters of the learning-link controller are copied to the application-link controller; thereafter, the application-link controller outputs action instructions for the HVAC and the energy storage device according to the real-time environmental state information and controls them immediately.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
1) A prediction-decision integrated scheduling method for the smart home energy management system based on the stochastic-policy Soft Actor-Critic algorithm is proposed. The method needs no prior information about any uncertain system parameter and no building thermodynamic model, has the ability to predict and to plan multi-period system benefit comprehensively, and can effectively minimize energy cost;
2) In the prediction-decision integrated scheduling mode, the training of controller parameters in the learning link and the decision-making of the controller in the application link do not interfere with each other. The application-link controller's control of the devices is akin to evaluating an already-trained function, so the time to output a control result is negligible, and the smart home energy management system overcomes the loss of effectiveness caused by environmental change; it therefore has high robustness.
3) The method is efficient and forward-looking. The added prediction module can effectively predict the future trends of the distributed photovoltaic generator output power, electricity price and outdoor temperature, and performance simulation based on actual data shows that, compared with the prior art, the method reduces energy cost by 57.88% while maintaining the user's basic living requirements.
As shown in Fig. 4, the performance of the method embodiment is compared with other methods:
Comparison scheme 1: no energy storage device; the HVAC is controlled in the traditional on/off mode. Taking cooling as an example, the HVAC is switched on when the indoor temperature exceeds the set upper temperature limit and switched off when it falls below the set lower temperature limit.
Comparison scheme 2: with the energy storage device; the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 3: with the energy storage device; the prediction module forecasts the distributed photovoltaic generator output power, electricity price and outdoor temperature at 4 future moments, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 4: with the energy storage device; the prediction module forecasts the same quantities at 24 future moments, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
Comparison scheme 5: with the energy storage device; the prediction module forecasts the same quantities at 24 future moments, the 24 predicted points are processed by wavelet transform, and the stochastic-policy Soft Actor-Critic algorithm controls the HVAC and the energy storage device.
The photovoltaic generation power, outdoor temperature and electricity price data fed to the environment come from the Pecan Street database in Texas, covering January 1 to September 30, 2020. Compared with schemes 1 and 2, the method reduces energy cost by 57.88% and 31.16%, respectively, while guaranteeing the user comfort requirement, demonstrating the effect of the energy storage device on reducing user energy cost and the superiority of the stochastic-policy Soft Actor-Critic control strategy for the HVAC and energy storage device. Compared with schemes 3, 4 and 5, the method reduces energy cost by 14.71%, 17.14% and 12.12%, respectively, while guaranteeing the user comfort requirement, demonstrating that the control strategy combining the LSTM network with the stochastic-policy Soft Actor-Critic algorithm is more efficient than using the stochastic-policy Soft Actor-Critic algorithm alone.
The foregoing description covers only the preferred embodiments of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in its protection scope.
While specific embodiments of the present disclosure have been described above with reference to the drawings, they do not limit the scope of the disclosure; various modifications and changes made by those skilled in the art without inventive effort on the basis of the technical solutions of the disclosure remain within its scope.

Claims (5)

1. A prediction-decision integrated scheduling method for a smart home energy management system based on deep reinforcement learning, characterized by comprising the following steps:
the smart home energy management system acquires the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the current moment and the past 23 moments;
the prediction module uses a long short-term memory (LSTM) network to slidingly predict the distributed photovoltaic generator output power, electricity price and outdoor temperature at T future moments from the corresponding data at the past 24 moments;
the predicted data for the T future moments are decomposed, denoised and reconstructed by discrete wavelet transform to obtain the local features of the predicted data;
the application-link controller outputs, in real time, the actions of all controllable devices in the smart home according to the current environmental state, which comprises the real-time data at the current moment, the predicted data for the T future moments, and the local features extracted from the predictions by wavelet transform;
the heating, ventilation and air conditioning (HVAC) system and the energy storage device are controlled according to the real-time actions of the controllable devices;
the environmental state and the reward of the smart home at the next moment are acquired, and the current environmental state, the current action, the next environmental state and the next reward are packaged and sent to an experience pool for storage;
a number of training samples are randomly drawn from the experience pool, and the learning-link controller is trained with the stochastic-policy Soft Actor-Critic algorithm, taking reward maximization as the objective;
and when the policy reward stabilizes, the parameters of the learning-link controller are copied to the application-link controller.
2. The prediction-decision integrated scheduling method according to claim 1, wherein the prediction module is based on a long short-term memory network with an input layer, a hidden layer and an output layer connected in sequence; the input layer receives the distributed photovoltaic generator output power, electricity price and outdoor temperature data at the past 24 moments; the hidden layer is a fully connected network; the output layer outputs the data for 1 future moment, and sliding prediction yields the T future moments; the predicted data are then processed by discrete wavelet transform to obtain their local features.
3. The prediction-decision integrated scheduling method according to claim 1, wherein the discrete wavelet transform processing comprises decomposition, denoising and reconstruction: the wavelet decomposition uses the Daubechies3 wavelet to decompose the predicted data into approximation (low-pass) and detail (high-pass) coefficients; the wavelet expansion coefficients are thresholded with a chosen threshold and threshold rule; and the thresholded coefficients are recombined with the unprocessed coefficients for reconstruction, yielding the local features of the predicted data.
4. The prediction-decision integrated scheduling method according to claim 3, wherein the thresholding comprises selecting a corresponding threshold and threshold rule: the threshold is μ times (0 < μ < 1) the maximum value θ of the input data; the threshold rule is the soft-threshold function, under which every detail (high-pass) coefficient $C_d$ with magnitude smaller than μθ is set to 0, and μθ is subtracted from the magnitude of every detail coefficient with magnitude greater than or equal to μθ:

$$\hat{C}_d = \begin{cases} \operatorname{sgn}(C_d)\,(|C_d| - \mu\theta), & |C_d| \ge \mu\theta \\ 0, & |C_d| < \mu\theta \end{cases}$$
5. The prediction-decision integrated scheduling method according to any one of claims 1 to 4, wherein the learning-link controller comprises an Actor network, Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2, and the application-link controller comprises only the Actor network;
the number of neurons of the input layer of the application-link controller corresponds to the dimension of the environmental state; the hidden layers use the linear rectification function; the number of neurons of the output layer corresponds to the number of actions in the action space, with the hyperbolic tangent activation function;
Critic network 1, target Critic network 1, Critic network 2 and target Critic network 2 share the same structure: the input layer takes the environmental state and action information, with a neuron count equal to the state dimension plus the number of actions; the state and action are concatenated and fed into several hidden layers that use the linear rectification function, and the output layer connected to the hidden layers uses a linear activation function.
CN202310260699.4A 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning Pending CN116227883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310260699.4A CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310260699.4A CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116227883A true CN116227883A (en) 2023-06-06

Family

ID=86575066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310260699.4A Pending CN116227883A (en) 2023-03-13 2023-03-13 Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116227883A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116734424A (en) * 2023-06-13 2023-09-12 Qingdao University of Technology Indoor thermal environment control method based on RC model and deep reinforcement learning
CN116734424B (en) * 2023-06-13 2023-12-22 Qingdao University of Technology Indoor thermal environment control method based on RC model and deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination