CN111242443B - Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet - Google Patents

Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet

Info

Publication number
CN111242443B
CN111242443B (application CN202010010410.XA)
Authority
CN
China
Prior art keywords
network
operator
time slot
information
power generation
Prior art date
Legal status
Active
Application number
CN202010010410.XA
Other languages
Chinese (zh)
Other versions
CN111242443A (en
Inventor
Sun Di (孙迪)
Wang Ning (王宁)
Guan Xin (关心)
Lin Lin (林霖)
Current Assignee
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Original Assignee
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Priority date
Filing date
Publication date
Application filed by State Grid Heilongjiang Electric Power Co Ltd, Heilongjiang University
Priority to CN202010010410.XA
Publication of CN111242443A
Application granted
Publication of CN111242443B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

A deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet, belonging to the technical field of energy distribution of virtual power plants. The invention solves the problems of large communication load and delay, high computational complexity and poor reliability of data transmission in existing methods. The invention provides a distributed power generation economic dispatching structure using a three-layer architecture based on edge computing, wherein the first and second layers are edge computing layers and the third layer is a cloud computing layer. The proposed three-layer edge computing architecture reduces the computational complexity of processing training tasks at the central node and further reduces the communication load between the VPP operator and the DGs, thereby also reducing the response time for industrial users, while preserving the privacy of industrial users and improving the reliability of data transmission. The invention can be applied to the energy distribution of virtual power plants.

Description

Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
Technical Field
The invention belongs to the technical field of energy distribution of virtual power plants, and particularly relates to a virtual power plant economic dispatching method in an energy internet based on deep reinforcement learning.
Background
With the access of large-scale distributed generation to the energy internet, the traditional microgrid, constrained by geographical conditions, has certain limitations: it hinders the effective utilization of multi-region large-scale distributed generation, and power curtailment occurs very frequently. Because the construction scale of renewable energy stations does not match the demand of local loads, the accommodation capacity for renewable energy is limited, resulting in a certain amount of power curtailment in areas where wind farms and photovoltaic stations are concentrated. Compared with a microgrid, a virtual power plant (VPP) can aggregate energy and load over a wider range, better match the construction scale of renewable energy with the demand scale of local loads, and reduce power curtailment.
Economic dispatch scenarios are complex; for example, managing intelligent devices of distributed renewable energy and industrial users requires large amounts of data of different types to be transmitted in real time. Because of the close relationship between industrial users and VPP operators, reasonable economic scheduling should take full account of user participation. Industrial users can participate in economic dispatch by contracting with VPP operators. The VPP operator needs to receive data from the demand-side industrial users and the DG units (distributed generation units). Since data transmission between the VPP operator and the devices requires a certain degree of performance guarantee to achieve optimal economic scheduling, the VPP employs advanced control, sensing and communication techniques to sense and collect data and transmit it to the VPP's economic scheduling control center. Achieving optimal economic scheduling in complex situations requires considering the wireless links between most devices and the VPP operator, and large data transfers can easily exceed the transmission capacity limits. Thus, large numbers of resource-limited devices cannot directly send their demands to the VPP operator, which poses a significant challenge to efficient economic scheduling.
Traditionally, VPP operators dispatch geographically dispersed distributed power sources in a centralized fashion. User information and real-time status data of DGs from multiple regions are sent to the cloud for storage and processing, which results in a large network communication load and heavy consumption of computing resources, and consequently in high network delay and computational complexity. In practice, long-distance data transmission from the various DGs and industrial users to a cloud computing center consumes a large amount of energy. Moreover, the transmitted data raises privacy concerns for industrial users in different regions: in the traditional cloud computing mode, locally sensitive data must be uploaded to the cloud computing center, which increases the risk of user privacy disclosure. In addition, the generation and transmission of large amounts of data make it difficult to guarantee the reliability of data transmission in a complex environment.
Disclosure of Invention
The invention aims to solve the problems of high computational complexity, large communication load and delay and poor reliability of data transmission in the conventional method, and provides a deep reinforcement learning-based economic dispatching method for a virtual power plant in an energy Internet.
The technical scheme adopted by the invention for solving the technical problems is as follows: the method for economically scheduling the virtual power plant in the energy internet based on deep reinforcement learning comprises the following steps:
step one, for any region i, collecting power-generation-side and user-side information from region i by using the industrial-side server and the power-supply-side server of region i, where i = 1, 2, ..., I, and I is the total number of regions;
training an actor-critic network with the information collected in each region, so as to obtain, for each region, an actor-critic network trained with that region's information;
step two, deploying the trained actor-critic networks on the edge nodes of the corresponding regions respectively;
and step three, the industrial-side server and the power-supply-side server of each region collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the decision information of each region in real time. A minimal sketch of this workflow is given below.
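For illustration only, the following Python sketch outlines the three-step offline-training / edge-deployment / online-scheduling workflow described above. The function and parameter names (train_actor_critic, collect_offline_data, collect_realtime_state, edge_nodes) are assumptions introduced for the example and are not prescribed by the patent.

```python
from typing import Callable, Dict, Any, Iterable

def offline_training(regions: Iterable, collect_offline_data: Callable,
                     train_actor_critic: Callable) -> Dict[Any, Any]:
    """Step one: each region's industrial-side and power-supply-side servers
    collect generation-side and user-side data; the VPP operator cloud server
    trains one actor-critic network per region on that data."""
    return {i: train_actor_critic(collect_offline_data(i)) for i in regions}

def deploy_to_edge(models: Dict, edge_nodes: Dict) -> None:
    """Step two: place each trained actor-critic network on the edge node of its region."""
    for i, model in models.items():
        edge_nodes[i]["model"] = model

def online_scheduling_step(regions: Iterable, edge_nodes: Dict,
                           collect_realtime_state: Callable) -> Dict:
    """Step three: in real time, feed the collected state into the corresponding
    edge model and return the dispatch decision of every region."""
    return {i: edge_nodes[i]["model"](collect_realtime_state(i)) for i in regions}
```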
The invention has the beneficial effects that: the invention provides a deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet. Since real-time economic dispatch scenarios are considered, demand response and energy delivery are performed in real time. On the second layer, an agent manages the distributed power sources and industrial users of its local area and performs online scheduling; compared with placing the scheduling of all areas in a cloud center, this reduces communication delay and the response time to industrial users. Computation and storage are completed in the edge node, the application is launched on the edge server, and new energy supplies power to the server nearby, so energy consumption can be significantly reduced. In the framework proposed by the invention, the first and second layers are edge computing layers, while the third layer is a cloud computing layer. The proposed three-layer edge computing architecture reduces the computational complexity of processing the training task at the central node and further reduces the communication load between the VPP operator and the DGs, thereby also reducing the response time for industrial users, while preserving the privacy of industrial users and improving the reliability of data transmission.
Drawings
FIG. 1 is a diagram of an economic dispatch architecture as set forth in the present invention;
FIG. 2 is a block diagram of the distributed power generation economic dispatch architecture utilizing a three-tier architecture based on edge computing as proposed by the present invention;
FIG. 3 is a diagram of an information delivery model for DRL-based VPP economic scheduling of the present invention;
in the figure: s_i is the real-time state of region i, a_i is the action corresponding to state s_i, r_i is the reward value, π is the policy, V is the state-value function, θ is the parameter of the actor network in a thread, θ_v is the parameter of the critic network in a thread, θ' is the parameter of the global actor network, and θ'_v is the parameter of the global critic network;
FIG. 4 is a graph of photovoltaic power, wind power, controllable-load power and uncontrollable-load power for a random day;
in the figure: PV denotes photovoltaic and WT denotes wind turbine;
FIG. 5 is a graph of the return value as a function of iteration number;
FIG. 6 is a graph comparing the generated power of wind power with the actual power;
FIG. 7 is a graph of generated power versus actual power for a photovoltaic cell;
FIG. 8 is a graph of power generated by a gas turbine versus actual power;
FIG. 9 is a graph of the optimization results for a controllable load;
FIG. 10 is a graph comparing the cost of the inventive process and the DPG process.
Detailed Description
The first embodiment is as follows: the method for economically scheduling the virtual power plant in the energy internet based on the deep reinforcement learning comprises the following steps:
step one, for any region i, collecting power-generation-side and user-side information from region i by using the industrial-side server and the power-supply-side server of region i, where i = 1, 2, ..., I, and I is the total number of regions;
training the actor-critic network of the VPP operator cloud server with the information collected in each region, so as to obtain, for each region, an actor-critic network trained with that region's information;
step two, deploying the trained actor-critic networks on the edge nodes of the corresponding regions respectively;
and step three, the industrial-side server and the power-supply-side server of each region collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the decision information of each region in real time.
The second embodiment is as follows: the difference between this embodiment and the first embodiment is that: in step one, the actor-critic network of the VPP operator cloud server is trained with the information collected in each region using an asynchronous method, with 8 threads running in parallel.
The third embodiment is as follows: the difference between this embodiment and the first embodiment is that: the objective function of the actor-critic network is:

C_i = Σ_{k=0}^{K} [ C_pdp^i(k) + C_pom^i(k) + C_wdp^i(k) + C_wom^i(k) + C_ddp^i(k) + C_dom^i(k) + C_de^i(k) + C_d^i(k) + λ·L_cl^i(k)·x_i(k) ]

wherein: C_i is the total operating cost of region i; C_pdp^i(k) is the initial depreciation cost of the photovoltaic investment of region i in time slot k, k = 0, 1, ..., K (24 hours are considered in the VPP, so K equals 23); C_pom^i(k) is the photovoltaic operation and maintenance cost of region i in time slot k; C_wdp^i(k) is the initial depreciation cost of the wind turbine of region i in time slot k; C_wom^i(k) is the wind turbine operation and maintenance cost of region i in time slot k; C_ddp^i(k) is the initial depreciation cost of the micro gas turbine of region i in time slot k; C_dom^i(k) is the micro gas turbine operation and maintenance cost of region i in time slot k; C_de^i(k) is the environmental protection cost of the micro gas turbine of region i in time slot k; C_d^i(k) is the fuel cost consumed by the micro gas turbine of region i in time slot k; λ is the compensation coefficient; L_cl^i(k) is the controllable load of region i in time slot k; x_i(k) is the selected interruptible-load percentage vector of region i in time slot k, and x_i(k) takes values in [0, 1].
The fourth embodiment is as follows: the difference between this embodiment and the first embodiment is that: the specific training process of the actor network in the actor-critic network is:
the actor network consists of a μ network and a σ network, each of which consists of 2 fully connected layers;
the activation function of the 1st fully connected layer of both the μ network and the σ network is tanh, with input dimension 5 and output dimension h;
the activation function of the 2nd fully connected layer of both the μ network and the σ network is softplus, with input dimension h and output dimension m;
the power-generation-side and user-side information is input into the μ network and the σ network to obtain their outputs; normal random sampling is then performed on the outputs of the μ network and the σ network to obtain the 4-dimensional action output by the actor network.
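A minimal PyTorch sketch of such an actor network is shown below. The hidden size (hidden_dim = 64), class name and the small constant added to σ are illustrative assumptions; the layer arrangement (tanh, then softplus; 5-dimensional state in, 4-dimensional action out) follows the description above.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """mu/sigma networks: two fully connected layers each, tanh then softplus,
    mapping the 5-dim state to the parameters of a 4-dim normal distribution."""
    def __init__(self, state_dim: int = 5, hidden_dim: int = 64, action_dim: int = 4):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, action_dim), nn.Softplus())
        self.sigma_net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, action_dim), nn.Softplus())

    def forward(self, state: torch.Tensor):
        mu = self.mu_net(state)
        sigma = self.sigma_net(state) + 1e-6   # keep the standard deviation strictly positive
        return mu, sigma

    def sample_action(self, state: torch.Tensor) -> torch.Tensor:
        mu, sigma = self.forward(state)
        return torch.distributions.Normal(mu, sigma).sample()   # 4-dim action
```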
The fifth embodiment is as follows: the difference between this embodiment and the fourth embodiment is that: the specific training process of the critic network in the actor-critic network is:
the critic network consists of fully connected layers;
the power-generation-side and user-side information and the 4-dimensional action output by the actor network are input into the fully connected layers of the critic network, the outputs of the fully connected layers are spliced to obtain a splicing result, and the splicing result is linearly transformed to obtain the one-dimensional return value output by the critic network.
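Correspondingly, a minimal sketch of the critic network (state and action each encoded by a fully connected layer, concatenated, then linearly mapped to a scalar) might look as follows; the layer sizes and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Encodes the 5-dim state and the 4-dim action with fully connected layers,
    splices the encodings, and outputs a one-dimensional value."""
    def __init__(self, state_dim: int = 5, action_dim: int = 4, hidden_dim: int = 64):
        super().__init__()
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.Tanh())
        self.action_enc = nn.Sequential(nn.Linear(action_dim, hidden_dim), nn.Tanh())
        self.head = nn.Linear(2 * hidden_dim, 1)       # linear transform of the spliced result

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.state_enc(state), self.action_enc(action)], dim=-1)
        return self.head(z)                            # one-dimensional value estimate
```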
The sixth embodiment is as follows: the difference between this embodiment and the fifth embodiment is that: the reward function of the actor-critic network is a negative weighted combination of the scheduling costs, given by equation (20) in the detailed description below, wherein K_1, K_2, K_3 and K_4 are all weight values.
The training of the actor network is guided by the value output by the critic network.
Edge computing is used to provide computing services on batch-processing equipment near the network edge of a VPP. First, edge computing can greatly reduce data transfer from the devices to the VPP operator through pre-processing. Second, the edge computing architecture can shift the computational burden to the edge. Fig. 1 shows the economic dispatch architecture proposed by the method of the invention, which consists of four main components: a power source side server (PSS), an industrial user side server, a proxy edge server and a VPP operator cloud server. The power source side server connects the power devices through different communication technologies (e.g., 5G, WiFi). It collects and processes power generation data from the distributed power equipment and transmits the data to the proxy edge server in real time. The PSS also receives scheduling information from the proxy edge server and provides power to the industrial users. The industrial customer-side server likewise connects the power devices through different communication technologies (e.g., 5G, WiFi); it collects and processes the power consumption information of industrial users and transmits the data to the proxy edge server in real time. The proxy edge server makes local economic dispatch decisions according to the analysis results of the industrial user side server and the power supply side server, and interacts with the servers on both sides. The VPP operator cloud server meets the computing requirements of the proxy edge servers and manages each proxy. It not only helps the proxy servers provide real-time analysis and computation, but also collects the scheduling information of the managed proxies.
FIG. 2 illustrates the distributed power generation economic dispatch structure using a three-layer architecture based on edge computing as proposed by the invention. First, the VPP operator sets up agents to manage distributed generation and industrial users in different regions. On the demand side, the users' controllable loads participate in demand response, which can reduce the load demand during peak hours. In contrast to the VPP operator, each agent is an edge computing server. The industrial customer-side server and the power-side server collect data from each distributed generation unit and extract and aggregate the data in real time. The distributed generation may be photovoltaic generation, wind power generation and micro gas turbines. The proxy server provides the optimal economic dispatch strategy for its area and finally sends the decision information to the VPP operator. The proposed structure is suitable for offline training and real-time online scheduling. First, in the offline training phase, the industrial-side server and the power supply server process and collect information from the power generation side and the user side in a specific area, and transmit the collected information to the VPP operator cloud server. The VPP operator cloud server performs model training on the large-scale offline data and transmits the trained model to the proxy edge server of the specific area. During real-time economic dispatching, the data of industrial users and distributed power sources are collected by the two servers and transmitted to the proxy edge server, which feeds them into the previously trained model as input to obtain the real-time dispatch strategy. This three-layer economic dispatch model matches the distributed nature of the power sources and solves the problem of large-scale data transmission in VPP economic dispatch. It is more flexible and adapts to the expansion of dynamic networks, making it a more scalable solution.
The goal of economic dispatch by the VPP operator is to minimize the compensation paid to industrial users and the operating cost of the DGs (including photovoltaic, wind turbines and micro gas turbines). On the basis of minimizing the cost of the VPP operator, the optimal economic scheduling algorithm fully considers C_pom, C_wom and C_dom. In particular, the environmental cost C_de and the fuel cost C_d of the micro gas turbine are also considered. In addition, the initial depreciation costs of the DG units are taken into account and defined as C_pdp, C_wdp and C_ddp, respectively. The needs of industrial users are considered as well, including the compensation cost for industrial users participating in demand response, denoted C_dr. Industrial users are treated as schedulable resources participating in the economic scheduling of the VPP. The proposed algorithm reduces the economic loss of the VPP during peak power consumption by curtailing controllable loads, which can shift load peaks to valleys owing to the increased flexibility of users. In this case, the industrial users correspond to a virtual power generation resource. Therefore, the compensation cost C_dr for the demand side, which compensates users who choose to shed controllable load, is added to the objective function of the proposed model. The objective function consists of two parts: the first part is the operating cost of the DGs, and the second part is the compensation cost of the demand side and the controllable load when the system operates.

C = Σ_{i=1}^{I} C_i,   C_i = Σ_{k=0}^{K} [ C_i^DG(k) + C_i^dr(k) ]

where C is the total operating cost of the DGs and industrial users managed in the VPP, C_i is the operating cost of the DGs and industrial users managed in region i, C_i^DG(k) is the operating cost of the DGs in region i, and C_i^dr(k) is the compensation cost of region i for industrial users participating in demand response.
In the real-time scheme, an edge agent of the VPP is denoted by i. In the proposed optimal economic dispatch model, three types of DG are considered: photovoltaic, wind turbine and micro gas turbine. The operating cost of a DG device includes the initial depreciation cost and the operation and maintenance costs; specifically, the environmental protection and fuel costs of the micro gas turbine are also considered. Here k denotes the time-slot index, and P_p^i(k), P_w^i(k) and P_d^i(k) represent the actual consumed power of the photovoltaic, wind turbine and micro gas turbine, respectively, in time slot k.
(1) Photovoltaic: the initial depreciation cost of the photovoltaic investment, C_pdp^i(k), is expressed in terms of the annual interest rate r, the installation cost per unit capacity of the photovoltaic cell c_p^in, the photovoltaic capacity coefficient K_p and the photovoltaic service life n_p.
The operation and maintenance cost of the photovoltaic is
C_pom^i(k) = K_pom · P_p^i(k)
where C_pom^i(k) is the photovoltaic maintenance and operation cost and K_pom is the photovoltaic maintenance and operation cost coefficient.
(2) Wind turbine: the initial investment cost of the wind turbine is converted into output power per unit time and, as the depreciation cost of the wind turbine, is included in its operating cost. The initial depreciation cost C_wdp^i(k) is expressed in terms of the unit installation cost of the wind turbine c_w^in, the capacity coefficient of the wind turbine K_w, the annual interest rate r and the service life of the wind turbine n_w.
The operation and maintenance cost of the wind turbine during operation is
C_wom^i(k) = K_wom · P_w^i(k)
where K_wom is the operating cost coefficient of the wind turbine.
(3) Micro gas turbine: the initial depreciation cost of the micro gas turbine, C_ddp^i(k), is modeled in terms of the installation cost per unit capacity of the micro gas turbine c_d^in, the capacity coefficient of the micro gas turbine K_d and the service life of the micro gas turbine n_d.
Operation and maintenance cost of the micro gas turbine:
C_dom^i(k) = K_dom · P_d^i(k)
where K_dom is the operation and maintenance cost coefficient of the micro gas turbine.
Environmental protection cost of the micro gas turbine:
C_de^i(k) = Σ_{m=1}^{M} β_m · α_dm · P_d^i(k)
where m indexes the emitted pollutants, M is the total number of pollutants, β_m is the treatment cost per unit emission of pollutant m, and α_dm is the amount of pollutant m emitted by the micro gas turbine per unit of generated electricity.
The power generation efficiency of the micro gas turbine, η_d, is a function of its output power P_d^i(k).
The fuel consumption characteristic of the micro gas turbine can be expressed as (10)
C_d^i(k) = (c_d / L) · P_d^i(k) / η_d
where C_d^i(k) is the fuel cost, c_d is the natural gas price and L is the minimum energy released by the natural gas (its lower heating value).
According to the above description, the operating cost of the DG is:
C_i^DG(k) = C_pdp^i(k) + C_pom^i(k) + C_wdp^i(k) + C_wom^i(k) + C_ddp^i(k) + C_dom^i(k) + C_de^i(k) + C_d^i(k)
the demand response can effectively integrate the potential of the user side response, thereby enhancing the safety, stability and economy of the power grid operation. In this context, we consider the demand response of an industrial user during the model building process. In order to achieve the best economic dispatch strategy, each agent selects the controllable load size to be reduced. This is inconvenient for industrial users as the controllable load is reduced, for which purpose it needs to be compensated. The VPP operator should provide power compensation to the user who chooses to curtail the controllable load. Controlling a variable of controllable load to be X i (k) And a compensation coefficient lambda. X i (k) Is a variable derived from the power information of all industrial users in the area, defined as the percentage of the maximum interruptible controllable load in each time slot of the industrial area considering agent i, with a compensation cost at the load end of
Figure BDA0002356951350000084
This approach may reduce or reduce part of the power consumption, thereby avoiding peak loads for industrial users. The load of the industrial user is obtained from the acquisition and is classified as controllable load->
Figure BDA0002356951350000085
And a non-controllable load->
Figure BDA0002356951350000086
Since controllable loads can directly respond to economic scheduling of VPPs, consideration is given herein primarily to the reduction of controllable loads involved in the VPPs scheduling process. The compensation cost for the managed controllable load of agent i can be expressed as:
Figure BDA0002356951350000087
where λ is the compensation factor, x i (k) Expressed as a vector of percentage of selected interruptible load, the range of values is 0,1]. The objective function of economic dispatch for each agent i can be expressed as:
Figure BDA0002356951350000088
for the entire VPP system, the power balance constraint is a fundamental problem and should be fully considered in the model building process. In each management area of agent i, the total power consumption of the individual DG units should be equal to the total power consumption of the industrial users. For the total power demand of an industrial user, the curtailment of the controllable load of the industrial user by the agent i, i.e. the
Figure BDA0002356951350000091
The actual power consumption of the DG in each agent management area is limited by the actual power generation in that area. Photovoltaic, wind energy of DG, actual power of micro gas turbine is
Figure BDA0002356951350000092
Respectively as follows:
Figure BDA0002356951350000093
Figure BDA0002356951350000094
Figure BDA0002356951350000095
the percentage of interruptible load in the industrial domain managed by agent i should not exceed the percentage of maximum interrupt controllable load per timeslot, i.e. the percentage of maximum interrupt controllable load per timeslot
0≤x i (k)≤X i (k) (18)
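The following Python sketch shows how an action for one time slot can be checked against these constraints. It assumes the power-balance constraint equates total DG consumption with the uncontrollable load plus the non-curtailed controllable load, as reconstructed above; the tolerance value, dictionary keys and function name are assumptions for the example.

```python
def action_is_feasible(action: dict, state: dict, tol: float = 1e-3) -> bool:
    """Check one time slot's action against the constraints of region i.

    action: {"P_p", "P_w", "P_d", "x"}  consumed DG power and interruptible-load percentage
    state:  {"P_p_max", "P_w_max", "P_d_max",  available generation in the slot
             "L_cl", "L_ucl", "X"}              loads and maximum shed ratio X_i(k)
    """
    # generation limits: consumed power cannot exceed available generation
    limits_ok = all(0.0 <= action[p] <= state[p + "_max"]
                    for p in ("P_p", "P_w", "P_d"))
    # interruptible-load bound of Eq. (18): 0 <= x_i(k) <= X_i(k)
    shed_ok = 0.0 <= action["x"] <= state["X"]
    # power balance: total DG consumption equals the (curtailed) load demand
    supply = action["P_p"] + action["P_w"] + action["P_d"]
    demand = state["L_ucl"] + state["L_cl"] * (1.0 - action["x"])
    return limits_ok and shed_ok and abs(supply - demand) <= tol
```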
The VPP operator manages all the regions and aggregates the scheduling information of each region. Based on the above description, the objective function of the optimal economic scheduling policy is defined as follows:
C = Σ_{i=1}^{I} C_i
in the invention, the optimal economic dispatching strategy provided by the invention minimizes the power generation cost of the distributed power supply and simultaneously meets the limitations of power balance and power generation capacity of the VPP.
To make the solution more practical, various cost components are incorporated into the objective function. The objective function established by the invention is a nonlinear cost function; although the invention does not add non-convex constraints, in real scenarios generating units are generally affected by the valve-point effect and the cost function is generally non-convex. To address these difficulties, previous work has often employed heuristic methods. The deep reinforcement learning method adopted here can adapt to nonlinear, non-convex conditions and relaxes the nonlinear and non-convex constraints. In a practical economic scheduling scheme, the scheduling process should generally be completed in a short time. Owing to the stochastic nature of photovoltaic and wind power generation and the flexibility of the load, the state transitions from one time slot to the next constitute a large state space, and the state information needs to be updated quickly. DRL, as an effective artificial intelligence algorithm, has achieved great success in many problem areas, such as the internet of things, where it can find different optimization strategies within a reasonable time frame. In the invention, the proposed DRL-based algorithm relaxes the constraint of nonlinear characteristics and improves the solution accuracy by fitting the value function with a deep learning algorithm. The economic scheduling problem considered here is nonlinear, the transition probabilities are unknown, and the state space is large and continuous; DRL can estimate the probability distribution of state transitions without environment information. The offline-trained model can be directly applied to online economic dispatch, and the proposed DRL-based optimal economic dispatch algorithm significantly improves the computational efficiency.
The information delivery model of the DRL-based VPP economic scheduling is shown in Fig. 3. The algorithm adopts an offline data training mode: the power supply side server and the user side server collect historical data and transmit this information to the VPP cloud server. The VPP cloud server uses DRL to train the networks independently on the data transmitted from different areas, thereby obtaining the economic scheduling strategies of the different areas. In the online economic dispatching stage, each proxy edge server obtains the corresponding network weights from the VPP cloud server. The power side server and the industrial customer side server gather real-time transmission information and power requirements and then transmit all the gathered information to the corresponding proxy edge server. The proxy edge server obtains the real-time optimal economic dispatch strategy from the trained weights and the real-time state information, and feeds the result back to the servers of both sides.
Offline training and online scheduling are realized at different nodes. First, the model is fully trained on offline data in the cloud center. Then the proposed DRL-based method is combined with edge computing, and the trained model is placed at the edge node so that it can be applied online in a real environment. If the online environment differs slightly from the offline training environment, the offline-trained model can learn these changes and dynamically adjust its actions to achieve optimal scheduling. During online scheduling, the distributed generation data and the demand data of the industrial users can be transmitted directly to the edge nodes without being sent to the cloud center, which is better suited to real-time economic scheduling scenarios.
For the VPP, 24 hours are considered, denoted by k ∈ {0, 1, ..., 23}. The goal of economic scheduling is to find an optimal economic scheduling solution that minimizes the operating cost of the VPP. For region i, the state set is S_i, s_i ∈ S_i, with
s_i(k) = ( P̄_p^i(k), P̄_w^i(k), P̄_d^i(k), L_cl^i(k), L_ucl^i(k) )
where the quantities aggregated by the power supply side server and the industrial user side server respectively represent the actual photovoltaic generation capacity, the wind generation capacity, the actual generation capacity of the micro gas turbine, the controllable load of the industrial users and the uncontrollable load demand in time slot k. The action set is A_i, a_i ∈ A_i, with
a_i(k) = ( P_p^i(k), P_w^i(k), P_d^i(k), x_i(k) )
respectively representing the actual consumed power of photovoltaic generation, wind power generation and the micro gas turbine in time slot k, and the control coefficient of the controllable load. A_i is a continuous action space satisfying the power balance constraint, and a_i is a selected action satisfying the action constraints.
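The 5-dimensional state and 4-dimensional action described above can be represented, for example, as follows; the field names are illustrative assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class RegionState:
    """5-dim state s_i(k) aggregated by the power-supply-side and industrial-side servers."""
    pv_generation: float        # actual photovoltaic generation capacity in slot k
    wind_generation: float      # actual wind generation capacity in slot k
    mt_generation: float        # actual micro-gas-turbine generation capacity in slot k
    controllable_load: float    # industrial controllable load demand in slot k
    uncontrollable_load: float  # industrial uncontrollable load demand in slot k

@dataclass
class RegionAction:
    """4-dim action a_i(k) output by the actor network."""
    pv_consumed: float          # actual power consumption of photovoltaic generation
    wind_consumed: float        # actual power consumption of wind generation
    mt_consumed: float          # actual power consumption of the micro gas turbine
    shed_ratio: float           # control coefficient x_i(k) of the controllable load
```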
In any time slot, a policy π is introduced in order to find the mapping from states to actions. The policy represents a conditional probability distribution over actions given the current state: π(a_i | s_i) is the probability of taking action a_i in state s_i. The next state is denoted s'_i and the initial state is denoted s_i^0.
In practical cases, the state transition probabilities are unknown, and the state space and action space are continuous. Given s_i and a_i, a reward value r_i(s_i, a_i) related to the objective function can be obtained. The reward value is a key component for evaluating the quality of an action and guiding the learning process. To set the reward well, it was set through repeated experiments as a function related to the cost; the specific setting of the reward value, equation (20), is a negative weighted combination of the scheduling costs,
where K_1, K_2, K_3 and K_4 are the set weight values. The reward value is negative because the cost of the virtual power plant is to be minimized. The total return over the K hours is obtained as:
R_i = Σ_{k=0}^{K} r_i( s_i(k), a_i(k) )
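A hedged sketch of a per-slot reward and of the accumulated K-hour return is given below. The exact grouping of cost terms under K_1–K_4 is specified by equation (20) of the patent and is not reproduced here; the sketch simply assumes the reward is the negative weighted sum of per-slot cost terms.

```python
def slot_reward(cost_terms, weights) -> float:
    """Illustrative reward: negative weighted sum of per-slot cost terms.
    The actual assignment of terms to the weights K_1..K_4 follows Eq. (20)
    of the patent and is an assumption here."""
    return -sum(w * c for w, c in zip(weights, cost_terms))

def total_return(rewards, gamma: float = 1.0) -> float:
    """Accumulated return over the K-hour scheduling horizon; setting
    gamma = 0.90 (the discount factor of the description) gives the
    discounted return used by the value functions."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```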
to maximize the return value, we use the gradient ascent method to update the strategy in the proposed algorithm, i.e. we use
Figure BDA0002356951350000113
From (23), the state value function V can be obtained π (s i ) Sum state contribution function Q π (s i ,a i ) And gamma is a discount factor representing the discount rate of the return value.
Figure BDA0002356951350000114
Figure BDA0002356951350000115
The goal is to select the best strategy and maximize the state effort function, which is expressed as follows:
Figure BDA0002356951350000116
in order to find the optimal economic dispatch strategy, it is usually considered to represent the function by using a data table. However, this approach limits the scale of the reinforcement learning algorithm. When the size of the problem is too large, the storage space for storing the table may be large, and it takes a long time to accurately calculate each value in the table. If learning experience is obtained from a small training data set, the generalization capability of the training pattern is insufficient. In order to solve the above problem, a state value function and a state action value function are parameterized using a deep neural network in consideration of a large-scale state action space. In the algorithm provided by the invention, the deep neural network is used for extracting the characteristics of large-scale input state data to train economic dispatchModels, which make the trained models more generalized. Starting from the first layer of neurons, the next layer of neurons is entered by a non-linear activation function, and continues to pass down to the output layer. Since the nonlinear function is essential for the deep neural network, the deep neural network has sufficient capability to extract data features. Theta.theta. v For approximating a function V(s) of state values i ) And the state merit function Q(s) i ,a i )。
Q(s i ,a i )≈Q(s i ,a iv ) (26)
V(s i )≈V(s iv ) (27)
The deep neural network is used as a function approximator, and the parameter theta of the deep neural network is a strategy parameter. Pi obeys a Gaussian distribution and can be used to solve the problem of continuous motion space, i.e.
Figure BDA0002356951350000121
/>
The per-slot reward of each region i is given by (20), and the total return accumulates these rewards over the scheduling horizon.
In our scenario, to increase the probability of policies with higher reward values, the policy gradient is updated; the gradient is calculated as:
∇_θ log π(a_i | s_i; θ) · ( R_i − b(s_i) )
where R_i is the total return in region i and is estimated by Q(s_i, a_i), i.e. R_i ≈ Q(s_i, a_i); b(s_i) is a baseline used to reduce the estimation error, and V(s_i) is used to estimate the baseline, i.e. b(s_i) ≈ V(s_i).
A^π(s_i, a_i; θ, θ_v) = Q^π(s_i, a_i; θ_v) − V^π(s_i; θ_v)    (31)
Equation (31) is the advantage function, representing the advantage of the action value function over the value function. The advantage function is positive if the action value function is greater than the value function and negative if it is smaller. The parameters are updated in the direction that increases the policy probability when the advantage function is positive, and in the direction that decreases it when the advantage function is negative. Therefore, the algorithm converges faster when the advantage function is employed.
The advantage is estimated from the accumulated discounted rewards and the critic's value estimate.
The policy gradient is updated as:
∇_θ log π(a_i | s_i; θ) · A(s_i, a_i; θ, θ_v)
The parameters θ and θ_v are updated by accumulating, in each thread, the policy gradient weighted by the advantage and the gradient of the squared advantage (the critic's estimation error), respectively.
in order to make the training strategy more adaptive and prevent premature convergence to a suboptimal deterministic strategy, entropy regularization is added to the strategy gradient, i.e.
Figure BDA0002356951350000128
Figure BDA0002356951350000129
When the neural network training is carried out, required data are independently and simultaneously distributed, in order to break the correlation between the data, an asynchronous method is adopted, a plurality of threads can be operated in parallel, and each thread has an own environment copy. During the training process, multiple threads maintain a global operator-critical network, and each thread maintains a copy of the local network weight values of the global network. The local network accumulates gradient updates and passes the gradients to the global network for parameter updates. The local network will then synchronize the parameters in the global network. The local network can not only update its own independent network by learning the environment status, but also interact with the global network. We define the global shared parameter vector as θ 'and θ' v
Figure BDA0002356951350000131
Figure BDA0002356951350000132
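The asynchronous update described above follows the usual advantage actor-critic pattern: each worker thread computes gradients on its local copy and pushes them to the global network. The PyTorch-style sketch below is a generic example under stated assumptions (entropy weight 0.01 and discount 0.90 taken from the description; the optimizer is assumed to be built over the global networks' parameters, and the critic's output is used as the baseline); it is not a verbatim reproduction of the patent's equations.

```python
import torch

def worker_update(local_actor, local_critic, global_actor, global_critic,
                  optimizer, states, actions, rewards,
                  gamma: float = 0.90, entropy_beta: float = 0.01):
    """One asynchronous update: accumulate gradients on the local copy,
    apply them to the global actor-critic, then re-sync the local copy.
    states/actions are tensors of shape (T, 5) and (T, 4); rewards is a list."""
    returns, R = [], 0.0
    for r in reversed(rewards):                      # discounted returns R_i
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns, dtype=torch.float32).unsqueeze(-1)

    mu, sigma = local_actor(states)
    dist = torch.distributions.Normal(mu, sigma)
    values = local_critic(states, actions)           # critic estimate used as baseline
    advantage = returns - values

    critic_loss = advantage.pow(2).mean()
    actor_loss = -(dist.log_prob(actions).sum(-1, keepdim=True)
                   * advantage.detach()).mean()
    entropy = dist.entropy().sum(-1).mean()          # entropy regularization
    loss = actor_loss + critic_loss - entropy_beta * entropy

    optimizer.zero_grad()
    loss.backward()
    # push local gradients into the global networks, then synchronize
    for lp, gp in zip(list(local_actor.parameters()) + list(local_critic.parameters()),
                      list(global_actor.parameters()) + list(global_critic.parameters())):
        gp.grad = lp.grad
    optimizer.step()
    local_actor.load_state_dict(global_actor.state_dict())
    local_critic.load_state_dict(global_critic.state_dict())
```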
In this way, each region achieves its optimal economic dispatch. In the offline training process of the numerical experiments, 8 threads are implemented; the VPP operator communicates with each region and computes C. Based on this algorithm, the economic dispatch model for region i is obtained. In the online scheduling stage, each proxy edge server, i.e. agent i, first obtains the corresponding network weights from the VPP cloud server. The DRL-based economic dispatch model is shown in Fig. 3.
Experimental part
To train the DRL-based economic dispatch model, the load data of photovoltaic, wind power, micro gas turbines and industrial users are trained with an offline dataset. Fig. 4 shows the power of photovoltaic and wind power generation and the power of the controllable and uncontrollable loads on a random day. The maximum power of the micro gas turbine is set to 200 kW. Since the industrial load consists mainly of various industrial processes, the power demand generally does not vary much and has no particularly significant peak-valley difference. The periods of higher load demand are 9:00-10:00, 12:00-14:00 and 19:00-21:00, and the period of lower load demand is 1:00-5:00. It can be seen that photovoltaic and wind power generation have a larger peak-valley difference: the peak period of photovoltaic generation is 10:00-16:00 and that of wind power generation is 10:00-18:00. The photovoltaic and wind generation power over one day is random, and the power consumption of the controllable and uncontrollable loads is also random.
The emission costs of pollution and the operating and maintenance costs of photovoltaic, wind power generation and micro gas turbines are listed in tables 1 and 2.
TABLE 1 (pollutant emission costs)
TABLE 2 (operation and maintenance costs of photovoltaic, wind power generation and micro gas turbines)
The structure of the neural network in the DRL-based algorithm used in the invention is described in detail below. The state is expressed as a 5-dimensional vector, and the resulting action has 4 dimensions. The action is obtained by random sampling from a normal distribution conditioned on the state, and a neural network is used to compute the μ and σ parameters required for the normal distribution. The state is input into the μ network and the σ network, respectively, giving the 4-dimensional μ and σ parameters. The μ network consists of 2 MLP layers: the first layer has input dimension 5 and output dimension h and is activated with tanh; the second layer has input dimension h and output dimension m and is activated with softplus. The σ network also consists of 2 MLP layers: the first layer has input dimension 5 and is activated with tanh, and the two-layer network outputs a 4-dimensional vector activated with softplus; to ensure that the σ network does not output 0, 1×10⁻⁶ is added to the output σ vector. After that, the 4-dimensional action is sampled randomly from the normal distribution. Based on the state and the action, the Q value is calculated using the critic network. In the critic network, the state is encoded with one MLP with input dimension 5 and tanh activation, and the action is encoded with another MLP, also with tanh activation. The two encoded outputs are then spliced together and a linear transformation outputs the score, with a final output dimension of 1. The actor-critic network thus implements two neural networks, with a discount coefficient of 0.90 and an entropy weight of 0.01. Generally, the critic generates the return estimate used for the actor update, and the critic is updated faster than the actor. The convergence speed increases as the learning rate increases; however, too high a learning rate may lead to a local optimum instead of the global optimum. Therefore, the learning rate is set to a moderate value.
In the invention, the numerical experiments are carried out on a computer with an 8-core CPU and 16 GB of memory. The number of threads is 8, i.e. each local actor and critic network corresponds to one sub-thread, for a total of 8 threads. The environment is learned asynchronously through the sub-threads, and the learning results are regularly updated to the global network. There are many random choices at the beginning of learning, but after multiple iterations the economic dispatch model converges and selects the action that optimizes the objective. The optimal economic scheduling strategy is trained on the offline dataset. The main advantage of DRL is that, after being fully trained on such offline data, the model can be applied online in a real environment. When the online environment changes slightly, the DRL model can learn these changes and dynamically adjust its actions to achieve optimal scheduling.
In order to verify the convergence of the algorithm, 100 days of data are sampled as training data; each episode runs any one of the 100 days, and after 45,000 episodes the model can generate the optimal actions. There are 24 steps per episode, where each step is one hour, and the iterative process is shown in Fig. 5. The actions are obtained by random sampling from a normal distribution conditioned on the state. It can be seen that the algorithm fluctuates strongly in the first 30,000 episodes, mainly due to the randomness of policy selection, and is therefore constantly exploring; owing to the constraints of the action interval and the equality constraints, the fluctuation interval is approximately between -300 and -400. After 32,000 episodes of training there is a clear breakthrough, as the model learns how to select the optimal action. From 35,000 episodes onward, the model begins to converge gradually. The training results show that the proposed model can minimize the cost of the fully trained VPP operator. Although there are many random choices and many iterations at the beginning of learning, the deep reinforcement learning model can converge and learn to choose actions close to the optimal target value.
In the virtual power plant, photovoltaic and wind power generation are cheaper and more environmentally friendly than the micro gas turbine, so the trained strategy relies mainly on wind and photovoltaic generation. The load is therefore mainly supplied by wind and photovoltaic power, and the remainder is supplemented by the gas turbine or curtailed from the controllable load through demand response. Figs. 6, 7 and 8 compare the generated power of wind power, photovoltaics and the gas turbine with the actual power consumption; dark gray is the generated power, light gray is the actual power consumption, the horizontal axis is time in hours and the vertical axis is power. As can be seen from Figs. 6 and 7, the difference between the actual generation and the final consumption of wind and photovoltaic power is approximately 0, and the actual power output of photovoltaic and wind generation is small during 1:00-7:00 and 23:00-24:00; the load at these times needs to be supplied by the micro gas turbine. As can be seen from Fig. 8, during 1:00-7:00 and 23:00-24:00 the micro gas turbine is the main power supply unit. As can be seen from Fig. 9, at 20:00-24:00 the controllable-load shedding has a high weight, with almost all of it shed, owing to the large electricity demand of the industrial users and the high cost of the gas turbine in this period. It can therefore be concluded that, using the proposed algorithm to minimize the cost of the virtual power plant, the early learning stage is relatively random under the preset reward, and during training the model learns the correct strategy selection over time, so as to minimize the cost of the virtual power plant by stably controlling the distributed generation and the demand response.
To verify the effectiveness of the proposed method, the proposed algorithm is compared with other reinforcement learning algorithms. The method of the invention is compared with the deterministic policy gradient algorithm (DPG), which can also solve this continuous-action-space problem. The results are shown in Fig. 10, with the light gray curve being DPG and the dark gray curve being the proposed DRL-based algorithm. Comparing the costs of DPG and the proposed DRL-based algorithm over 30 days, it can be seen from the figure that the cost of the proposed method is significantly lower from day 22 onward. Compared with the proposed method, DPG uses the return value at the current moment as an unbiased estimate of the action-value function under the current policy, so the obtained policy has higher variance, poorer generalization and, in some cases, instability. The proposed method uses a neural network to fit the action-value function and obtains a smaller variance by subtracting a baseline. To break the correlation between data, an asynchronous update mechanism is used to create multiple parallel environments; because the parallel environments do not interfere with each other, the sub-threads can simultaneously update the parameters of the main network.
TABLE 3
(running-time comparison of the proposed method with DDPG and DPG)
Compared with DDPG and DPG, the number of episodes is set to 45,000 and the running times of the different methods are compared. As can be seen from Table 3, compared with the different deep reinforcement learning methods adapted to solve the economic dispatch of the virtual power plant, the proposed method has the lowest time complexity. Since each decision takes only a few milliseconds, in a real-time virtual power plant economic dispatch scenario a decision can be made within a few milliseconds from the state input. A traditional heuristic method needs to re-run the optimization process for each state, and its time complexity is higher.
The invention adapts to the stochastic characteristics of distributed renewable energy generation and provides a VPP optimal economic scheduling algorithm based on deep reinforcement learning. A framework based on edge computing is further utilized, so that the optimal scheduling solution can be achieved with lower computational complexity. The performance of the proposed algorithm is evaluated using real-world meteorological and load data, and the experimental results show that the proposed DRL-based model can successfully learn the characteristics of distributed generation and industrial user demand in the economic scheduling problem of the virtual power plant and learn to select actions that minimize the cost of the virtual power plant. Comparison with DPG shows that the proposed method has better performance; comparison with DPG and DDPG shows that it has lower time complexity.
The above-described calculation examples of the present invention are merely to explain the calculation model and the calculation flow of the present invention in detail, and are not intended to limit the embodiments of the present invention. It will be apparent to those skilled in the art that other variations and modifications of the present invention can be made based on the above description, and it is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such modifications and variations are possible and contemplated as falling within the scope of the invention.

Claims (3)

1. The method for economically scheduling a virtual power plant in the energy internet based on deep reinforcement learning is characterized by comprising the following steps:
step one, for any region i, collecting power-generation-side and user-side information from region i by using the industrial-side server and the power-supply-side server of region i, where i = 1, 2, ..., I, and I is the total number of regions;
training an actor-critic network with the information collected in each region, so as to obtain, for each region, an actor-critic network trained with that region's information;
the objective function of the operator-critical network is as follows:
Figure FDA0004105715940000011
wherein: c is the total operating cost of the area i,
Figure FDA0004105715940000012
initial depreciation cost for the photovoltaic investment in time slot K for zone i, K =0,1, ..., K, \ h>
Figure FDA0004105715940000013
For the photovoltaic operation and maintenance costs of zone i in time slot k, <' >>
Figure FDA0004105715940000014
For the initial depreciation cost of the wind turbine for zone i in time slot k, based on the wind turbine status of the wind turbine>
Figure FDA0004105715940000015
For the wind turbine operating and maintenance costs of zone i at time slot k>
Figure FDA0004105715940000016
Based on the initial depreciation cost of the micro gas turbine in the time slot k for the area i>
Figure FDA0004105715940000017
For the micro gas turbine operating and maintenance costs in zone i at time slot k, based on>
Figure FDA0004105715940000018
Environmental protection costs of micro gas turbines in time slot k for zone i->
Figure FDA0004105715940000019
The cost of the micro gas turbine itself consumed in the time slot k for the area i, λ is the compensation factor,
Figure FDA00041057159400000110
controllable load for zone i in time slot k, x i (k) Selection of interruptible load percentage vector, x, for region i in time slot k i (k) Get (1)The value range is [0,1 ]];
the specific training process of the actor network in the actor-critic network comprises the following steps:
the actor network consists of a μ network and a σ network, each composed of 2 fully connected layers;
the activation function of the 1st fully connected layer of both the μ network and the σ network is tanh, with input dimension 5 and output dimension h;
the activation function of the 2nd fully connected layer of both the μ network and the σ network is softplus, with input dimension h and output dimension m;
the power-generation-side and user-side information is input into the μ network and the σ network to obtain their outputs; a normal random sample drawn from the μ and σ outputs then gives the 4-dimensional action output by the actor network;
the specific training process of the critic network in the operator-critic network comprises the following steps:
the critic network is composed of full connection layers;
inputting the information of the power generation side and the user side and the 4-dimensional action output by the operator network into a full connection layer of the critic network, splicing the output of the full connection layer to obtain a splicing result, and performing linear transformation on the splicing result to obtain a one-dimensional return value output by the critic network;
step two, deploying the trained actor-critic networks at the edge nodes of the corresponding areas respectively;
and step three, the industrial-side server and the power-supply-side server in each area collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the dispatch decision information for each area in real time.
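As an illustration of the network structure recited in claim 1, the following PyTorch sketch is one possible, non-authoritative reading: an actor made of a μ network and a σ network (two fully connected layers each, tanh then softplus, input dimension 5), normal random sampling of the μ/σ outputs to obtain the 4-dimensional action, and a critic that passes state and action through fully connected layers, concatenates the results, and applies a linear transform to a one-dimensional return value. The hidden size h, the output size m, and all identifiers are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, H, M, ACTION_DIM = 5, 64, 4, 4   # h and m are not fixed in the claim; illustrative values

class Actor(nn.Module):
    """mu network and sigma network: 2 fully connected layers each,
    tanh on the 1st layer and softplus on the 2nd, as recited in claim 1."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(STATE_DIM, H), nn.Tanh(),
                                nn.Linear(H, M), nn.Softplus())
        self.sigma = nn.Sequential(nn.Linear(STATE_DIM, H), nn.Tanh(),
                                   nn.Linear(H, M), nn.Softplus())

    def forward(self, state):
        mu, sigma = self.mu(state), self.sigma(state)
        # Normal random sampling on the mu/sigma outputs yields the 4-dimensional action.
        return torch.distributions.Normal(mu, sigma + 1e-6).sample()

class Critic(nn.Module):
    """Fully connected layers over state and action, spliced (concatenated),
    then a linear transform to a one-dimensional return value."""
    def __init__(self):
        super().__init__()
        self.fc_state = nn.Linear(STATE_DIM, H)
        self.fc_action = nn.Linear(ACTION_DIM, H)
        self.out = nn.Linear(2 * H, 1)

    def forward(self, state, action):
        spliced = torch.cat([torch.relu(self.fc_state(state)),
                             torch.relu(self.fc_action(action))], dim=-1)
        return self.out(spliced)

if __name__ == "__main__":
    s = torch.rand(1, STATE_DIM)   # stand-in for generation-side and user-side information
    a = Actor()(s)                 # 4-dimensional dispatch action
    value = Critic()(s, a)         # one-dimensional return estimate
    print(a.shape, value.shape)
```

Such a network, once trained, only needs a forward pass per time slot, which is what makes deployment at the edge nodes in steps two and three feasible in real time.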
2. The deep reinforcement learning-based economic dispatching method for the virtual power plant in the energy internet according to claim 1, wherein in step one the actor-critic network on the VPP operator cloud server is trained with the information collected from each area using an asynchronous method in which 8 threads run in parallel.
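Claim 2 recites asynchronous training with 8 parallel threads. A minimal, hypothetical sketch of that pattern is shown below (Python threading with a shared global model; the environment, loss, and update rule are placeholders rather than the patented procedure):

```python
import threading
import torch

N_WORKERS = 8                                          # claim 2: 8 threads run in parallel

global_model = torch.nn.Linear(5, 5)                   # placeholder for the shared actor-critic
optimizer = torch.optim.Adam(global_model.parameters(), lr=1e-3)
lock = threading.Lock()

def worker(region_id: int):
    """Each thread trains on one area's data and asynchronously pushes
    its gradients into the shared global network (A3C-style pattern)."""
    local_model = torch.nn.Linear(5, 5)
    local_model.load_state_dict(global_model.state_dict())
    for _ in range(100):                               # placeholder episode loop
        state = torch.rand(1, 5)                       # stand-in for area data
        loss = local_model(state).pow(2).mean()        # stand-in for the actor-critic loss
        loss.backward()
        with lock:                                     # asynchronous global update
            for g_p, l_p in zip(global_model.parameters(), local_model.parameters()):
                g_p.grad = l_p.grad.clone()
            optimizer.step()
            optimizer.zero_grad()
        local_model.zero_grad()
        local_model.load_state_dict(global_model.state_dict())

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print("asynchronous training finished")
```

Under CPython the threads interleave rather than run fully in parallel, but the structure mirrors the 8-worker asynchronous update described in the claim.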
3. The deep reinforcement learning-based economic dispatching method for the virtual power plant in the energy internet according to claim 2, wherein the reward function of the actor-critic network has the expression:
[reward function expression – given only as image FDA0004105715940000021 in the published text]
wherein: k 1 、K 2 、K 3 And K 4 Are weighted values.
CN202010010410.XA 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet Active CN111242443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010010410.XA CN111242443B (en) 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet

Publications (2)

Publication Number Publication Date
CN111242443A CN111242443A (en) 2020-06-05
CN111242443B true CN111242443B (en) 2023-04-18

Family

ID=70876028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010010410.XA Active CN111242443B (en) 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet

Country Status (1)

Country Link
CN (1) CN111242443B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738627B (en) * 2020-08-07 2020-11-27 中国空气动力研究与发展中心低速空气动力研究所 Wind tunnel test scheduling method and system based on deep reinforcement learning
CN112381359B (en) * 2020-10-27 2021-10-26 惠州蓄能发电有限公司 Multi-critic reinforcement learning power economy scheduling method based on data mining
CN113191680B (en) * 2021-05-21 2023-08-15 上海交通大学 Self-adaptive virtual power plant distributed architecture and economic dispatching method thereof
CN113315172B (en) * 2021-05-21 2022-09-20 华中科技大学 Distributed source load data scheduling system of electric heating comprehensive energy
CN114301909B (en) * 2021-12-02 2023-09-22 阿里巴巴(中国)有限公司 Edge distributed management and control system, method, equipment and storage medium
CN114244679A (en) * 2021-12-07 2022-03-25 国网福建省电力有限公司经济技术研究院 Layered control method for communication network of virtual power plant under cloud-edge-end architecture
CN113962390B (en) * 2021-12-21 2022-04-01 中国科学院自动化研究所 Method for constructing diversified search strategy model based on deep reinforcement learning network
CN114862177B (en) * 2022-04-29 2023-05-26 国网江苏省电力有限公司南通供电分公司 Energy storage and distribution method and system for energy interconnection
CN115062869B (en) * 2022-08-04 2022-12-09 国网山东省电力公司东营供电公司 Comprehensive energy scheduling method and system considering carbon emission
CN116111599A (en) * 2022-09-08 2023-05-12 贵州电网有限责任公司 Intelligent power grid uncertainty perception management control method based on interval prediction

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX2018000942A (en) * 2015-07-24 2018-08-09 Deepmind Tech Ltd Continuous control with deep reinforcement learning.
US11727265B2 (en) * 2019-06-27 2023-08-15 Intel Corporation Methods and apparatus to provide machine programmed creative support to a user

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824134A (en) * 2014-03-06 2014-05-28 河海大学 Two-stage optimized dispatching method for virtual power plant
CN108604310A (en) * 2015-12-31 2018-09-28 威拓股份有限公司 Method, controller and the system of distribution system are controlled for using neural network framework
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN110443447A (en) * 2019-07-01 2019-11-12 中国电力科学研究院有限公司 A kind of method and system learning adjustment electric power system tide based on deeply

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Using peer-to-peer energy-trading platforms to incentivize prosumers to form federated power plants. Nat. Energy, vol. 3, 2018, full text. *
Chen Chunwu. Research on the economic operation model of virtual power plants in a smart grid environment. China Master's Theses Full-text Database, 2015, full text. *

Also Published As

Publication number Publication date
CN111242443A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111242443B (en) Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
Lin et al. Deep reinforcement learning for economic dispatch of virtual power plant in internet of energy
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
Zeng et al. A potential game approach to distributed operational optimization for microgrid energy management with renewable energy and demand response
Du et al. Distributed MPC for coordinated energy efficiency utilization in microgrid systems
Xi et al. A wolf pack hunting strategy based virtual tribes control for automatic generation control of smart grid
CN108039737B (en) Source-grid-load coordinated operation simulation system
CN106026084B (en) A kind of AGC power dynamic allocation methods based on virtual power generation clan
Zhang et al. A cyber-physical-social system with parallel learning for distributed energy management of a microgrid
Xi et al. A deep reinforcement learning algorithm for the power order optimization allocation of AGC in interconnected power grids
CN114331059A (en) Electricity-hydrogen complementary park multi-building energy supply system and coordinated scheduling method thereof
Yoldas et al. Optimal control of microgrids with multi-stage mixed-integer nonlinear programming guided $ Q $-learning algorithm
Liu et al. Optimal dispatch strategy of virtual power plants using potential game theory
Bi et al. Real-time energy management of microgrid using reinforcement learning
CN115795992A (en) Park energy Internet online scheduling method based on virtual deduction of operation situation
Yin et al. Deep Stackelberg heuristic dynamic programming for frequency regulation of interconnected power systems considering flexible energy sources
Huang et al. Distributed real-time economic dispatch for islanded microgrids with dynamic power demand
CN113869742A (en) Power dispatching system of comprehensive supply and demand side based on mobile home and critic network
CN117117878A (en) Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning
CN110599032A (en) Deep Steinberg self-adaptive dynamic game method for flexible power supply
CN116307071A (en) Method for accessing high-proportion photovoltaic into low-voltage power distribution network
Zile Smart energy management in solar/wind power stations using artificial neural networks
Sage et al. Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations
Hu et al. Optimal Energy Management in Microgrids Based on Reinforcement Learning
Sun et al. Distributed Optimal Scheduling of Integrated Energy Systems Based on Federated Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant