CN112598473A - Generator agent and quotation method based on deep deterministic policy gradient algorithm - Google Patents


Info

Publication number
CN112598473A
CN112598473A (application CN202011573875.2A)
Authority
CN
China
Prior art keywords: network, generator, current, quotation, establishing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011573875.2A
Other languages
Chinese (zh)
Inventor
朱炳铨
肖艳炜
李继红
项中明
孙珂
徐立中
裘雨音
孔飘红
黄志华
申建强
王高琴
史新红
郑亚先
杨争林
冯树海
王子恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
China Electric Power Research Institute Co Ltd CEPRI
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
China Electric Power Research Institute Co Ltd CEPRI
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Zhejiang Electric Power Co Ltd, China Electric Power Research Institute Co Ltd CEPRI, and Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202011573875.2A
Publication of CN112598473A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/0601 — Electronic shopping [e-shopping]
    • G06Q30/0611 — Request for offers or quotes
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/08 — Auctions
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 — Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a generator agent and a quotation method based on a deep deterministic policy gradient algorithm. The agent comprises: a deep deterministic policy gradient algorithm network construction module, used for establishing a network composed of a deep Actor network, a deep Critic network and an Experience Replay memory; and an exploratory quotation action generation module, used for establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model based on the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory. The dynamic quotation strategy of the generator under incomplete information is found by a deep reinforcement learning method; the invention is an efficient quotation decision tool and helps the generator quote accurately in the electric power market.

Description

Generator agent and quotation method based on deep deterministic policy gradient algorithm
Technical Field
The invention relates to electric power technology, and in particular to a generator agent and a quotation method based on a deep deterministic policy gradient algorithm.
Background
With the emergence of the domestic electric power spot market, generators will gradually participate in electric power market bidding to obtain their own benefits, and in a market environment participants continuously optimize their bidding strategies in pursuit of higher profits. At present, China's electric power market is still at an early stage; generators are not yet familiar with the market environment and need a well-developed quotation strategy theory for guidance. An efficient quotation decision tool can help decision makers and quotation staff make successful quotations and thereby obtain higher revenue. In addition, such a tool helps the supervisory organization of the electric power market investigate generator behavior, identify loopholes in the market rules, and continuously improve China's electric power market. It is therefore necessary to research the behavior of generators in the electric power market.
However, market information is incomplete for the participants, which makes it very difficult for them to optimize their own strategies. Traditional research on generator quotation strategies is mainly based on game theory, which is useful for theoretically discussing the optimal bidding strategies of market members and for relatively coarse study of the bidding behavior of generation companies; because of its inherent defects, however, game theory has low practicability and is not suitable for studying complete bidding strategies.
In order to simulate the boundedly rational quotation behavior of a generator maximizing its own income among multiple competitors in a real, incomplete-information electric power market, data-driven machine learning algorithms such as reinforcement learning are increasingly adopted, and current research on generator quotation strategies based on reinforcement learning mostly uses Q-learning and its variants. The idea of this algorithm is to find the expected value of each state-action pair by looking it up in a two-dimensional Q-value table of finite size, so the model must be simplified accordingly and the continuous state space reduced to a finite number of state intervals. For this reason the size of the Q-value table strongly affects the optimization capability of the Q-learning algorithm: as the number of states considered in the model increases or the state intervals shrink, the scale of the Q-value table grows exponentially, which easily causes the curse of dimensionality.
Disclosure of Invention
The invention aims to provide a generator agent and a quotation method based on a deep deterministic policy gradient algorithm, so as to solve the technical problem in the prior art that the quotation coefficients of a generator agent can only take discrete values. The invention recognizes that complete market information cannot be obtained in practice; using deep learning and reinforcement learning, a generator can give an optimal quotation over continuous values without knowing the strategies or unit cost parameters of others.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a generator quotation method based on a deep deterministic policy gradient algorithm, comprising the following steps:
establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters;
establishing the generator's market bidding model for electric energy, selecting a quotation action based on the result calculated by the Current Actor Network according to the established market bidding model, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory.
In a further improvement of the invention, the method further comprises the following step:
when the Experience Replay memory is full, randomly extracting a batch of sample data from it to update the Network parameters of the Current Critic Network and the Current Actor Network.
In a further improvement of the invention, updating the Network parameters of the Current Critic Network and the Current Actor Network specifically comprises:
sending the extracted sample data into an optimizer; the optimizer performs gradient descent training on the Network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function, and the update of the Network parameters of the Current Critic Network and the Current Actor Network is complete when training finishes.
In a further improvement of the invention, the step of establishing a deep deterministic policy gradient algorithm Network composed of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters, specifically comprises:
establishing the Current Actor Network and Target Actor Network neural networks, whose Network parameters are denoted θ_a and θ_a′ respectively; establishing the Current Critic Network and Target Critic Network neural networks, whose Network parameters are denoted θ_c and θ_c′; establishing an Experience Replay memory for storing the sample data obtained after the generator agent quotes;
setting the input state vector of the generator agent to the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the generator's quotation coefficient;
randomly initializing the Network parameters θ_a and θ_c, and setting θ_a′ = θ_a, θ_c′ = θ_c.
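The initialization step above — random parameters for the Current networks, Target parameters set to exact copies — can be sketched as follows. This is an illustrative sketch only: the layer sizes and the tiny fully connected structure are our assumptions, not taken from the patent.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    # One (weights, bias) pair per layer of a small fully connected network.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Actor maps the state (market clearing price) to a quotation coefficient;
# Critic maps (state, action) to a Q value.  Hidden width 16 is arbitrary.
theta_a = init_mlp([1, 16, 1])      # Current Actor Network parameters  (theta_a)
theta_c = init_mlp([2, 16, 1])      # Current Critic Network parameters (theta_c)
theta_a_t = copy.deepcopy(theta_a)  # Target Actor Network:  theta_a' = theta_a
theta_c_t = copy.deepcopy(theta_c)  # Target Critic Network: theta_c' = theta_c
```

The deep copy matters: the Target parameters must start equal to the Current ones but evolve independently afterwards.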
In a further improvement of the invention, the step of establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model and the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory, specifically comprises the following steps:
establishing the generator set electric energy quotation model;
establishing the generator's market bidding model for electric energy according to the generator set electric energy quotation model;
selecting a quotation action based on the deep deterministic policy gradient algorithm;
submitting the quotation to the ISO for clearing;
receiving the generator's nodal electricity price and winning bid amount fed back by the ISO;
storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the quotation as one record in the Experience Replay memory, and updating the current state.
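One round of the interaction described by the steps above can be sketched as follows. The `actor` and `iso_clear` functions here are hypothetical stand-ins (a fixed policy and a toy single-node clearing rule), not the patent's networks or DC clearing model; the sketch only shows the flow: pick a noisy quotation coefficient, clear, store the (state, coefficient, reward, new state) record, update the state.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(1)
replay_memory = deque(maxlen=10000)   # Experience Replay memory

def actor(state):
    # Placeholder for the Current Actor Network's output in this state.
    return 1.2

def iso_clear(k):
    # Hypothetical single-node clearing stub: price rises with the bid
    # coefficient, demand responds to price, profit is revenue minus cost.
    price = 30.0 * k
    quantity = max(0.0, 100.0 - 2.0 * price)
    cost = 20.0 * quantity
    return price, price * quantity - cost   # (new clearing price, reward r)

state = 30.0                                # initial market clearing price
for _ in range(5):
    k = actor(state) + rng.normal(0.0, 0.05)   # exploratory quotation action
    new_state, reward = iso_clear(k)
    replay_memory.append((state, k, reward, new_state))
    state = new_state                       # update the current state
```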
In a second aspect, the invention further provides a generator agent based on a deep deterministic policy gradient algorithm, comprising:
a deep deterministic policy gradient algorithm network construction module, used for establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory;
an exploratory quotation action generation module, used for establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model based on the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory.
In a further improvement of the invention, the deep deterministic policy gradient algorithm network construction module specifically comprises:
a deep Actor Network unit, used for establishing the Current Actor Network and Target Actor Network neural networks; the quotation coefficient calculated by the Current Actor Network serves as the reference when the agent actually selects its quotation coefficient, and the quotation coefficient calculated by the Target Actor Network is used, when training the Critic Network, to estimate the action the agent will select in the future state;
a deep Critic Network unit, used for establishing the Current Critic Network and Target Critic Network neural networks; the Current Critic Network is used to estimate the Q value of an action in the current state and to provide the gradient when training the Current Actor Network, and the Target Critic Network is used, when training the Current Critic Network, to provide the Q value of the action that the agent, as represented by the Target Actor Network, takes in the future state;
an Experience Replay memory unit, used for storing the sample data obtained after the generator agent quotes; each sample record holds four pieces of information: the current state, quotation coefficient, reward and new state of the generator agent.
In a further improvement of the invention, the exploratory quotation action generation module specifically comprises:
a generator set electric energy quotation model establishing unit, used for establishing the generator set electric energy quotation model;
a bidding model establishing unit, used for establishing the generator's market bidding model for electric energy according to the generator set electric energy quotation model;
an action selection unit, used for selecting a quotation action based on the deep deterministic policy gradient algorithm;
a quotation submitting unit, used for submitting quotations to the ISO for clearing;
a profit calculation unit, used for calculating the profit r from the generator's nodal electricity price and winning bid amount fed back by the ISO.
In a further improvement of the invention, the agent further comprises:
a deep deterministic policy gradient algorithm training module, used for randomly extracting the sample data stored in the Experience Replay memory and updating the Network parameters of the Current Critic Network and the Current Actor Network.
In a further improvement of the invention, the deep deterministic policy gradient algorithm training module specifically comprises:
a Current Network updating unit, used for updating the parameters of the Current Critic Network and the Current Actor Network according to the sample data extracted from the Experience Replay memory;
a Target Network updating unit, used for copying the parameters of the Current Critic Network and the Current Actor Network to the corresponding Target Critic Network and Target Actor Network at the set times.
Compared with the prior art, the invention has the following beneficial effects:
the method considers that complete information in the market cannot be obtained in the actual situation, a generator does not need to know strategies of others and unit cost parameters of others, and continuous optimal quotation is selected by deep learning and reinforcement learning.
Drawings
FIG. 1 is a block schematic diagram of one embodiment of a generator agent based on a depth-deterministic policy gradient algorithm provided by the present invention;
FIG. 2 is a five-machine five-node network topology diagram;
FIG. 3 is a schematic diagram of the relationship between generator profit and the number of clearings during generator quotation strategy learning;
FIG. 4 is a schematic diagram of the relationship between the generator quotation coefficient and the number of clearings during generator quotation strategy learning.
Detailed Description
The deep deterministic policy gradient (DDPG) algorithm combines deep learning and reinforcement learning and, following a data-driven approach, learns an action policy directly from high-dimensional raw data, thereby overcoming the defect that the Q-learning algorithm must simplify the model because of the curse of dimensionality. Compared with Q-learning, the deep deterministic policy gradient algorithm can build a market model over a multidimensional continuous state space, can handle clearing prices under continuous generator quotations, can take other environmental variables that strongly influence generator profit, such as total system load demand, as input conditions, and can exploit the strong fitting capability of neural networks and deep learning so that the generator's bidding model makes higher-profit decisions. As the market gradually expands and the market environment becomes more complex, research on generator quotation methods based on the deep deterministic policy gradient algorithm is of important significance in promoting the construction of the spot market.
Example 1
Referring to FIG. 1, this embodiment provides a generator agent based on a deep deterministic policy gradient algorithm, comprising: a deep deterministic policy gradient algorithm network construction module, an exploratory quotation action generation module and a deep deterministic policy gradient algorithm training module.
The deep deterministic policy gradient algorithm network construction module is used for establishing a deep Actor Network consisting of a Current Actor Network and a Target Actor Network, a deep Critic Network consisting of a Current Critic Network and a Target Critic Network, and an Experience Replay memory; the input state vector is the market clearing price, the output action is the generator's quotation coefficient, and the module performs initialization.
The exploratory quotation action generation module is used for establishing the generator's market bidding model for electric energy and, based on the deep Actor Network of the deep deterministic policy gradient algorithm, generating a quotation coefficient with a certain amount of noise added to the quotation coefficient produced under the established market bidding model.
The deep deterministic policy gradient algorithm training module is used for randomly extracting the sample data stored in the Experience Replay memory and training the parameters of the Online (Current) Networks and Target Networks of the deep Actor Network and the deep Critic Network in the deep deterministic policy gradient algorithm network construction module. Each module is described in detail below.
The deep deterministic policy gradient algorithm network construction module specifically comprises:
The deep Actor Network is used for establishing the Current Actor Network and Target Actor Network neural networks, whose Network parameters are denoted θ_a and θ_a′ respectively. The Current Actor Network and Target Actor Network output the quotation coefficient of the generator agent from the input state vector. The quotation coefficient calculated by the Target Actor Network is used, when training the Critic Network, to estimate the action the agent will select in the future state;
The deep Critic Network is used for establishing the Current Critic Network and Target Critic Network neural networks, whose Network parameters are denoted θ_c and θ_c′ respectively. The Current Critic Network and Target Critic Network output, from the input state vector and action value, the Q value of that action under that state vector. The Current Critic Network is used to estimate the Q value of an action in the current state and to provide the gradient when training the Current Actor Network, and the Target Critic Network is used, when training the Current Critic Network, to provide the Q value of the action that the agent, as represented by the Target Actor Network, takes in the future state;
The Experience Replay memory unit is used for storing the sample data obtained after the generator agent quotes; each sample record holds four pieces of information: the current state, quotation coefficient, reward and new state of the generator agent.
The exploratory quotation action generation module specifically comprises: a generator set electric energy quotation model establishing unit, a bidding model establishing unit, an action selection unit, a quotation submitting unit and a profit calculation unit.
The generator set electric energy quotation model establishing unit is used for establishing the generator set electric energy quotation model:

C_i(P_Gi) = a_i·P_Gi² + b_i·P_Gi + c_i,  i ∈ G

λ_i(P_Gi) = dC_i(P_Gi)/dP_Gi = 2a_i·P_Gi + b_i

p(P_Gi) = p_i·λ_i(P_Gi)

In the formulas: C_i(P_Gi) is the fuel cost function of generator i; P_Gi is the output of generator i; a_i, b_i and c_i are the quadratic-term, linear-term and constant-term coefficients of the fuel cost; G is the set of generators; λ_i(P_Gi) is the marginal cost function of generator i; p(P_Gi) is the electric energy bidding curve of generator i; and p_i is the electric energy bidding coefficient submitted by generator i;
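The three quantities of the quotation model above translate directly into code. This assumes the standard quadratic fuel cost and a bid curve proportional to marginal cost, as in the formulas; the function names are ours.

```python
def fuel_cost(p_g, a, b, c):
    """C_i(P_Gi) = a*P^2 + b*P + c — fuel cost of one unit at output p_g."""
    return a * p_g**2 + b * p_g + c

def marginal_cost(p_g, a, b):
    """lambda_i(P_Gi) = dC/dP = 2*a*P + b."""
    return 2.0 * a * p_g + b

def bid_curve(p_g, a, b, coeff):
    """p(P_Gi) = coeff * marginal cost — the submitted electric energy bid."""
    return coeff * marginal_cost(p_g, a, b)
```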
the bidding model establishing unit is used for establishing a market bidding model of the power generator in the electric energy according to the electric energy bidding model of the power generator set:
Figure BDA0002861548020000081
Figure BDA0002861548020000082
in the formula: f. ofGFor the profit of the generator i, λeFor clearing price of electric energy, kiThe quotation coefficient of the generator i in the electric energy market; k is a radical ofimin、kimaxThe minimum value and the maximum value of the electric energy quotation coefficient of the generator i are respectively; pDkThe load requirement of the kth user; l is a network node set; pGimin、PGimaxRespectively the upper and lower technical output limits, f, of the generator iISOThe sum of the costs quoted for all generators;
the action selection unit is used for selecting the quotation action based on a depth certainty strategy gradient algorithm, and the specific selection method is as follows:
based on a normal distribution noise algorithm, calculating a quotation coefficient a in the Current state by using a Current Actor Network0Then, a normally distributed noise is superimposed on the base to obtain the quotation coefficient ki=a0+ X, wherein X-N (μ, σ)2) μ ═ 0, and σ is an adjustable quantity;
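The noisy selection rule k_i = a_0 + X, X ~ N(0, σ²), can be sketched as below. Clipping the result to the admissible coefficient range [k_min, k_max] is our assumption — the patent bounds the coefficient through the bidding model's constraint k_imin ≤ k_i ≤ k_imax rather than stating a clipping step.

```python
import numpy as np

def select_bid_coeff(a0, sigma, k_min, k_max, rng):
    """Exploratory quotation action: actor output a0 plus N(0, sigma^2) noise."""
    k = a0 + rng.normal(0.0, sigma)        # k_i = a0 + X
    return float(np.clip(k, k_min, k_max)) # keep k_i within [k_min, k_max]

rng = np.random.default_rng(42)
```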
The quotation submitting unit is used for submitting quotations to the ISO for clearing, where the ISO market clearing model is:

min f_ISO = Σ_{i∈G} k_i·(a_i·P_Gi² + b_i·P_Gi)

s.t.  Σ_{i∈G} P_Gi − Σ_{h∈L} P_Dh = 0
      (θ_i − θ_j)/X_ij ≤ P_ijmax,  ij ∈ Branch
      P_Gimin ≤ P_Gi ≤ P_Gimax

In the formulas: P_Gi is the output of generator i; a_i and b_i are the quadratic-term and linear-term coefficients of the fuel cost; L is the set of network nodes; Branch is the set of branches; λ_el is the market clearing price of node l; P_Dh is the load demand of the h-th user; X_ij is the reactance of branch ij; θ_i and θ_j are the phase angles of nodes i and j; P_ijmax is the flow limit of line ij; and P_Gimin and P_Gimax are the lower and upper technical output limits of generator i;
The profit calculation unit is used for calculating the profit r from the generator's nodal electricity price and winning bid amount fed back by the ISO:

r = λ_e·P_Gi − C_i(P_Gi)

where λ_e is the nodal electricity price of node i.
The deep deterministic policy gradient algorithm training module specifically comprises: a Current Network updating unit and a Target Network updating unit.
The Current Network updating unit is used for updating the parameter θ_c of the Current Critic Network and the parameter θ_a of the Current Actor Network according to the sample data extracted from the Experience Replay memory.
For the Current Critic Network, its loss function is defined as follows:

L(θ_c) = ([r_n + γ·Q_n(s_{n+1}, a′ | θ_c′)] − Q_n(s, a | θ_c))²

where L(θ_c) is the loss function, i.e. the square of the difference between the Q value output by the Current Critic Network and the target Q value; r_n is the generator's profit; Q_n(s, a | θ_c), output by the Current Critic Network, is the Q value of the generator taking action a in state s; and Q_n(s_{n+1}, a′ | θ_c′), output by the Target Critic Network, is the Q value of the generator taking action a′ in the next state s_{n+1}.
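For one sample, the loss above reduces to a TD target y = r_n + γ·Q′ and a squared error, as in this sketch; the two Q values are passed in as plain numbers standing in for the Current and Target Critic outputs.

```python
def critic_loss(q_current, q_target_next, reward, gamma):
    """L(theta_c) = ([r + gamma*Q'(s', a')] - Q(s, a))^2 for a single sample."""
    y = reward + gamma * q_target_next   # target Q value from the Target Critic
    return (y - q_current) ** 2

# Example: r = 1.0, gamma = 0.9, Q'(s', a') = 6.0, Q(s, a) = 5.0
loss = critic_loss(q_current=5.0, q_target_next=6.0, reward=1.0, gamma=0.9)
```

In a full implementation this quantity is averaged over the sampled minibatch before the optimizer takes a gradient step on θ_c.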
For the Current Actor Network, its loss is defined as follows:

L(θ_a) = −Q_n(s, a | θ_c)

where L(θ_a) is the loss function. The loss was originally defined by using the Q value estimated by the Current Critic Network to compute the gradient with respect to action a, but that calculation is relatively complex. The purpose of training the Current Actor Network is to select the action with the maximum Q value in state s, and in practice this is mostly realized by minimizing −Q. In the formula, Q_n(s, a | θ_c) is output by the Current Critic Network.
The Target Network updating unit is used for updating the parameters of the Target Critic Network and the Target Actor Network. The Target Critic Network and Target Actor Network have exactly the same structure as the Current Critic Network and Current Actor Network respectively; the only difference is the update frequency of the parameters. Each training step of the Current Networks updates their Network parameters θ_c and θ_a, while the Target Network updating unit, at set time intervals, copies the parameters of the Current Critic Network and the Current Actor Network and assigns them to the corresponding Target Critic Network and Target Actor Network.
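The fixed-interval copy performed by the Target Network updating unit can be sketched as a hard update; the dict-of-lists parameter format is illustrative. (Standard DDPG implementations more often use a soft update θ′ ← τθ + (1−τ)θ′ every step; the periodic hard copy shown here follows the patent's description.)

```python
import copy

def hard_update(current_params, target_params, step, interval):
    """Copy Current network parameters onto the Target network every
    `interval` training steps; otherwise leave the Target untouched."""
    if step % interval == 0:
        return copy.deepcopy(current_params)
    return target_params
```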
Example 2
This embodiment further provides a generator quotation method based on the deep deterministic policy gradient algorithm, comprising the following steps:
(1) establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters;
(2) establishing the generator's bidding model in the electric energy market, selecting a quotation action based on the result calculated by the Current Actor Network according to the established market bidding model, submitting the generator's quotation to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory;
(3) when the Experience Replay memory is full, randomly extracting a batch of sample data and training the deep deterministic policy gradient algorithm network.
The step (1) specifically comprises the following steps:
(1-1) establishing a Current Actor Network and a Target Actor Network neural Network, wherein Network parameters of the neural networks are respectively marked as thetaaAnd thetaa' establishing Current crystalline Network and Target crystalline Network neural Network, and recording the Network parameters as thetacAnd thetac' establishing an Experience Replay memory for storing sample data obtained after the intelligent agent of the power generator quotes;
(1-2) setting the input state vector of the generator agent as the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the quotation coefficient of the generator agent;
(1-3) randomly initializing the four network parameters in step (1-1).
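The initialization in steps (1-1) to (1-3) can be sketched as follows. This is an illustrative sketch only: the layer sizes, the one-hidden-layer structure and the numpy representation are assumptions, not taken from the patent.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    # One-hidden-layer parameter set; an illustrative stand-in for the
    # Actor/Critic neural networks (layer sizes are assumptions).
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

rng = np.random.default_rng(0)
theta_a = init_mlp(1, 16, 1, rng)   # Current Actor: clearing price -> quotation coefficient
theta_c = init_mlp(2, 16, 1, rng)   # Current Critic: (state, action) -> Q value
# The target networks start as exact copies of the current networks
# (theta_a' = theta_a, theta_c' = theta_c).
theta_a_target = {k: v.copy() for k, v in theta_a.items()}
theta_c_target = {k: v.copy() for k, v in theta_c.items()}
experience_replay = []              # filled with (state, k_i, reward, new_state) tuples
```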
The step (2) specifically comprises the following steps:
(2-1) establishing a generator set electric energy quotation model:

C_i(P_Gi) = a_i·P_Gi² + b_i·P_Gi + c_i, i ∈ G

MC_i(P_Gi) = dC_i(P_Gi)/dP_Gi = 2a_i·P_Gi + b_i

p(P_Gi) = p_i·MC_i(P_Gi) = p_i·(2a_i·P_Gi + b_i)

in the formula: C_i(P_Gi) is the fuel cost function of generator i; P_Gi is the output of generator i; a_i, b_i and c_i are respectively the second-order, first-order and constant-term coefficients of the fuel cost; G is the set of generators; MC_i(P_Gi) is the marginal cost function of generator i; p(P_Gi) is the electric energy bidding curve of generator i; p_i is the electric energy bidding coefficient submitted by generator i;
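The quadratic cost, marginal cost and bid curve above can be expressed directly in code (an illustrative sketch; the function names are assumptions):

```python
def fuel_cost(p, a, b, c):
    # Quadratic fuel cost C_i(P) = a*P^2 + b*P + c.
    return a * p * p + b * p + c

def marginal_cost(p, a, b):
    # Marginal cost MC_i(P) = dC/dP = 2*a*P + b.
    return 2.0 * a * p + b

def bid_price(p, a, b, k):
    # Bid curve: the marginal cost scaled by the quotation coefficient k_i.
    return k * marginal_cost(p, a, b)
```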
(2-2) establishing the market bidding model of the generator in the electric energy market according to the electric energy quotation model of the generator set:

max f_G = λe·P_Gi − C_i(P_Gi)

s.t. k_imin ≤ k_i ≤ k_imax; Σ_{i∈G} P_Gi = Σ_{k∈L} P_Dk; P_Gimin ≤ P_Gi ≤ P_Gimax

in the formula: f_G is the profit of generator i; λe is the electric energy clearing price; k_i is the quotation coefficient of generator i in the electric energy market; k_imin and k_imax are respectively the minimum and maximum values of the electric energy quotation coefficient of generator i; P_Dk is the load demand of the k-th user; L is the set of network nodes; P_Gimin and P_Gimax are respectively the lower and upper technical output limits of generator i; f_ISO is the sum of the costs quoted by all the generators, which the ISO minimizes when clearing the market;
(2-3) selecting a quotation action based on the deep deterministic policy gradient algorithm, the specific selection method being as follows:
the Current Actor Network computes the quotation coefficient a0 in the current state, and normally distributed noise is superimposed on it to obtain the quotation coefficient k_i = a0 + X, where X ~ N(μ, σ²), μ = 0, and σ is an adjustable quantity that can be gradually reduced or kept unchanged as the number of training iterations increases;
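Step (2-3) can be sketched as follows; the clipping to [k_min, k_max] reflects the bounds of step (2-2), and the function name is an assumption for illustration:

```python
import random

def explore_bid(a0, sigma, k_min, k_max):
    # Quotation coefficient = deterministic actor output a0 plus N(0, sigma^2)
    # exploration noise, clipped to the admissible range [k_min, k_max].
    k = a0 + random.gauss(0.0, sigma)
    return min(max(k, k_min), k_max)
```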
(2-4) submitting the quotation to the ISO for clearing, where the ISO market clearing model is:

min f_ISO = Σ_{i∈G} k_i·(a_i·P_Gi² + b_i·P_Gi)

s.t. Σ_{i∈l} P_Gi − Σ_{h∈l} P_Dh = Σ_{lj∈Branch} (θ_l − θ_j)/X_lj, ∀l ∈ L; |(θ_i − θ_j)/X_ij| ≤ P_ijmax, ∀ij ∈ Branch; P_Gimin ≤ P_Gi ≤ P_Gimax

in the formula: P_Gi is the output of generator i; a_i and b_i are respectively the second-order and first-order coefficients of the fuel cost; L is the set of network nodes; Branch is the set of branches; λ_el is the market clearing price at node l; P_Dh is the load demand of the h-th user; X_ij is the reactance of branch ij; θ_i and θ_j are the phase angles of nodes i and j; P_ijmax is the flow limit of line ij; P_Gimin and P_Gimax are respectively the lower and upper technical output limits of generator i;
(2-5) receiving the generator nodal electricity price and winning bid quantity fed back by the ISO, and calculating the profit r:

r = λe·P_Gi − C_i(P_Gi) = λe·P_Gi − (a_i·P_Gi² + b_i·P_Gi + c_i)

in the formula, λe is the nodal electricity price at node i;
(2-6) storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to this quotation as one piece of data in the Experience Replay memory, and updating the current state.
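Steps (2-5) and (2-6) can be sketched as follows (the deque capacity of 10000 matches the simulation setting given later in the document; the function names are assumptions):

```python
from collections import deque

replay = deque(maxlen=10000)    # Experience Replay memory

def profit(lam, p, a, b, c):
    # Step (2-5): revenue at the nodal price minus the true fuel cost.
    return lam * p - (a * p * p + b * p + c)

def store_transition(replay, state, k, reward, new_state):
    # Step (2-6): append one experience tuple and return the updated state.
    replay.append((state, k, reward, new_state))
    return new_state
```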
The step (3) specifically comprises the following steps:
(3-1) when the Experience Replay memory is full, randomly extracting a batch of sample data from the Experience Replay memory for updating the network parameters of the Current Critic Network and the Current Actor Network;
(3-2) sending the extracted sample data to an optimizer, which performs gradient descent training on the network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function:
for the Current Critic Network, the loss function is defined as follows:

L(θc) = ([r_n + γ·Q_n(s_{n+1}, a'|θc')] − Q_n(s, a|θc))²

wherein L(θc) is the loss function, i.e. the square of the difference between the Q value output by the Current Critic Network and the target Q value; r_n is the profit of the generator; Q_n(s, a|θc), output by the Current Critic Network, is the Q value of the generator taking action a in state s; Q_n(s_{n+1}, a'|θc'), output by the Target Critic Network, is the Q value of the generator taking action a' in the next state s_{n+1}.
For the Current Actor Network, the loss function is defined as follows:

L(θa) = −Q_n(s, a|θc)

wherein L(θa) is the loss function. The loss was originally defined via the gradient of the Q value estimated by the Current Critic Network with respect to the action a, but that calculation is relatively complex. Since the purpose of training the Current Actor Network is to select the action with the maximum Q value in state s, practical implementations mostly realize the training by minimizing −Q. Q_n(s, a|θc) in the formula is output by the Current Critic Network.
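A minimal numerical sketch of the two losses, using scalar Q values for illustration (in the patent these are neural network outputs, and the gradients are taken by an optimizer):

```python
def critic_loss(r, gamma, q_next_target, q_current):
    # L(theta_c): squared TD error between the target r + gamma*Q' and the
    # Q value of the Current Critic Network.
    td_target = r + gamma * q_next_target
    return (td_target - q_current) ** 2

def actor_loss(q_current):
    # L(theta_a) = -Q: minimizing it pushes the actor toward higher-Q actions.
    return -q_current
```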
(3-3) updating the network parameters of the current networks, and at set intervals copying the parameters of the Current Critic Network and the Current Actor Network to the Target Critic Network and Target Actor Network respectively.
(3-4) judging whether the number of network updates has reached the preset maximum; if so, training is finished; if not, the deep deterministic policy gradient algorithm network continues to be iteratively updated from step (2).
Example 3
This embodiment further provides a generator quotation system based on the deep deterministic policy gradient algorithm, applied to a power system, the system comprising a processor and a memory coupled to the processor, the memory storing a computer program which, when executed by the processor, performs the method steps of Embodiment 2.
Next, a 5-machine 5-node test system, shown in fig. 2, is used to perform simulation analysis of generator behavior in the power market. The test system contains 5 generators: G1 is connected to node 1, the remaining generators are connected to node 2, and the load is connected to node 3. The basic information of the generators is shown in Table 1, and the 24-hour load demand is shown in figs. 3 and 4.
TABLE 1
The simulation parameters of the case are set as follows: the size of the Experience Replay memory is 10000; the variance σ of the normal distribution is 0.3 at the beginning of training and finally drops to 0 as training proceeds; during the training stage, the parameters of the current networks are updated once every 10 training steps, and the parameters of the current networks are copied to the target networks once every 10 updates.
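The update cadence and exploration decay described above can be sketched as follows. The function name and the linear decay schedule are assumptions for illustration; the document only states that σ starts at 0.3 and falls to 0:

```python
def training_schedule(total_steps, train_every=10, copy_every=10,
                      sigma0=0.3, decay_steps=None):
    # Train the current networks every `train_every` steps, copy
    # current -> target every `copy_every` trainings, and decay the
    # exploration sigma linearly to 0 over `decay_steps` steps.
    decay_steps = decay_steps or total_steps
    trainings = copies = 0
    sigmas = []
    for step in range(1, total_steps + 1):
        sigmas.append(max(0.0, sigma0 * (1.0 - step / decay_steps)))
        if step % train_every == 0:
            trainings += 1
            if trainings % copy_every == 0:
                copies += 1
    return trainings, copies, sigmas
```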
Figs. 3 and 4 show the revenue curve and the quotation action of the generator G1 agent during training with the deep deterministic policy gradient algorithm. As can be seen from the figures, the deep deterministic policy gradient algorithm converges and obtains a higher profit.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A generator quotation method based on the deep deterministic policy gradient algorithm, characterized by comprising the following steps:
establishing a deep deterministic policy gradient algorithm network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the network parameters;
establishing a market bidding model of the generator in the electric energy market, selecting a quotation action based on the result computed by the Current Actor Network according to the established market bidding model, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory.
2. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized by further comprising the following step:
when the Experience Replay memory is full, randomly extracting a batch of sample data from the Experience Replay memory to update the network parameters of the Current Critic Network and the Current Actor Network.
3. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 2, characterized in that updating the network parameters of the Current Critic Network and the Current Actor Network specifically comprises:
sending the extracted sample data to an optimizer, which performs gradient descent training on the network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function, the update of the network parameters of the Current Critic Network and the Current Actor Network being complete when the training is finished.
4. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized in that establishing the deep deterministic policy gradient algorithm network consisting of the Current Critic Network, the Target Critic Network, the Current Actor Network, the Target Actor Network and the Experience Replay memory, and initializing the network parameters, specifically comprises:
establishing the Current Actor Network and Target Actor Network neural networks, whose network parameters are denoted θa and θa' respectively; establishing the Current Critic Network and Target Critic Network neural networks, whose network parameters are denoted θc and θc'; establishing an Experience Replay memory for storing the sample data obtained after the generator agent submits its quotation;
setting the input state vector of the generator agent as the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the quotation coefficient of the generator agent;
randomly initializing the network parameters θa and θc, and setting θa' = θa and θc' = θc.
5. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized in that establishing the market bidding model of the generator in the electric energy market, selecting a quotation action based on the result computed by the Current Actor Network according to the established market bidding model, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory, specifically comprises the following steps:
establishing a generator set electric energy quotation model;
establishing a market bidding model of a generator in electric energy according to the electric energy quotation model of the generator set;
selecting a quotation action based on the deep deterministic policy gradient algorithm;
submitting the quoted price to ISO for clearing;
receiving the generator nodal electricity price and winning bid quantity fed back by the ISO;
and storing the current state, the quotation coefficient, the reward and the new state of the generator intelligent agent corresponding to the quotation as one piece of data in an Experience Replay memory, and updating the current state.
6. A generator agent based on the deep deterministic policy gradient algorithm, characterized by comprising:
the depth certainty strategy gradient algorithm Network construction module is used for establishing a depth certainty strategy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory;
and an exploratory quotation action generating module, used for establishing a market bidding model of the generator in the electric energy market, selecting a quotation action according to the established market bidding model based on the result computed by the Current Actor Network, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory.
7. The depth-deterministic-policy-gradient-algorithm-based power generator agent of claim 6, wherein the depth-deterministic-policy-gradient-algorithm network building module specifically comprises:
the depth Actor Network is used for establishing a Current Actor Network and a Target Actor Network neural Network; the quotation coefficient calculated by the Current Actor Network is used as a reference when the intelligent agent actually selects the quotation coefficient, and the quotation coefficient calculated by the Target Actor Network is used for estimating the action selected by the intelligent agent in the future state when training the Critic Network;
the deep Critic Network is used for establishing the Current Critic Network and Target Critic Network neural networks; the Current Critic Network is used for estimating the Q value of an action in the current state and providing the gradient when training the Current Actor Network, and the Target Critic Network is used for providing, when training the Current Critic Network, the Q value of the action taken in the future state by the agent represented by the Target Actor Network;
the Experience Replay memory unit is used for storing the sample data obtained after the generator agent submits its quotation, each piece of sample data recording four items of information: the current state, the quotation coefficient, the reward and the new state of the generator agent.
8. The depth-deterministic policy gradient algorithm-based generator agent according to claim 6, characterized by: the exploratory quotation action generation module specifically comprises:
the generator set electric energy quotation model establishing unit is used for establishing a generator set electric energy quotation model;
the bidding model establishing unit is used for establishing a market bidding model of the generator in the electric energy according to the electric energy bidding model of the generator set;
an action selection unit for selecting a quote action based on a depth-deterministic policy gradient algorithm;
the quotation submitting unit is used for submitting quotations to ISO for clearing;
and the profit calculation unit is used for calculating the profit r according to the electricity price of the generator node fed back by the ISO and the winning bid amount.
9. The depth-deterministic policy gradient algorithm-based generator agent according to claim 6, further comprising:
and a deep deterministic policy gradient algorithm training module, used for randomly extracting sample data stored in the Experience Replay memory and updating the network parameters of the Current Critic Network and the Current Actor Network.
10. The depth-deterministic policy gradient algorithm-based generator agent according to claim 9, characterized by: the depth certainty strategy gradient algorithm training module specifically comprises:
the Current Network updating unit is used for updating the parameters of the Current Critic Network and the Current Actor Network according to the sample data extracted from the Experience Replay memory;
and the Target Network updating unit is used for copying the parameters of the Current Critic Network and the Current Actor Network to the corresponding Target Critic Network and Target Actor Network at the set times.
CN202011573875.2A 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm Pending CN112598473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011573875.2A CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011573875.2A CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Publications (1)

Publication Number Publication Date
CN112598473A true CN112598473A (en) 2021-04-02

Family

ID=75203072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011573875.2A Pending CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Country Status (1)

Country Link
CN (1) CN112598473A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240459A (en) * 2021-04-27 2021-08-10 东南大学 Market member quotation method based on deep reinforcement learning algorithm and module thereof
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
TWI779732B (en) * 2021-07-21 2022-10-01 國立清華大學 Method for renewable energy bidding using multiagent transfer reinforcement learning


Similar Documents

Publication Publication Date Title
CN112598473A (en) Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm
CN113099729B (en) Deep reinforcement learning of production schedule
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
Wang et al. An evolutionary game approach to analyzing bidding strategies in electricity markets with elastic demand
CN109478045A (en) Goal systems is controlled using prediction
CN113449183B (en) Interactive recommendation method and system based on offline user environment and dynamic rewards
CN111309927B (en) Personalized learning path recommendation method and system based on knowledge graph mining
CN114492675B (en) Intelligent fault cause diagnosis method for capacitor voltage transformer
CN109858798B (en) Power grid investment decision modeling method and device for correlating transformation measures with voltage indexes
CN111582903A (en) Generator intelligent agent considering electric power futures change influence and quotation method
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
CN116108742A (en) Low-voltage transformer area ultra-short-term load prediction method and system based on improved GRU-NP model
CN111192158A (en) Transformer substation daily load curve similarity matching method based on deep learning
Zhang et al. An energy-efficient multi-objective integrated process planning and scheduling for a flexible job-shop-type remanufacturing system
US20220269835A1 (en) Resource prediction system for executing machine learning models
Sun Design and optimization of indoor space layout based on deep learning
CN117217820A (en) Intelligent integrated prediction method and system for purchasing demand of supply chain
CN111695967A (en) Method, device, equipment and storage medium for determining quotation
CN109447231B (en) Method for solving multi-attribute bilateral matching problem under shared economic background by ant colony algorithm
CN115146455B (en) Complex supply chain multi-objective decision method supported by calculation experiment
CN108924196A (en) Industry internet green energy resource management system
CN111027709B (en) Information recommendation method and device, server and storage medium
Mohd et al. Rapid modelling of machine learning in predicting office rental price
Marchesano et al. Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem
CN112734286B (en) Workshop scheduling method based on multi-strategy deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination