CN112598473A - Generator agent and quotation method based on deep deterministic policy gradient algorithm - Google Patents


Info

Publication number
CN112598473A
CN112598473A (application CN202011573875.2A)
Authority
CN
China
Prior art keywords: network, generator, current, quotation, establishing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011573875.2A
Other languages
Chinese (zh)
Inventor
朱炳铨
肖艳炜
李继红
项中明
孙珂
徐立中
裘雨音
孔飘红
黄志华
申建强
王高琴
史新红
郑亚先
杨争林
冯树海
王子恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
China Electric Power Research Institute Co Ltd CEPRI
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
China Electric Power Research Institute Co Ltd CEPRI
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Zhejiang Electric Power Co Ltd, China Electric Power Research Institute Co Ltd CEPRI, and Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202011573875.2A
Publication of CN112598473A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/0601 — Electronic shopping [e-shopping]
    • G06Q30/0611 — Request for offers or quotes
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/08 — Auctions
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 — Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a generator agent and a quotation method based on a deep deterministic policy gradient algorithm. The agent comprises: a deep deterministic policy gradient algorithm network construction module, used for establishing a network composed of a deep Actor network, a deep Critic network and an Experience Replay memory; and an exploratory quotation action generation module, used for establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model based on the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory. The dynamic quotation strategy of the generator under incomplete information is found by a deep reinforcement learning method; the invention is an efficient quotation decision tool and helps the generator quote accurately in the electric power market.

Description

Generator agent and quotation method based on deep deterministic policy gradient algorithm
Technical Field
The invention relates to electric power technology, and in particular to a generator agent and a quotation method based on a deep deterministic policy gradient algorithm.
Background
With the emergence of the domestic electric power spot market, generators will gradually participate in electric power market bidding to obtain their own benefits, and in a market environment participants continuously optimize their bidding strategies in pursuit of higher profits. At present, China's electric power market is still at an early stage; generators are not yet familiar with the market environment and need a well-developed quotation strategy theory for guidance. An efficient quotation decision tool can help decision makers and quotation staff make successful quotations and thereby obtain higher revenue. In addition, such a tool helps the supervisory organization of the electric power market investigate generator behavior, identify loopholes in the market rules, and continuously improve China's electric power market. It is therefore necessary to research the behavior of generators in the electric power market.
However, market information is incomplete for the participants, which makes it very difficult for them to optimize their own strategies. Traditional research on generator quotation strategies is mainly based on game theory, which is useful for theoretically discussing the optimal bidding strategies of market members and for relatively coarse study of the bidding behavior of generation companies; because of its inherent defects, however, game theory has low practicability and is not suitable for studying complete bidding strategies.
In order to simulate the boundedly rational quotation behavior of a generator maximizing its own income among multiple competitors in a real, incomplete-information electric power market, data-driven machine learning algorithms such as reinforcement learning are increasingly adopted, and current research on generator quotation strategies based on reinforcement learning mostly uses Q-learning and its variants. The idea of this algorithm is to find the expected value of each state-action pair by looking it up in a two-dimensional Q-value table of finite size, so the model must be simplified accordingly and the continuous state space reduced to a finite number of state intervals. For this reason the size of the Q-value table strongly affects the optimization capability of the Q-learning algorithm: as the number of states considered in the model increases or the state intervals shrink, the scale of the Q-value table grows exponentially, which easily causes the curse of dimensionality.
Disclosure of Invention
The invention aims to provide a generator agent and a quotation method based on a deep deterministic policy gradient algorithm, so as to solve the technical problem in the prior art that the quotation coefficients of a generator agent can only take discrete values. The invention recognizes that complete market information cannot be obtained in practice; using deep learning and reinforcement learning, a generator can give an optimal quotation over continuous values without knowing the strategies or unit cost parameters of others.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a generator quotation method based on a deep deterministic policy gradient algorithm, comprising the following steps:
establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters;
establishing the generator's market bidding model for electric energy, selecting a quotation action based on the result calculated by the Current Actor Network according to the established market bidding model, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory.
In a further improvement of the invention, the method further comprises the following step:
when the Experience Replay memory is full, randomly extracting a batch of sample data from it to update the Network parameters of the Current Critic Network and the Current Actor Network.
In a further improvement of the invention, updating the Network parameters of the Current Critic Network and the Current Actor Network specifically comprises:
sending the extracted sample data into an optimizer; the optimizer performs gradient descent training on the Network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function, and the update of the Network parameters of the Current Critic Network and the Current Actor Network is complete when training finishes.
In a further improvement of the invention, the step of establishing a deep deterministic policy gradient algorithm Network composed of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters, specifically comprises:
establishing the Current Actor Network and Target Actor Network neural networks, whose Network parameters are denoted θ_a and θ_a′ respectively; establishing the Current Critic Network and Target Critic Network neural networks, whose Network parameters are denoted θ_c and θ_c′; establishing an Experience Replay memory for storing the sample data obtained after the generator agent quotes;
setting the input state vector of the generator agent to the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the generator's quotation coefficient;
randomly initializing the Network parameters θ_a and θ_c, and setting θ_a′ = θ_a, θ_c′ = θ_c.
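The initialization step above — random parameters for the Current networks, Target parameters set to exact copies — can be sketched as follows. This is an illustrative sketch only: the layer sizes and the tiny fully connected structure are our assumptions, not taken from the patent.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    # One (weights, bias) pair per layer of a small fully connected network.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Actor maps the state (market clearing price) to a quotation coefficient;
# Critic maps (state, action) to a Q value.  Hidden width 16 is arbitrary.
theta_a = init_mlp([1, 16, 1])      # Current Actor Network parameters  (theta_a)
theta_c = init_mlp([2, 16, 1])      # Current Critic Network parameters (theta_c)
theta_a_t = copy.deepcopy(theta_a)  # Target Actor Network:  theta_a' = theta_a
theta_c_t = copy.deepcopy(theta_c)  # Target Critic Network: theta_c' = theta_c
```

The deep copy matters: the Target parameters must start equal to the Current ones but evolve independently afterwards.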
In a further improvement of the invention, the step of establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model and the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory, specifically comprises the following steps:
establishing the generator set electric energy quotation model;
establishing the generator's market bidding model for electric energy according to the generator set electric energy quotation model;
selecting a quotation action based on the deep deterministic policy gradient algorithm;
submitting the quotation to the ISO for clearing;
receiving the generator's nodal electricity price and winning bid amount fed back by the ISO;
storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the quotation as one record in the Experience Replay memory, and updating the current state.
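One round of the interaction described by the steps above can be sketched as follows. The `actor` and `iso_clear` functions here are hypothetical stand-ins (a fixed policy and a toy single-node clearing rule), not the patent's networks or DC clearing model; the sketch only shows the flow: pick a noisy quotation coefficient, clear, store the (state, coefficient, reward, new state) record, update the state.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(1)
replay_memory = deque(maxlen=10000)   # Experience Replay memory

def actor(state):
    # Placeholder for the Current Actor Network's output in this state.
    return 1.2

def iso_clear(k):
    # Hypothetical single-node clearing stub: price rises with the bid
    # coefficient, demand responds to price, profit is revenue minus cost.
    price = 30.0 * k
    quantity = max(0.0, 100.0 - 2.0 * price)
    cost = 20.0 * quantity
    return price, price * quantity - cost   # (new clearing price, reward r)

state = 30.0                                # initial market clearing price
for _ in range(5):
    k = actor(state) + rng.normal(0.0, 0.05)   # exploratory quotation action
    new_state, reward = iso_clear(k)
    replay_memory.append((state, k, reward, new_state))
    state = new_state                       # update the current state
```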
In a second aspect, the invention further provides a generator agent based on a deep deterministic policy gradient algorithm, comprising:
a deep deterministic policy gradient algorithm network construction module, used for establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory;
an exploratory quotation action generation module, used for establishing the generator's market bidding model for electric energy, selecting a quotation action according to the established market bidding model based on the result calculated by the Current Actor Network, submitting the generator's quotation to the ISO for clearing, and storing the corresponding current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory.
In a further improvement of the invention, the deep deterministic policy gradient algorithm network construction module specifically comprises:
a deep Actor Network unit, used for establishing the Current Actor Network and Target Actor Network neural networks; the quotation coefficient calculated by the Current Actor Network serves as the reference when the agent actually selects its quotation coefficient, and the quotation coefficient calculated by the Target Actor Network is used, when training the Critic Network, to estimate the action the agent will select in the future state;
a deep Critic Network unit, used for establishing the Current Critic Network and Target Critic Network neural networks; the Current Critic Network is used to estimate the Q value of an action in the current state and to provide the gradient when training the Current Actor Network, and the Target Critic Network is used, when training the Current Critic Network, to provide the Q value of the action that the agent, as represented by the Target Actor Network, takes in the future state;
an Experience Replay memory unit, used for storing the sample data obtained after the generator agent quotes; each sample record holds four pieces of information: the current state, quotation coefficient, reward and new state of the generator agent.
In a further improvement of the invention, the exploratory quotation action generation module specifically comprises:
a generator set electric energy quotation model establishing unit, used for establishing the generator set electric energy quotation model;
a bidding model establishing unit, used for establishing the generator's market bidding model for electric energy according to the generator set electric energy quotation model;
an action selection unit, used for selecting a quotation action based on the deep deterministic policy gradient algorithm;
a quotation submitting unit, used for submitting quotations to the ISO for clearing;
a profit calculation unit, used for calculating the profit r from the generator's nodal electricity price and winning bid amount fed back by the ISO.
In a further improvement of the invention, the agent further comprises:
a deep deterministic policy gradient algorithm training module, used for randomly extracting the sample data stored in the Experience Replay memory and updating the Network parameters of the Current Critic Network and the Current Actor Network.
In a further improvement of the invention, the deep deterministic policy gradient algorithm training module specifically comprises:
a Current Network updating unit, used for updating the parameters of the Current Critic Network and the Current Actor Network according to the sample data extracted from the Experience Replay memory;
a Target Network updating unit, used for copying the parameters of the Current Critic Network and the Current Actor Network to the corresponding Target Critic Network and Target Actor Network at the set times.
Compared with the prior art, the invention has the following beneficial effects:
the method considers that complete information in the market cannot be obtained in the actual situation, a generator does not need to know strategies of others and unit cost parameters of others, and continuous optimal quotation is selected by deep learning and reinforcement learning.
Drawings
FIG. 1 is a block schematic diagram of one embodiment of a generator agent based on a depth-deterministic policy gradient algorithm provided by the present invention;
FIG. 2 is a five-machine five-node network topology diagram;
FIG. 3 is a schematic diagram of the relationship between generator profit and the number of clearings during generator quotation strategy learning;
FIG. 4 is a schematic diagram of the relationship between the generator quotation coefficient and the number of clearings during generator quotation strategy learning.
Detailed Description
The deep deterministic policy gradient (DDPG) algorithm combines deep learning and reinforcement learning and, following a data-driven approach, learns an action policy directly from high-dimensional raw data, thereby overcoming the defect that the Q-learning algorithm must simplify the model because of the curse of dimensionality. Compared with Q-learning, the deep deterministic policy gradient algorithm can build a market model over a multidimensional continuous state space, can handle clearing prices under continuous generator quotations, can take other environmental variables that strongly influence generator profit, such as total system load demand, as input conditions, and can exploit the strong fitting capability of neural networks and deep learning so that the generator's bidding model makes higher-profit decisions. As the market gradually expands and the market environment becomes more complex, research on generator quotation methods based on the deep deterministic policy gradient algorithm is of important significance in promoting the construction of the spot market.
Example 1
Referring to FIG. 1, this embodiment provides a generator agent based on a deep deterministic policy gradient algorithm, comprising: a deep deterministic policy gradient algorithm network construction module, an exploratory quotation action generation module and a deep deterministic policy gradient algorithm training module.
The deep deterministic policy gradient algorithm network construction module is used for establishing a deep Actor Network consisting of a Current Actor Network and a Target Actor Network, a deep Critic Network consisting of a Current Critic Network and a Target Critic Network, and an Experience Replay memory; the input state vector is the market clearing price, the output action is the generator's quotation coefficient, and the module performs initialization.
The exploratory quotation action generation module is used for establishing the generator's market bidding model for electric energy and, based on the deep Actor Network of the deep deterministic policy gradient algorithm, generating a quotation coefficient with a certain amount of noise added to the quotation coefficient produced under the established market bidding model.
The deep deterministic policy gradient algorithm training module is used for randomly extracting the sample data stored in the Experience Replay memory and training the parameters of the Online (Current) Networks and Target Networks of the deep Actor Network and the deep Critic Network in the deep deterministic policy gradient algorithm network construction module. Each module is described in detail below.
The deep deterministic policy gradient algorithm network construction module specifically comprises:
The deep Actor Network is used for establishing the Current Actor Network and Target Actor Network neural networks, whose Network parameters are denoted θ_a and θ_a′ respectively. The Current Actor Network and Target Actor Network output the quotation coefficient of the generator agent from the input state vector. The quotation coefficient calculated by the Target Actor Network is used, when training the Critic Network, to estimate the action the agent will select in the future state;
The deep Critic Network is used for establishing the Current Critic Network and Target Critic Network neural networks, whose Network parameters are denoted θ_c and θ_c′ respectively. The Current Critic Network and Target Critic Network output, from the input state vector and action value, the Q value of that action under that state vector. The Current Critic Network is used to estimate the Q value of an action in the current state and to provide the gradient when training the Current Actor Network, and the Target Critic Network is used, when training the Current Critic Network, to provide the Q value of the action that the agent, as represented by the Target Actor Network, takes in the future state;
The Experience Replay memory unit is used for storing the sample data obtained after the generator agent quotes; each sample record holds four pieces of information: the current state, quotation coefficient, reward and new state of the generator agent.
The exploratory quotation action generation module specifically comprises: a generator set electric energy quotation model establishing unit, a bidding model establishing unit, an action selection unit, a quotation submitting unit and a profit calculation unit.
The generator set electric energy quotation model establishing unit is used for establishing the generator set electric energy quotation model:

C_i(P_Gi) = a_i·P_Gi² + b_i·P_Gi + c_i,  i ∈ G

λ_i(P_Gi) = dC_i(P_Gi)/dP_Gi = 2a_i·P_Gi + b_i

p(P_Gi) = p_i·λ_i(P_Gi)

In the formulas: C_i(P_Gi) is the fuel cost function of generator i; P_Gi is the output of generator i; a_i, b_i and c_i are the quadratic-term, linear-term and constant-term coefficients of the fuel cost; G is the set of generators; λ_i(P_Gi) is the marginal cost function of generator i; p(P_Gi) is the electric energy bidding curve of generator i; and p_i is the electric energy bidding coefficient submitted by generator i;
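The three quantities of the quotation model above translate directly into code. This assumes the standard quadratic fuel cost and a bid curve proportional to marginal cost, as in the formulas; the function names are ours.

```python
def fuel_cost(p_g, a, b, c):
    """C_i(P_Gi) = a*P^2 + b*P + c — fuel cost of one unit at output p_g."""
    return a * p_g**2 + b * p_g + c

def marginal_cost(p_g, a, b):
    """lambda_i(P_Gi) = dC/dP = 2*a*P + b."""
    return 2.0 * a * p_g + b

def bid_curve(p_g, a, b, coeff):
    """p(P_Gi) = coeff * marginal cost — the submitted electric energy bid."""
    return coeff * marginal_cost(p_g, a, b)
```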
the bidding model establishing unit is used for establishing a market bidding model of the power generator in the electric energy according to the electric energy bidding model of the power generator set:
Figure BDA0002861548020000081
Figure BDA0002861548020000082
in the formula: f. ofGFor the profit of the generator i, λeFor clearing price of electric energy, kiThe quotation coefficient of the generator i in the electric energy market; k is a radical ofimin、kimaxThe minimum value and the maximum value of the electric energy quotation coefficient of the generator i are respectively; pDkThe load requirement of the kth user; l is a network node set; pGimin、PGimaxRespectively the upper and lower technical output limits, f, of the generator iISOThe sum of the costs quoted for all generators;
the action selection unit is used for selecting the quotation action based on a depth certainty strategy gradient algorithm, and the specific selection method is as follows:
based on a normal distribution noise algorithm, calculating a quotation coefficient a in the Current state by using a Current Actor Network0Then, a normally distributed noise is superimposed on the base to obtain the quotation coefficient ki=a0+ X, wherein X-N (μ, σ)2) μ ═ 0, and σ is an adjustable quantity;
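The noisy selection rule k_i = a_0 + X, X ~ N(0, σ²), can be sketched as below. Clipping the result to the admissible coefficient range [k_min, k_max] is our assumption — the patent bounds the coefficient through the bidding model's constraint k_imin ≤ k_i ≤ k_imax rather than stating a clipping step.

```python
import numpy as np

def select_bid_coeff(a0, sigma, k_min, k_max, rng):
    """Exploratory quotation action: actor output a0 plus N(0, sigma^2) noise."""
    k = a0 + rng.normal(0.0, sigma)        # k_i = a0 + X
    return float(np.clip(k, k_min, k_max)) # keep k_i within [k_min, k_max]

rng = np.random.default_rng(42)
```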
The quotation submitting unit is used for submitting quotations to the ISO for clearing, where the ISO market clearing model is:

min f_ISO = Σ_{i∈G} k_i·(a_i·P_Gi² + b_i·P_Gi)

s.t.  Σ_{i∈G} P_Gi − Σ_{h∈L} P_Dh = 0
      (θ_i − θ_j)/X_ij ≤ P_ijmax,  ij ∈ Branch
      P_Gimin ≤ P_Gi ≤ P_Gimax

In the formulas: P_Gi is the output of generator i; a_i and b_i are the quadratic-term and linear-term coefficients of the fuel cost; L is the set of network nodes; Branch is the set of branches; λ_el is the market clearing price of node l; P_Dh is the load demand of the h-th user; X_ij is the reactance of branch ij; θ_i and θ_j are the phase angles of nodes i and j; P_ijmax is the flow limit of line ij; and P_Gimin and P_Gimax are the lower and upper technical output limits of generator i;
The profit calculation unit is used for calculating the profit r from the generator's nodal electricity price and winning bid amount fed back by the ISO:

r = λ_e·P_Gi − C_i(P_Gi)

where λ_e is the nodal electricity price of node i.
The deep deterministic policy gradient algorithm training module specifically comprises: a Current Network updating unit and a Target Network updating unit.
The Current Network updating unit is used for updating the parameter θ_c of the Current Critic Network and the parameter θ_a of the Current Actor Network according to the sample data extracted from the Experience Replay memory.
For the Current Critic Network, its loss function is defined as follows:

L(θ_c) = ([r_n + γ·Q_n(s_{n+1}, a′ | θ_c′)] − Q_n(s, a | θ_c))²

where L(θ_c) is the loss function, i.e. the square of the difference between the Q value output by the Current Critic Network and the target Q value; r_n is the generator's profit; Q_n(s, a | θ_c), output by the Current Critic Network, is the Q value of the generator taking action a in state s; and Q_n(s_{n+1}, a′ | θ_c′), output by the Target Critic Network, is the Q value of the generator taking action a′ in the next state s_{n+1}.
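For one sample, the loss above reduces to a TD target y = r_n + γ·Q′ and a squared error, as in this sketch; the two Q values are passed in as plain numbers standing in for the Current and Target Critic outputs.

```python
def critic_loss(q_current, q_target_next, reward, gamma):
    """L(theta_c) = ([r + gamma*Q'(s', a')] - Q(s, a))^2 for a single sample."""
    y = reward + gamma * q_target_next   # target Q value from the Target Critic
    return (y - q_current) ** 2

# Example: r = 1.0, gamma = 0.9, Q'(s', a') = 6.0, Q(s, a) = 5.0
loss = critic_loss(q_current=5.0, q_target_next=6.0, reward=1.0, gamma=0.9)
```

In a full implementation this quantity is averaged over the sampled minibatch before the optimizer takes a gradient step on θ_c.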
For the Current Actor Network, its loss is defined as follows:

L(θ_a) = −Q_n(s, a | θ_c)

where L(θ_a) is the loss function. The loss was originally defined by using the Q value estimated by the Current Critic Network to compute the gradient with respect to action a, but that calculation is relatively complex. The purpose of training the Current Actor Network is to select the action with the maximum Q value in state s, and in practice this is mostly realized by minimizing −Q. In the formula, Q_n(s, a | θ_c) is output by the Current Critic Network.
The Target Network updating unit is used for updating the parameters of the Target Critic Network and the Target Actor Network. The Target Critic Network and Target Actor Network have exactly the same structure as the Current Critic Network and Current Actor Network respectively; the only difference is the update frequency of the parameters. Each training step of the Current Networks updates their Network parameters θ_c and θ_a, while the Target Network updating unit, at set time intervals, copies the parameters of the Current Critic Network and the Current Actor Network and assigns them to the corresponding Target Critic Network and Target Actor Network.
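The fixed-interval copy performed by the Target Network updating unit can be sketched as a hard update; the dict-of-lists parameter format is illustrative. (Standard DDPG implementations more often use a soft update θ′ ← τθ + (1−τ)θ′ every step; the periodic hard copy shown here follows the patent's description.)

```python
import copy

def hard_update(current_params, target_params, step, interval):
    """Copy Current network parameters onto the Target network every
    `interval` training steps; otherwise leave the Target untouched."""
    if step % interval == 0:
        return copy.deepcopy(current_params)
    return target_params
```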
Example 2
This embodiment further provides a generator quotation method based on the deep deterministic policy gradient algorithm, comprising the following steps:
(1) establishing a deep deterministic policy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the Network parameters;
(2) establishing the generator's bidding model in the electric energy market, selecting a quotation action based on the result calculated by the Current Actor Network according to the established market bidding model, submitting the generator's quotation to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent into the Experience Replay memory;
(3) when the Experience Replay memory is full, randomly extracting a batch of sample data and training the deep deterministic policy gradient algorithm network.
The step (1) specifically comprises the following steps:
(1-1) establishing a Current Actor Network and a Target Actor Network neural Network, wherein Network parameters of the neural networks are respectively marked as thetaaAnd thetaa' establishing Current crystalline Network and Target crystalline Network neural Network, and recording the Network parameters as thetacAnd thetac' establishing an Experience Replay memory for storing sample data obtained after the intelligent agent of the power generator quotes;
(1-2) setting the input state vector of the generator agent as the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the quotation coefficient of the generator agent;
(1-3) randomly initializing the four network parameters in step (1-1).
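The initialization in steps (1-1) to (1-3) can be sketched as follows. This is an illustrative sketch only: the layer sizes, the one-hidden-layer structure and the numpy representation are assumptions, not taken from the patent.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    # One-hidden-layer parameter set; an illustrative stand-in for the
    # Actor/Critic neural networks (layer sizes are assumptions).
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

rng = np.random.default_rng(0)
theta_a = init_mlp(1, 16, 1, rng)   # Current Actor: clearing price -> quotation coefficient
theta_c = init_mlp(2, 16, 1, rng)   # Current Critic: (state, action) -> Q value
# The target networks start as exact copies of the current networks
# (theta_a' = theta_a, theta_c' = theta_c).
theta_a_target = {k: v.copy() for k, v in theta_a.items()}
theta_c_target = {k: v.copy() for k, v in theta_c.items()}
experience_replay = []              # filled with (state, k_i, reward, new_state) tuples
```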
The step (2) specifically comprises the following steps:
(2-1) establishing a generator set electric energy quotation model:

C_i(P_Gi) = a_i·P_Gi² + b_i·P_Gi + c_i, i ∈ G

MC_i(P_Gi) = dC_i(P_Gi)/dP_Gi = 2a_i·P_Gi + b_i

p(P_Gi) = p_i·MC_i(P_Gi) = p_i·(2a_i·P_Gi + b_i)

in the formula: C_i(P_Gi) is the fuel cost function of generator i; P_Gi is the output of generator i; a_i, b_i and c_i are respectively the second-order, first-order and constant-term coefficients of the fuel cost; G is the set of generators; MC_i(P_Gi) is the marginal cost function of generator i; p(P_Gi) is the electric energy bidding curve of generator i; p_i is the electric energy bidding coefficient submitted by generator i;
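The quadratic cost, marginal cost and bid curve above can be expressed directly in code (an illustrative sketch; the function names are assumptions):

```python
def fuel_cost(p, a, b, c):
    # Quadratic fuel cost C_i(P) = a*P^2 + b*P + c.
    return a * p * p + b * p + c

def marginal_cost(p, a, b):
    # Marginal cost MC_i(P) = dC/dP = 2*a*P + b.
    return 2.0 * a * p + b

def bid_price(p, a, b, k):
    # Bid curve: the marginal cost scaled by the quotation coefficient k_i.
    return k * marginal_cost(p, a, b)
```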
(2-2) establishing the market bidding model of the generator in the electric energy market according to the electric energy quotation model of the generator set:

max f_G = λe·P_Gi − C_i(P_Gi)

s.t. k_imin ≤ k_i ≤ k_imax; Σ_{i∈G} P_Gi = Σ_{k∈L} P_Dk; P_Gimin ≤ P_Gi ≤ P_Gimax

in the formula: f_G is the profit of generator i; λe is the electric energy clearing price; k_i is the quotation coefficient of generator i in the electric energy market; k_imin and k_imax are respectively the minimum and maximum values of the electric energy quotation coefficient of generator i; P_Dk is the load demand of the k-th user; L is the set of network nodes; P_Gimin and P_Gimax are respectively the lower and upper technical output limits of generator i; f_ISO is the sum of the costs quoted by all the generators, which the ISO minimizes when clearing the market;
(2-3) selecting a quotation action based on the deep deterministic policy gradient algorithm, the specific selection method being as follows:
the Current Actor Network computes the quotation coefficient a0 in the current state, and normally distributed noise is superimposed on it to obtain the quotation coefficient k_i = a0 + X, where X ~ N(μ, σ²), μ = 0, and σ is an adjustable quantity that can be gradually reduced or kept unchanged as the number of training iterations increases;
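Step (2-3) can be sketched as follows; the clipping to [k_min, k_max] reflects the bounds of step (2-2), and the function name is an assumption for illustration:

```python
import random

def explore_bid(a0, sigma, k_min, k_max):
    # Quotation coefficient = deterministic actor output a0 plus N(0, sigma^2)
    # exploration noise, clipped to the admissible range [k_min, k_max].
    k = a0 + random.gauss(0.0, sigma)
    return min(max(k, k_min), k_max)
```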
(2-4) submitting the quotation to the ISO for clearing, where the ISO market clearing model is:

min f_ISO = Σ_{i∈G} k_i·(a_i·P_Gi² + b_i·P_Gi)

s.t. Σ_{i∈l} P_Gi − Σ_{h∈l} P_Dh = Σ_{lj∈Branch} (θ_l − θ_j)/X_lj, ∀l ∈ L; |(θ_i − θ_j)/X_ij| ≤ P_ijmax, ∀ij ∈ Branch; P_Gimin ≤ P_Gi ≤ P_Gimax

in the formula: P_Gi is the output of generator i; a_i and b_i are respectively the second-order and first-order coefficients of the fuel cost; L is the set of network nodes; Branch is the set of branches; λ_el is the market clearing price at node l; P_Dh is the load demand of the h-th user; X_ij is the reactance of branch ij; θ_i and θ_j are the phase angles of nodes i and j; P_ijmax is the flow limit of line ij; P_Gimin and P_Gimax are respectively the lower and upper technical output limits of generator i;
(2-5) receiving the generator nodal electricity price and winning bid quantity fed back by the ISO, and calculating the profit r:

r = λe·P_Gi − C_i(P_Gi) = λe·P_Gi − (a_i·P_Gi² + b_i·P_Gi + c_i)

in the formula, λe is the nodal electricity price at node i;
(2-6) storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to this quotation as one piece of data in the Experience Replay memory, and updating the current state.
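Steps (2-5) and (2-6) can be sketched as follows (the deque capacity of 10000 matches the simulation setting given later in the document; the function names are assumptions):

```python
from collections import deque

replay = deque(maxlen=10000)    # Experience Replay memory

def profit(lam, p, a, b, c):
    # Step (2-5): revenue at the nodal price minus the true fuel cost.
    return lam * p - (a * p * p + b * p + c)

def store_transition(replay, state, k, reward, new_state):
    # Step (2-6): append one experience tuple and return the updated state.
    replay.append((state, k, reward, new_state))
    return new_state
```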
The step (3) specifically comprises the following steps:
(3-1) when the Experience Replay memory is full, randomly extracting a batch of sample data from the Experience Replay memory for updating the network parameters of the Current Critic Network and the Current Actor Network;
(3-2) sending the extracted sample data to an optimizer, which performs gradient descent training on the network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function:
for the Current Critic Network, the loss function is defined as follows:

L(θc) = ([r_n + γ·Q_n(s_{n+1}, a'|θc')] − Q_n(s, a|θc))²

wherein L(θc) is the loss function, i.e. the square of the difference between the Q value output by the Current Critic Network and the target Q value; r_n is the profit of the generator; Q_n(s, a|θc), output by the Current Critic Network, is the Q value of the generator taking action a in state s; Q_n(s_{n+1}, a'|θc'), output by the Target Critic Network, is the Q value of the generator taking action a' in the next state s_{n+1}.
For the Current Actor Network, the loss function is defined as follows:

L(θa) = −Q_n(s, a|θc)

wherein L(θa) is the loss function. The loss was originally defined via the gradient of the Q value estimated by the Current Critic Network with respect to the action a, but that calculation is relatively complex. Since the purpose of training the Current Actor Network is to select the action with the maximum Q value in state s, practical implementations mostly realize the training by minimizing −Q. Q_n(s, a|θc) in the formula is output by the Current Critic Network.
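A minimal numerical sketch of the two losses, using scalar Q values for illustration (in the patent these are neural network outputs, and the gradients are taken by an optimizer):

```python
def critic_loss(r, gamma, q_next_target, q_current):
    # L(theta_c): squared TD error between the target r + gamma*Q' and the
    # Q value of the Current Critic Network.
    td_target = r + gamma * q_next_target
    return (td_target - q_current) ** 2

def actor_loss(q_current):
    # L(theta_a) = -Q: minimizing it pushes the actor toward higher-Q actions.
    return -q_current
```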
(3-3) updating the network parameters of the current networks, and at set intervals copying the parameters of the Current Critic Network and the Current Actor Network to the Target Critic Network and Target Actor Network respectively.
(3-4) judging whether the number of network updates has reached the preset maximum; if so, training is finished; if not, the deep deterministic policy gradient algorithm network continues to be iteratively updated from step (2).
Example 3
This embodiment further provides a generator quotation system based on the deep deterministic policy gradient algorithm, applied to a power system, the system comprising a processor and a memory coupled to the processor, the memory storing a computer program which, when executed by the processor, performs the method steps of Embodiment 2.
Next, a 5-machine 5-node test system, shown in fig. 2, is used to perform simulation analysis of generator behavior in the power market. The test system contains 5 generators: G1 is connected to node 1, the remaining generators are connected to node 2, and the load is connected to node 3. The basic information of the generators is shown in Table 1, and the 24-hour load demand is shown in figs. 3 and 4.
TABLE 1
The simulation parameters of the case are set as follows: the size of the Experience Replay memory is 10000; the variance σ of the normal distribution is 0.3 at the beginning of training and finally drops to 0 as training proceeds; during the training stage, the parameters of the current networks are updated once every 10 training steps, and the parameters of the current networks are copied to the target networks once every 10 updates.
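The update cadence and exploration decay described above can be sketched as follows. The function name and the linear decay schedule are assumptions for illustration; the document only states that σ starts at 0.3 and falls to 0:

```python
def training_schedule(total_steps, train_every=10, copy_every=10,
                      sigma0=0.3, decay_steps=None):
    # Train the current networks every `train_every` steps, copy
    # current -> target every `copy_every` trainings, and decay the
    # exploration sigma linearly to 0 over `decay_steps` steps.
    decay_steps = decay_steps or total_steps
    trainings = copies = 0
    sigmas = []
    for step in range(1, total_steps + 1):
        sigmas.append(max(0.0, sigma0 * (1.0 - step / decay_steps)))
        if step % train_every == 0:
            trainings += 1
            if trainings % copy_every == 0:
                copies += 1
    return trainings, copies, sigmas
```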
Figs. 3 and 4 show the revenue curve and the quotation action of the generator G1 agent during training with the deep deterministic policy gradient algorithm. As can be seen from the figures, the deep deterministic policy gradient algorithm converges and obtains a higher profit.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A generator quotation method based on the deep deterministic policy gradient algorithm, characterized by comprising the following steps:
establishing a deep deterministic policy gradient algorithm network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory, and initializing the network parameters;
establishing a market bidding model of the generator in the electric energy market, selecting a quotation action based on the result computed by the Current Actor Network according to the established market bidding model, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory.
2. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized by further comprising the following step:
when the Experience Replay memory is full, randomly extracting a batch of sample data from the Experience Replay memory to update the network parameters of the Current Critic Network and the Current Actor Network.
3. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 2, characterized in that updating the network parameters of the Current Critic Network and the Current Actor Network specifically comprises:
sending the extracted sample data to an optimizer, which performs gradient descent training on the network parameters of the Current Critic Network and the Current Actor Network according to the principle of minimizing the loss function, the update of the network parameters of the Current Critic Network and the Current Actor Network being complete when the training is finished.
4. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized in that establishing the deep deterministic policy gradient algorithm network consisting of the Current Critic Network, the Target Critic Network, the Current Actor Network, the Target Actor Network and the Experience Replay memory, and initializing the network parameters, specifically comprises:
establishing the Current Actor Network and Target Actor Network neural networks, whose network parameters are denoted θa and θa' respectively; establishing the Current Critic Network and Target Critic Network neural networks, whose network parameters are denoted θc and θc'; establishing an Experience Replay memory for storing the sample data obtained after the generator agent submits its quotation;
setting the input state vector of the generator agent as the market clearing price, and setting the output limit of the Actor Network according to the upper limit of the quotation coefficient of the generator agent;
randomly initializing the network parameters θa and θc, and setting θa' = θa and θc' = θc.
5. The generator quotation method based on the deep deterministic policy gradient algorithm according to claim 1, characterized in that establishing the market bidding model of the generator in the electric energy market, selecting a quotation action based on the result computed by the Current Actor Network according to the established market bidding model, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory, specifically comprises the following steps:
establishing a generator set electric energy quotation model;
establishing a market bidding model of a generator in electric energy according to the electric energy quotation model of the generator set;
selecting a quotation action based on the deep deterministic policy gradient algorithm;
submitting the quoted price to ISO for clearing;
receiving the generator nodal electricity price and winning bid quantity fed back by the ISO;
and storing the current state, the quotation coefficient, the reward and the new state of the generator intelligent agent corresponding to the quotation as one piece of data in an Experience Replay memory, and updating the current state.
6. A generator agent based on the deep deterministic policy gradient algorithm, characterized by comprising:
the depth certainty strategy gradient algorithm Network construction module is used for establishing a depth certainty strategy gradient algorithm Network consisting of a Current Critic Network, a Target Critic Network, a Current Actor Network, a Target Actor Network and an Experience Replay memory;
and an exploratory quotation action generating module, used for establishing a market bidding model of the generator in the electric energy market, selecting a quotation action according to the established market bidding model based on the result computed by the Current Actor Network, submitting the quotation of the generator to the ISO for clearing, and storing the current state, quotation coefficient, reward and new state of the generator agent corresponding to the clearing into the Experience Replay memory.
7. The depth-deterministic-policy-gradient-algorithm-based power generator agent of claim 6, wherein the depth-deterministic-policy-gradient-algorithm network building module specifically comprises:
the depth Actor Network is used for establishing a Current Actor Network and a Target Actor Network neural Network; the quotation coefficient calculated by the Current Actor Network is used as a reference when the intelligent agent actually selects the quotation coefficient, and the quotation coefficient calculated by the Target Actor Network is used for estimating the action selected by the intelligent agent in the future state when training the Critic Network;
the deep Critic Network is used for establishing the Current Critic Network and Target Critic Network neural networks; the Current Critic Network is used for estimating the Q value of an action in the current state and providing the gradient when training the Current Actor Network, and the Target Critic Network is used for providing, when training the Current Critic Network, the Q value of the action taken in the future state by the agent represented by the Target Actor Network;
the Experience Replay memory unit is used for storing the sample data obtained after the generator agent submits its quotation, each piece of sample data recording four items of information: the current state, the quotation coefficient, the reward and the new state of the generator agent.
8. The depth-deterministic policy gradient algorithm-based generator agent according to claim 6, characterized by: the exploratory quotation action generation module specifically comprises:
the generator set electric energy quotation model establishing unit is used for establishing a generator set electric energy quotation model;
the bidding model establishing unit is used for establishing a market bidding model of the generator in the electric energy according to the electric energy bidding model of the generator set;
an action selection unit for selecting a quote action based on a depth-deterministic policy gradient algorithm;
the quotation submitting unit is used for submitting quotations to ISO for clearing;
and the profit calculation unit is used for calculating the profit r according to the electricity price of the generator node fed back by the ISO and the winning bid amount.
9. The depth-deterministic policy gradient algorithm-based generator agent according to claim 6, further comprising:
and a deep deterministic policy gradient algorithm training module, used for randomly extracting sample data stored in the Experience Replay memory and updating the network parameters of the Current Critic Network and the Current Actor Network.
10. The depth-deterministic policy gradient algorithm-based generator agent according to claim 9, characterized by: the depth certainty strategy gradient algorithm training module specifically comprises:
the Current Network updating unit is used for updating the parameters of the Current Critic Network and the Current Actor Network according to the sample data extracted from the Experience Replay memory;
and the Target Network updating unit is used for copying the parameters of the Current Critic Network and the Current Actor Network to the corresponding Target Critic Network and Target Actor Network at the set times.
CN202011573875.2A 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm Pending CN112598473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011573875.2A CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011573875.2A CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Publications (1)

Publication Number Publication Date
CN112598473A true CN112598473A (en) 2021-04-02

Family

ID=75203072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011573875.2A Pending CN112598473A (en) 2020-12-25 2020-12-25 Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm

Country Status (1)

Country Link
CN (1) CN112598473A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240459A (en) * 2021-04-27 2021-08-10 东南大学 Market member quotation method based on deep reinforcement learning algorithm and module thereof
CN113378456A (en) * 2021-05-21 2021-09-10 青海大学 Multi-park comprehensive energy scheduling method and system
TWI779732B (en) * 2021-07-21 2022-10-01 國立清華大學 Method for renewable energy bidding using multiagent transfer reinforcement learning


Similar Documents

Publication Publication Date Title
CN112598473A (en) Generator intelligent agent and quotation method based on depth certainty strategy gradient algorithm
CN113099729B (en) Deep reinforcement learning of production schedule
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
Wang et al. An evolutionary game approach to analyzing bidding strategies in electricity markets with elastic demand
CN109478045A (en) Goal systems is controlled using prediction
CN113449183B (en) Interactive recommendation method and system based on offline user environment and dynamic rewards
CN111309927B (en) Personalized learning path recommendation method and system based on knowledge graph mining
CN114492675B (en) Intelligent fault cause diagnosis method for capacitor voltage transformer
CN109858798B (en) Power grid investment decision modeling method and device for correlating transformation measures with voltage indexes
CN111582903A (en) Generator intelligent agent considering electric power futures change influence and quotation method
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
CN116108742A (en) Low-voltage transformer area ultra-short-term load prediction method and system based on improved GRU-NP model
CN111192158A (en) Transformer substation daily load curve similarity matching method based on deep learning
Zhang et al. An energy-efficient multi-objective integrated process planning and scheduling for a flexible job-shop-type remanufacturing system
US20220269835A1 (en) Resource prediction system for executing machine learning models
Sun Design and optimization of indoor space layout based on deep learning
CN117217820A (en) Intelligent integrated prediction method and system for purchasing demand of supply chain
CN111695967A (en) Method, device, equipment and storage medium for determining quotation
CN109447231B (en) Method for solving multi-attribute bilateral matching problem under shared economic background by ant colony algorithm
CN115146455B (en) Complex supply chain multi-objective decision method supported by calculation experiment
CN108924196A (en) Industry internet green energy resource management system
CN111027709B (en) Information recommendation method and device, server and storage medium
Mohd et al. Rapid modelling of machine learning in predicting office rental price
Marchesano et al. Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem
CN112734286B (en) Workshop scheduling method based on multi-strategy deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination