CN114123178B - Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method - Google Patents
- Publication number: CN114123178B (application CN202111364422.3A)
- Authority
- CN
- China
- Prior art keywords
- power grid
- power
- agent
- environment
- agents
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/04—Power grid distribution networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
Abstract
The invention provides a multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method, which comprises the following steps: step 1, dividing a power grid into N areas according to the operation requirements of the power grid, and constructing the basic elements of multi-agent reinforcement learning, including the environment, agents, states, observations, actions and reward functions; step 2, running a power system simulation environment and creating an initial operating state data set of the power system; step 3, constructing a deep neural network model and training decision-making agents with reinforced inter-agent learning; and step 4, providing a power grid reconstruction strategy with the trained agents. Through the interaction of multiple agents with the power simulation environment, the invention learns an optimal network reconstruction strategy offline and then applies that strategy to an actual power grid online.
Description
Technical Field
The invention relates to the field of multi-agent reinforcement learning, and in particular to a smart grid partition network reconstruction method based on multi-agent reinforcement learning.
Background
Network reconstruction refers to changing the network topology of the power grid, i.e., changing the operating states of its tie switches and sectionalizing switches, so that load is transferred between feeders or distribution stations and the operating state of the grid changes. When the power grid fails, network reconstruction can restore it to safe and stable operation. Traditional network reconstruction relies on optimization algorithms or expert experience: optimization algorithms often carry an enormous computational burden and low processing speed, which hinders real-time application, while expert experience lacks the means to cope with risks that have not yet occurred and struggles with the operational safety of increasingly complex power systems. In addition, traditional network reconstruction finds it difficult to account simultaneously for the uncertainty of wind power, photovoltaic generation and load. Before a reconstruction is executed, the post-reconstruction operating state of the grid must be estimated, and the accuracy of this estimate directly determines the quality of the reconstruction action, which further increases the difficulty of the task. Reinforcement learning fully accounts for the dynamics of the environment and can predict the new environment that follows an action, offering a new approach to network reconstruction. Moreover, reinforcement learning-based methods are fast and efficient, making them well suited to online application in power systems.
Disclosure of Invention
The invention aims to provide a multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method that realizes automatic decision-making and safe operation of a power grid.
The purpose of the invention is realized in the following way:
a smart grid partition network reconstruction method based on multi-agent reinforcement learning comprises the following steps:
dividing a power grid into N areas according to the operation requirement of the power grid, and constructing basic elements of multi-agent reinforcement learning, including environment, agents, states, observation, actions and rewarding functions;
step 2, operating a simulation environment of the power system, and creating an initial operation state data set of the power system;
step 3, constructing a deep neural network model, and training decision-making agents by applying enhanced Inter-Agent Learning (RIAL);
and 4, providing a strategy for power grid reconstruction by using the trained intelligent agent.
Further, the construction of the basic elements of the multi-agent reinforcement learning method in step 1 comprises the following steps:
step 1.1: construct the power system simulation environment as the interaction environment of the agents, providing the various attributes and state values of the power grid as decision references for the agents. When the power system operates safely, i.e., no line is overloaded, the agents do not act. If and only if a line overload exists in the power system, the agents perform a series of consecutive decision actions to restore the system to safe operation. At each operating step, the environment modifies the relevant parameters of the grid according to the actions of all agents, then performs a power flow calculation to update the grid state according to the time-varying patterns of power plant and load power;
step 1.2: construct N zone control agents. Each agent acts as both decision maker and learner, interacting with the environment to gain experience and learning continuously to obtain an optimal strategy. Each agent supervises one area, and through cooperation the agents jointly learn an optimal global strategy;
step 1.3: construct a global state space. The state reflects the operating condition of the power system at a given moment. The grid topology and the active power of the power plants, loads and transmission lines serve as the current system features;
step 1.4: construct an observation space for each agent. An observation reflects the operating condition of the regional grid visible to a given agent at a given moment. The grid topology and the active power of the power plants, loads and transmission lines serve as observables;
step 1.5: construct an environmental action space for each agent. The environmental action of each agent affects the environment and the team reward. The environmental action is one of two operations: switching a line, or switching the bus bar of a substation device. When the grid operates safely, the chosen environmental action is to keep the status quo; once a line limit violation is found, the grid topology is changed to restore grid security. Owing to the operating limits of a real grid, operations on the same line or distribution station must be separated by at least 3 steps, where one step corresponds to 5 minutes in the actual grid;
step 1.6: construct a communication action space for each agent. The communication action of each agent is received by the other agents at the next moment and used as a basis for their decisions, but it does not directly affect the environment or the reward. The communication action is a multidimensional vector whose dimension is determined by the communication capacity and communication requirements between agents in the actual application scenario;
step 1.7: the bonus function includes two cases. The first is a reward function based on line overload in the reconstruction process;
and secondly, a reward function obtained based on whether the system is restored to safe operation or not at the end of the reconstruction of the round.
Bonus function based on line overload: and the sum of the per-unit values of the line overload amounts of all overload lines at the current moment.
Wherein is P i actual The actual active power per unit value, P, of the ith line i threshold And the per unit value of the active power threshold value of the ith line is represented by O, and the per unit value of the active power threshold value of the ith line is represented by a sequence number set of the overload line.
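As a concrete illustration, the overload-based reward of step 1.7 can be sketched in a few lines of Python. The function and argument names are our own, and negating the overload sum so that it acts as a penalty is an assumption; the patent only specifies the per-unit quantities and the set O:

```python
def overload_reward(p_actual, p_threshold):
    """Per-step reward based on line overload: the (negated, by assumption)
    sum of the per-unit overload amounts over the set O of overloaded lines.
    p_actual / p_threshold are per-unit active powers for every line."""
    overload = [a - t for a, t in zip(p_actual, p_threshold)]
    # O = indices with positive overload; negate so overload is penalized
    return -sum(x for x in overload if x > 0)

# Example: lines 0 and 2 exceed their thresholds by 0.2 and 0.1 p.u.
r = overload_reward([1.2, 0.8, 1.1], [1.0, 1.0, 1.0])
```

Lines within their threshold contribute nothing, so the reward is zero exactly when the grid is safe, which matches the condition under which the agents stay idle.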
Further, the construction of the power system operating state data set in step 2 comprises the following steps:
step 2.1: establish a topology model and a power flow calculation model of the power grid according to the grid structure assigned to the agents;
step 2.2: establish a time-varying model of the active power of each power plant and load in the grid from the historical and forecast data of the real grid;
step 2.3: design random network attacks. After the grid is operating safely and stably, randomly disconnect a line, so that the resulting contingency is handed to the agents to resolve.
Further, training with the RIAL algorithm in step 3 proceeds as follows:
all agents are trained simultaneously using a Deep Q-Network (DQN), with two modifications to DQN: first, no experience replay pool is used; second, the environmental and communication actions taken by an agent are fed in as input at the next time step.
The deep Q learning of multiple agents includes the steps of:
step 3.1: establishing a simulation environment of the power system;
step 3.2: determining a state space, an observation space, an environment action space and a communication action space;
step 3.3: determining a neural network structure of the intelligent agent according to the RIAL architecture and initializing neural network parameters;
step 3.4: initializing an environment, and inputting a fault state of a power system as an initial state;
step 3.5: at each step, all agents select their respective actions; upon receiving the joint action, the environment transitions to a new state and produces a reward, and the neural network parameters of the agents are updated according to this transition;
step 3.6: and judging whether the environment reaches a convergence or divergence condition, if not, returning to the step 3.5, otherwise, returning to the step 3.4.
Compared with the prior art, the invention has the following beneficial effects:
the method solves post-fault reconstruction of a complex power grid with a multi-agent approach and needs no explicit model of the complex power system. It learns an optimal reconstruction strategy through the interaction of the agents with the environment and the information exchange among the agents, realizes automatic network reconstruction without relying on expert systems or traditional model-based algorithms, adapts to the uncertainty of wind power, photovoltaics and load, and is more robust against unknown risks. The partitioned multi-agent design trains efficiently and decides quickly.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a diagram of the RIAL architecture of the present invention;
FIG. 3 is a DQN training flow diagram for multiple agents of the present invention;
FIG. 4 is a schematic illustration of multi-agent communication according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
An intelligent power grid partition automatic decision-making method based on multi-agent reinforcement learning, the general flow chart of which is shown in fig. 1, comprises the following steps:
step 1: the power grid is divided into N areas according to the operation requirement of the power grid, and basic elements of multi-agent reinforcement learning (MARL) are constructed, including environment, agents, states, observation, actions and rewarding functions.
Step 2: the power system simulation environment is run, and an initial running state data set of the power system is created.
Step 3: a deep neural network model is constructed, and reinforced inter-agent learning (RIAL) is applied to train the decision agents.
Step 4: and providing a strategy for power grid control by using the trained agent.
The invention also includes:
1. the basic element construction process of the multi-agent reinforcement learning method in the step 1 is as follows:
(1) Construct the power system simulation environment as the interaction environment of the agents, providing the various attributes and state values of the power grid as decision references for the agents. When the power system operates safely, i.e., no line is overloaded, the agents do not act. If and only if a line overload exists in the power system, the agents perform a series of consecutive decision actions to restore the system to safe operation. At each operating step, the environment modifies the relevant parameters of the grid according to the actions of all agents, then performs a power flow calculation to update the grid state according to the time-varying patterns of power plant and load power.
(2) Construct N zone control agents. Each agent acts as both decision maker and learner, interacting with the environment to gain experience and learning continuously to obtain an optimal strategy. Each agent supervises one area, and through cooperation the agents jointly learn an optimal global strategy.
(3) Construct a global state space. The state reflects the operating condition of the power system at a given moment. The grid topology and the active power of the power plants, loads and transmission lines serve as the current system features.
(4) Construct an observation space for each agent. An observation reflects the operating condition of the regional grid visible to a given agent at a given moment. The grid topology and the active power of the power plants, loads and transmission lines serve as observables.
(5) Construct an environmental action space for each agent. The environmental action of each agent affects the environment and the team reward. The environmental action is one of two operations: switching a line, or switching the bus bar of a substation device. When the grid operates safely (no line in the grid violates its limit), the chosen environmental action is to keep the status quo; once a line limit violation is found, the grid topology is changed to restore grid security. Owing to the operating limits of a real grid, operations on the same line or distribution station must be separated by at least 3 steps, where one step corresponds to 5 minutes in the actual grid.
(6) Construct a communication action space for each agent. The communication action of each agent is received by the other agents at the next moment and used as a basis for their decisions, but it does not directly affect the environment or the reward. The communication action is a multidimensional vector whose dimension is determined by the communication capacity and communication requirements between agents in the actual application scenario.
(7) The reward function covers two cases. The first is a reward based on the amount of line overload during the reconstruction process. The second is a reward based on whether the system has been restored to safe operation when the current reconstruction round ends.
Reward based on line overload: the sum of the per-unit overload amounts of all overloaded lines at the current moment,
overload = Σ_{i∈O} (P_i^actual − P_i^threshold),
where P_i^actual is the per-unit actual active power of the i-th line, P_i^threshold is the per-unit active power threshold of the i-th line, and O is the index set of overloaded lines.
An end condition for a reconstruction round is defined. When the power system is restored to safety, i.e., no line is overloaded, the round is a success, ends, and receives a large reward, e.g., 100. If the power system has still not reached safety after many actions (exceeding the set maximum number of steps), the round is a failure, ends, and receives a large penalty, e.g., -100.
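The end condition can be written as a small terminal-check function. The +/-100 magnitudes follow the example values given above; the function name and the `(done, terminal_reward)` return convention are our own:

```python
def round_outcome(n_overloaded, step, max_steps,
                  success_reward=100.0, fail_penalty=-100.0):
    """End-of-round check for one reconstruction round.
    Returns (done, terminal_reward)."""
    if n_overloaded == 0:      # safety restored: the round succeeds
        return True, success_reward
    if step >= max_steps:      # ran out of actions: the round fails
        return True, fail_penalty
    return False, 0.0          # still overloaded, keep reconstructing

outcome = round_outcome(n_overloaded=0, step=4, max_steps=20)
```

This terminal reward is added on top of the per-step overload-based reward, so the agents are pushed both to reduce overload at every step and to finish within the step budget.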
2. The construction of the power system operating state data set in step 2 comprises the following steps:
(1) Establish a topology model and a power flow calculation model of the power grid according to the grid structure assigned to the agents.
(2) Establish a time-varying model of the active power of each power plant and load in the grid from the historical and forecast data of the real grid.
(3) Design random network attacks. After the grid is operating safely and stably, randomly disconnect a line (simulating accidents that may occur in a real grid, such as a burnt cable or man-made damage), so that the resulting contingency is handed to the agents to resolve.
3. The deep neural network model in step 3 is constructed as follows:
each agent contains two recurrent neural networks (RNNs), corresponding to the environmental action and the communication action respectively. The input of the environmental-action RNN is the agent's own observation at the current moment, the messages received from the other agents at the previous moment, its own environmental action at the previous moment, and its own individual number; its output is the Q function over the agent's environmental actions and the selected environmental action at the current moment. The input of the communication-action RNN is the agent's own observation at the current moment, the messages received from the other agents at the previous moment, its own communication action at the previous moment, and its own individual number; its output is the Q function over the agent's communication actions and the selected communication action. Each RNN consists of a GRU layer, a BN layer, a ReLU activation layer and a fully connected layer.
The RIAL architecture is shown in FIG. 2, where i denotes an agent, i' denotes the other agents, o_t^i denotes the observation of agent i at time t, m_{t-1}^{i'} denotes the communication actions received from the other agents at time t-1, a is the environmental action, and Q is the value function.
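The network input described above (own observation, last-step messages, own previous action, own index) can be assembled as follows. The flat concatenation and the one-hot encodings are illustrative assumptions; the patent does not fix an exact encoding:

```python
def one_hot(index, size):
    """One-hot encoding used here (by assumption) for discrete inputs."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def rial_network_input(obs, incoming_msgs, prev_action,
                       agent_id, n_actions, n_agents):
    """Input vector of an agent's action RNN: its observation o_t^i, the
    messages m_{t-1}^{i'} from the other agents, its own action at the
    previous moment, and its own individual number."""
    x = list(obs)                                # o_t^i
    for m in incoming_msgs:                      # m_{t-1}^{i'} from each other agent
        x.extend(m)
    x.extend(one_hot(prev_action, n_actions))    # own action at time t-1
    x.extend(one_hot(agent_id, n_agents))        # own individual number
    return x

x = rial_network_input(obs=[0.9, 1.1], incoming_msgs=[[0.0, 1.0]],
                       prev_action=2, n_actions=3, agent_id=0, n_agents=2)
```

The communication-action RNN takes the same shape of input, with the agent's previous communication action in place of its previous environmental action.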
4. The training method by using RIAL algorithm in step 3 is:
all agents were trained simultaneously using deep Q learning (DQN), but there are two modifications to DQN: first, no experience reuse pool is used; second, the environmental actions and communication actions taken by the agent are taken as input to the next time step.
The deep Q learning of multiple agents includes the steps of:
step 1: establishing a simulation environment of the power system;
step 2: determining a state space, an observation space, an environment action space and a communication action space;
step 3: determining a neural network structure of the intelligent agent according to the RIAL architecture and initializing neural network parameters;
step 4: initializing an environment, and inputting a fault state of a power system as an initial state;
step 5: at each step, all agents select their respective actions; upon receiving the joint action, the environment transitions to a new state and produces a reward, and the neural network parameters of the agents are updated according to this transition;
step 6: and judging whether the environment reaches a convergence or divergence condition, if not, returning to the step 5, otherwise, returning to the step 4.
The DQN training flow is as shown in figure 3. The communication process of the multi-agent is shown in fig. 4.
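The one-step-delayed communication of FIG. 4 (each message chosen at time t is received by every other agent at time t+1) can be sketched as a single exchange round; the function name and list-based message representation are our own:

```python
def exchange_messages(outgoing):
    """One communication round: outgoing[i] is agent i's communication
    action at time t; the returned inbox[i] holds the messages agent i
    will receive from all OTHER agents at time t+1."""
    n = len(outgoing)
    return [[outgoing[j] for j in range(n) if j != i] for i in range(n)]

inbox = exchange_messages(["m0", "m1", "m2"])
```

Because delivery is delayed by one step, messages never influence the environment or the reward at the step they are sent, matching the definition of the communication action space above.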
Claims (1)
1. A smart grid partition network reconstruction method based on multi-agent reinforcement learning is characterized by comprising the following steps:
step 1: dividing the power grid into N areas according to the operation requirement of the power grid, and constructing basic elements of multi-agent reinforcement learning, including environment, agents, states, observation, actions and rewarding functions;
step 1.1: constructing an interaction environment taking a power system simulation environment as an intelligent agent, and providing decision reference for the intelligent agent for various attributes and state values of a power grid; when the power system runs safely, i.e. no overload line exists, the intelligent agent is not operated; if and only if the line overload exists in the power system, the intelligent agent performs a series of continuous decision behaviors, so that the power system is restored to safe operation; each time a step length is operated, the environment modifies relevant parameters in the power grid according to actions of all intelligent agents, and then the power flow calculation is carried out to update the power grid state according to time-varying rules of power plants and load power;
step 1.2: constructing N regional control intelligent agents; the intelligent agent is used as a decision maker and a learner, interacts with the environment to obtain experience, and continuously learns to obtain an optimal strategy; each intelligent agent is responsible for supervising an area, and the intelligent agents continuously learn an optimal global strategy through cooperation;
step 1.3: constructing a global state space; the state reflects the running state of the power system at a certain moment; active power of a power grid topological structure, a power plant, a load and a transmission line is used as a current system characteristic;
step 1.4: constructing an observation space for each agent; observing and reflecting the operation state of the regional power grid which can be observed by a certain agent at a certain moment; taking the power grid topological structure, a power plant, a load and the active power of a transmission line as observables;
step 1.5: building an environmental action space for each agent; the environmental actions of each agent can affect the environment and team rewards; the environmental action is selected from one of the following two actions to be performed: switching a line; switching bus bars for a device of a substation; when the power grid runs safely, the environment action is selected to be kept as it is; once the line out-of-limit is found, changing the topology of the power grid to restore the power grid security; according to the operation limit of the actual power grid, the operation of the same line or power distribution station needs to be separated by at least 3 step sizes, and one step size corresponds to 5 minutes in the actual power grid;
step 1.6: constructing a communication action space for each agent; the communication action of each intelligent agent can be received by other intelligent agents at the next moment and used as the basis of decision, but the environment or rewards are not directly influenced; the communication action is a multidimensional vector, and the dimension of the multidimensional vector is determined by the communication capacity and the communication requirement between the intelligent agents in the actual application scene;
step 1.7: the rewarding function comprises two cases, namely, a rewarding function based on the line overload in the reconstruction process and a rewarding function obtained based on whether the system recovers safe operation or not when the reconstruction of the round is finished;
in the reconstruction process, a reward function based on the line overload amount is the sum of the line overload amount per unit values of all overload lines at the current moment;
wherein P_i^actual is the per-unit actual active power of the i-th line; P_i^threshold is the per-unit active power threshold of the i-th line; and O is the index set of overloaded lines;
step 2: operating a power system simulation environment, and creating an initial operating state data set of the power system;
step 2.1: establishing a topology model and a power flow calculation model of the power grid according to the power grid structure assigned to the agents;
step 2.2: establishing a time-varying rule model of active power of each power plant and load in the power grid by using the historical data and the forecast data of the real power grid;
step 2.3: designing random network attack; randomly disconnecting a line after the power grid runs safely and stably, so that the creation event is handed over to an intelligent agent for solving;
step 3: constructing a deep neural network model, and training decision-making agents by applying enhanced inter-agent learning;
all agents were trained simultaneously using deep Q network learning, and there were two modifications to the deep Q network: first, no experience reuse pool is used; secondly, taking the environmental action and the communication action taken by the intelligent agent as the input of the next time step;
the deep Q network learning of multiple agents includes the steps of:
step 3.1: establishing a simulation environment of the power system;
step 3.2: determining a state space, an observation space, an environment action space and a communication action space;
step 3.3: determining a neural network structure of the intelligent agent according to the RIAL architecture and initializing neural network parameters;
step 3.4: initializing an environment, and inputting a fault state of a power system as an initial state;
step 3.5: at each step, all agents select their respective actions; upon receiving the joint action, the environment transitions to a new state and produces a reward, and the neural network parameters of the agents are updated according to this transition;
step 3.6: judging whether the environment reaches a convergence or divergence condition, if not, returning to the step 3.5, otherwise, returning to the step 3.4;
step 4: and providing a strategy for power grid reconstruction by using the trained agent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111364422.3A CN114123178B (en) | 2021-11-17 | 2021-11-17 | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111364422.3A CN114123178B (en) | 2021-11-17 | 2021-11-17 | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114123178A CN114123178A (en) | 2022-03-01 |
CN114123178B true CN114123178B (en) | 2023-12-19 |
Family
ID=80396390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111364422.3A Active CN114123178B (en) | 2021-11-17 | 2021-11-17 | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114123178B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114662982B (en) * | 2022-04-15 | 2023-07-14 | 四川大学 | Multistage dynamic reconstruction method for urban power distribution network based on machine learning |
CN114925850B (en) * | 2022-05-11 | 2024-02-20 | 华东师范大学 | Deep reinforcement learning countermeasure defense method for disturbance rewards |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110945542A (en) * | 2018-06-29 | 2020-03-31 | 东莞理工学院 | Multi-agent deep reinforcement learning agent method based on smart power grid |
CN112186799A (en) * | 2020-09-22 | 2021-01-05 | 中国电力科学研究院有限公司 | Distributed energy system autonomous control method and system based on deep reinforcement learning |
CN112615379A (en) * | 2020-12-10 | 2021-04-06 | 浙江大学 | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning |
CN112927505A (en) * | 2021-01-28 | 2021-06-08 | 哈尔滨工程大学 | Signal lamp self-adaptive control method based on multi-agent deep reinforcement learning in Internet of vehicles environment |
CN113097994A (en) * | 2021-03-15 | 2021-07-09 | 国网浙江省电力有限公司 | Power grid operation mode adjusting method and device based on multiple reinforcement learning agents |
CN113363998A (en) * | 2021-06-21 | 2021-09-07 | 东南大学 | Power distribution network voltage control method based on multi-agent deep reinforcement learning |
CN113392935A (en) * | 2021-07-09 | 2021-09-14 | 浙江工业大学 | Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism |
CN113452026A (en) * | 2021-06-29 | 2021-09-28 | 华中科技大学 | Intelligent training method, evaluation method and system for weak evaluation of power system |
WO2023093537A1 (en) * | 2021-11-26 | 2023-06-01 | 南京邮电大学 | Multi-end collaborative voltage treatment method and system for power distribution network with high-penetration-rate photovoltaic access, and storage medium |
WO2023109699A1 (en) * | 2021-12-17 | 2023-06-22 | 深圳先进技术研究院 | Multi-agent communication learning method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200119556A1 (en) * | 2018-10-11 | 2020-04-16 | Di Shi | Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency |
2021-11-17: application CN202111364422.3A filed in China; granted as CN114123178B, status Active.
Non-Patent Citations (2)
Title |
---|
Research on Group Confrontation Strategy Based on Deep Reinforcement Learning; Liu Qiang; Jiang Feng; Intelligent Computer and Applications (Issue 05); full text *
Research on Distribution Network Reconfiguration Technology for Shipboard Medium-Voltage DC Power Systems; Liu Sheng; Wang Tianqi; Zhang Lanyong; Ship Science and Technology (Issue 01); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114123178B (en) | Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method | |
CN112615379B (en) | Power grid multi-section power control method based on distributed multi-agent reinforcement learning | |
CN114217524B (en) | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning | |
CN104934968A (en) | Multi-agent based distribution network disaster responding recovery coordinate control method and multi-agent based distribution network disaster responding recovery coordinate control device | |
CN110336270B (en) | Updating method of transient stability prediction model of power system | |
CN112701681B (en) | Power grid accidental fault safety regulation and control strategy generation method based on reinforcement learning | |
CN116454926B (en) | Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network | |
CN114666204B (en) | Fault root cause positioning method and system based on causal reinforcement learning | |
CN113761791A (en) | Power system automatic operation method and device based on physical information and deep reinforcement learning | |
CN112327098A (en) | Power distribution network fault section positioning method based on low-voltage distribution network comprehensive monitoring unit | |
Kodama et al. | Multi‐agent‐based autonomous power distribution network restoration using contract net protocol | |
CN108270216A (en) | A kind of Complicated Distribution Network fault recovery system and method for considering multiple target | |
CN115133540B (en) | Model-free real-time voltage control method for power distribution network | |
CN116151562A (en) | Mobile emergency vehicle scheduling and power distribution network toughness improving method based on graphic neural network reinforcement learning | |
CN108521345A (en) | A kind of information physical collaboration countermeasure for the isolated island micro-capacitance sensor considering communication disruption | |
CN114417710A (en) | Overload dynamic decision generation method and related device for power transmission network | |
Yang et al. | Control method of power grid topology structure based on reinforcement learning | |
Zhang et al. | Reinforcement Learning based Optimization of Line Switching off during Cascading failures in Power Grids | |
CN117791560A (en) | Active power distribution network elastic self-healing method considering dynamic micro-grid and controller | |
CN113725853B (en) | Power grid topology control method and system based on active person in-loop reinforcement learning | |
CN113837654B (en) | Multi-objective-oriented smart grid hierarchical scheduling method | |
CN117914001B (en) | Power system, fault studying and judging method, device, equipment and medium | |
CN115660324B (en) | Power grid multi-section out-of-limit regulation and control method and system based on graph reinforcement learning | |
CN117526309A (en) | Power distribution network recovery method and device, electronic equipment and storage medium | |
CN117057623A (en) | Comprehensive power grid safety optimization scheduling method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||