CN114123178B - Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method - Google Patents

Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method

Info

Publication number
CN114123178B
Authority
CN
China
Prior art keywords
power grid
power
agent
environment
agents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111364422.3A
Other languages
Chinese (zh)
Other versions
CN114123178A (en)
Inventor
卢芳
陈理先
王琴
姚绪梁
兰海
刘宏达
黄曼磊
刘瑜超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202111364422.3A priority Critical patent/CN114123178B/en
Publication of CN114123178A publication Critical patent/CN114123178A/en
Application granted granted Critical
Publication of CN114123178B publication Critical patent/CN114123178B/en

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-agent reinforcement learning-based smart grid partition network reconstruction method, which comprises the following steps: step 1, dividing the power grid into N areas according to its operation requirements, and constructing the basic elements of multi-agent reinforcement learning, including the environment, agents, states, observations, actions and reward functions; step 2, running a power system simulation environment and creating a data set of initial power system operating states; step 3, constructing a deep neural network model and training decision-making agents with Reinforced Inter-Agent Learning (RIAL); and step 4, using the trained agents to provide a power grid reconstruction strategy. According to the invention, through the interaction of multiple agents with the power simulation environment, the optimal network reconstruction strategy is learned offline and then applied to the actual power grid online.

Description

Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method
Technical Field
The invention relates to the field of multi-agent reinforcement learning, in particular to a smart grid partition network reconstruction method based on multi-agent reinforcement learning.
Background
Network reconstruction refers to changing the topology of the power grid, i.e. changing the open/closed states of its tie switches and sectionalizing switches, so that load is transferred between feeders or distribution substations and the operating state of the grid changes. When the power grid fails, network reconstruction can restore it to safe and stable operation. Traditional network reconstruction relies on optimization algorithms or expert experience. Optimization algorithms often involve a huge computational burden and low processing speed, which hinders real-time application, while expert experience lacks means of coping with risks that have not yet occurred and struggles with the operating safety of increasingly complex power systems. In addition, traditional network reconstruction finds it difficult to account simultaneously for the uncertainty of wind power, photovoltaic generation and load. Before reconstruction is executed, the post-reconstruction operating state of the grid must be estimated, and the accuracy of this estimate directly determines the quality of the reconstruction action, which further increases the difficulty of the task. Reinforcement learning fully captures the dynamics of the environment and can predict the new environment that follows an action, providing a new idea for network reconstruction. Moreover, reinforcement-learning-based methods are fast and efficient, making them suitable for online application in power systems.
Disclosure of Invention
The invention aims to provide a multi-agent reinforcement learning-based smart grid partition network reconstruction method for realizing automatic decision-making and safe operation of the power grid.
The purpose of the invention is realized as follows:
a smart grid partition network reconstruction method based on multi-agent reinforcement learning comprises the following steps:
step 1, dividing the power grid into N areas according to its operation requirements, and constructing the basic elements of multi-agent reinforcement learning, including the environment, agents, states, observations, actions and reward functions;
step 2, running a power system simulation environment, and creating a data set of initial power system operating states;
step 3, constructing a deep neural network model, and training decision-making agents by applying Reinforced Inter-Agent Learning (RIAL);
and step 4, providing a power grid reconstruction strategy by using the trained agents.
Further, the basic element construction process of the multi-agent reinforcement learning method in step 1 comprises the following steps:
step 1.1: the power system simulation environment is constructed as the interaction environment of the agents, providing the various attributes and state values of the power grid as decision references for the agents. When the power system operates safely, i.e. no line is overloaded, the agents take no action. If and only if a line overload exists in the power system do the agents perform a series of consecutive decision actions to restore the power system to safe operation. At each step, the environment modifies the relevant grid parameters according to the actions of all agents and then updates the grid state through power flow calculation according to the time-varying laws of power plant and load power;
step 1.2: N regional control agents are constructed. Each agent acts as both decision maker and learner, interacting with the environment to gain experience and learning continuously to obtain an optimal strategy. Each agent is responsible for supervising one region, and through cooperation the agents jointly learn an optimal global strategy;
step 1.3: a global state space is constructed. The state reflects the operating condition of the power system at a given moment. The grid topology and the active power of the power plants, loads and transmission lines are taken as the current system features;
step 1.4: an observation space is constructed for each agent. An observation reflects the operating condition of the regional grid that the agent can observe at a given moment. The regional grid topology and the active power of its power plants, loads and transmission lines are taken as observables;
step 1.5: an environmental action space is constructed for each agent. The environmental actions of each agent affect the environment and the team reward. An environmental action executes one of two operations: switching a line in or out, or switching the busbar to which a substation element is connected. When the grid operates safely, the environmental action is to keep the status quo; once a line limit violation is found, the grid topology is changed to restore grid security. In accordance with the operating limits of a real grid, operations on the same line or distribution substation must be separated by at least 3 steps, one step corresponding to 5 minutes in the real grid (a sketch of this cooldown appears after step 1.7 below);
step 1.6: a communication action space is constructed for each agent. The communication actions of each agent are received by the other agents at the next moment and serve as a basis for their decisions, but do not directly affect the environment or the reward. A communication action is a multidimensional vector whose dimension is determined by the communication capacity and communication requirements between agents in the actual application scenario;
step 1.7: the reward function covers two cases. The first is a reward based on the line overload amount during the reconstruction process;
the second is a reward based on whether the system has been restored to safe operation at the end of the reconstruction round.
Reward function based on line overload: the sum of the per-unit overload amounts of all overloaded lines at the current moment,

$$ r = \sum_{i \in O} \left( P_i^{\mathrm{actual}} - P_i^{\mathrm{threshold}} \right) $$

where $P_i^{\mathrm{actual}}$ is the per-unit actual active power of the $i$-th line, $P_i^{\mathrm{threshold}}$ is the per-unit active power threshold of the $i$-th line, and $O$ is the index set of the overloaded lines.
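As a concrete illustration, the overload reward above can be computed as in the following sketch. The function name and the array-based line representation are assumptions, as is the negative sign convention (chosen so that overload is penalized); the text itself only defines the summed magnitude.

```python
import numpy as np

def overload_reward(p_actual: np.ndarray, p_threshold: np.ndarray) -> float:
    """Sum of per-unit overload amounts over the overloaded lines (the set O).

    p_actual    -- per-unit actual active power of each line
    p_threshold -- per-unit active power threshold of each line
    Negating the sum so that overload is penalized is an assumed convention.
    """
    overload = p_actual - p_threshold
    in_o = overload > 0.0                     # overloaded lines form the set O
    return -float(np.sum(overload[in_o]))

# Example: lines 0 and 2 exceed their 1.0 p.u. thresholds by 0.2 and 0.5.
r = overload_reward(np.array([1.2, 0.8, 1.5]), np.ones(3))
assert abs(r + 0.7) < 1e-9
```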
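Similarly, the operating limit of step 1.5 (at least 3 steps between operations on the same line or distribution substation) can be enforced with a small cooldown check. A minimal sketch, with hypothetical identifiers such as "line_7":

```python
def action_allowed(target: str, step: int, last_operated: dict,
                   cooldown: int = 3) -> bool:
    """Operations on the same line or distribution substation must be at
    least `cooldown` steps apart (one step = 5 minutes of real grid time).
    `last_operated` maps a line/substation id to the step of its last
    switching operation."""
    return step - last_operated.get(target, -cooldown) >= cooldown

last_op: dict = {}
if action_allowed("line_7", step=10, last_operated=last_op):
    last_op["line_7"] = 10                    # record the switching operation
assert not action_allowed("line_7", step=12, last_operated=last_op)
assert action_allowed("line_7", step=13, last_operated=last_op)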
Further, the method for constructing the power system operating state data set in step 2 comprises the following steps:
step 2.1: establishing a topology model and a power flow calculation model of the power grid according to the grid structure supervised by the agents;
step 2.2: establishing a time-varying model of the active power of each power plant and load in the grid by using historical and forecast data from the real grid;
step 2.3: designing random network attacks; after the power grid runs safely and stably, a line is randomly disconnected, so that the created contingency is handed to the agents to resolve.
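A minimal sketch of how step 2.3's random N-1 "attacks" could populate the data set. The toy dict snapshot with uniformly drawn plant outputs is an assumption; a real implementation would draw snapshots from the power flow model and the time-varying plant/load profiles of steps 2.1-2.2.

```python
import random

def create_initial_states(n_lines: int, n_plants: int, n_scenarios: int,
                          seed: int = 0) -> list:
    """Pair a securely operating snapshot with a random N-1 line outage
    (the 'random network attack' of step 2.3)."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n_scenarios):
        snapshot = {
            "plant_mw": [rng.uniform(50.0, 200.0) for _ in range(n_plants)],
            "lines_in_service": [True] * n_lines,
        }
        faulted = rng.randrange(n_lines)      # randomly disconnect one line
        snapshot["lines_in_service"][faulted] = False
        scenarios.append(snapshot)
    return scenarios

states = create_initial_states(n_lines=20, n_plants=5, n_scenarios=100)
assert sum(states[0]["lines_in_service"]) == 19   # exactly one line is out
```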
Further, the training method using the RIAL algorithm in step 3 is as follows:
All agents are trained simultaneously with a Deep Q-Network (DQN), with two modifications to standard DQN: first, no experience replay buffer is used; second, the environmental and communication actions taken by each agent are fed in as input at the next time step. A toy sketch of this update scheme follows the step list below.
The multi-agent deep Q-learning comprises the following steps:
step 3.1: establishing a simulation environment of the power system;
step 3.2: determining a state space, an observation space, an environmental action space and a communication action space;
step 3.3: determining the neural network structure of each agent according to the RIAL architecture and initializing the neural network parameters;
step 3.4: initializing the environment, with a power system fault state input as the initial state;
step 3.5: at each step, all agents select their respective actions; on receiving the joint action, the environment transitions to a new state and generates a reward, and the agents' neural network parameters are updated according to this transition;
step 3.6: judging whether the environment has reached a convergence or divergence condition; if not, returning to step 3.5, otherwise returning to step 3.4.
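The following toy sketch makes the two DQN modifications concrete: every transition triggers an immediate gradient step (no replay buffer), and the previous environmental and communication actions are concatenated into the next input. A plain linear layer stands in for the RIAL recurrent network, the communication head is omitted, and all sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS, N_ACT, MSG = 8, 4, 2                     # illustrative sizes
q_net = nn.Linear(OBS + 1 + MSG, N_ACT)       # toy stand-in for the RIAL RNN
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)

def q_input(obs, prev_env_a, prev_comm_a):
    # Modification 2: the previous environmental and communication actions
    # are part of the input at the next time step.
    return torch.cat([obs, torch.tensor([float(prev_env_a)]), prev_comm_a])

def online_td_step(obs, prev_env_a, prev_comm_a, action, reward,
                   next_obs, done, gamma: float = 0.99) -> float:
    # Modification 1: the transition is consumed immediately, with no
    # experience replay buffer.
    q = q_net(q_input(obs, prev_env_a, prev_comm_a))[action]
    with torch.no_grad():
        target = torch.tensor(float(reward))
        if not done:
            target = target + gamma * q_net(
                q_input(next_obs, action, prev_comm_a)).max()
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# One illustrative transition with a -0.7 overload reward:
loss = online_td_step(torch.randn(OBS), 0, torch.zeros(MSG), action=2,
                      reward=-0.7, next_obs=torch.randn(OBS), done=False)
```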
Compared with the prior art, the invention has the beneficial effects that:
the method solves the problem of reconstruction after the complex power grid faults by adopting a multi-agent method, does not need to model a complex power system, learns an optimal reconstruction strategy through interaction between the multi-agent and the environment and information interaction among the multi-agent, realizes automatic reconstruction of the network, does not depend on an expert system and a traditional model algorithm, has self-adaptability to wind power, photovoltaic and load uncertainty, and has better countermeasure to unknown risks. The multi-agent in the partition has high training efficiency and high decision-making speed.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a diagram of the RIAL architecture of the present invention;
FIG. 3 is a DQN training flow diagram for multiple agents of the present invention;
FIG. 4 is a schematic illustration of multi-agent communication according to the present invention;
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
An intelligent power grid partition automatic decision-making method based on multi-agent reinforcement learning, whose general flow chart is shown in fig. 1, comprises the following steps:
step 1: the power grid is divided into N areas according to its operation requirements, and the basic elements of multi-agent reinforcement learning (MARL) are constructed, including the environment, agents, states, observations, actions and reward functions.
step 2: the power system simulation environment is run, and a data set of initial power system operating states is created.
step 3: a deep neural network model is constructed, and Reinforced Inter-Agent Learning (RIAL) is applied to train the decision-making agents.
step 4: a power grid reconstruction strategy is provided by the trained agents.
The invention also includes:
1. The basic element construction process of the multi-agent reinforcement learning method in step 1 is as follows:
(1) The power system simulation environment is constructed as the interaction environment of the agents, providing the various attributes and state values of the power grid as decision references. When the power system operates safely, i.e. no line is overloaded, the agents take no action. If and only if a line overload exists in the power system do the agents perform a series of consecutive decision actions to restore the power system to safe operation. At each step, the environment modifies the relevant grid parameters according to the actions of all agents and then updates the grid state through power flow calculation according to the time-varying laws of power plant and load power.
(2) N regional control agents are constructed. Each agent acts as both decision maker and learner, interacting with the environment to gain experience and learning continuously to obtain an optimal strategy. Each agent is responsible for supervising one region, and through cooperation the agents jointly learn an optimal global strategy.
(3) A global state space is constructed. The state reflects the operating condition of the power system at a given moment. The grid topology and the active power of the power plants, loads and transmission lines are taken as the current system features.
(4) An observation space is constructed for each agent. An observation reflects the operating condition of the regional grid that the agent can observe at a given moment. The regional grid topology and the active power of its power plants, loads and transmission lines are taken as observables.
(5) An environmental action space is constructed for each agent. The environmental actions of each agent affect the environment and the team reward. An environmental action executes one of two operations: switching a line in or out, or switching the busbar to which a substation element is connected. When the grid operates safely (no line in the grid exceeds its limit), the environmental action is to keep the status quo; once a line limit violation is found, the grid topology is changed to restore grid security. In accordance with the operating limits of a real grid, operations on the same line or distribution substation must be separated by at least 3 steps, one step corresponding to 5 minutes in the real grid.
(6) A communication action space is constructed for each agent. The communication actions of each agent are received by the other agents at the next moment and serve as a basis for their decisions, but do not directly affect the environment or the reward. A communication action is a multidimensional vector whose dimension is determined by the communication capacity and communication requirements between agents in the actual application scenario.
(7) The reward function covers two cases. The first is a reward based on the line overload amount during the reconstruction process. The second is a reward based on whether the system has been restored to safe operation at the end of the reconstruction round.
Reward function based on line overload: the sum of the per-unit overload amounts of all overloaded lines at the current moment,

$$ r = \sum_{i \in O} \left( P_i^{\mathrm{actual}} - P_i^{\mathrm{threshold}} \right) $$

where $P_i^{\mathrm{actual}}$ is the per-unit actual active power of the $i$-th line, $P_i^{\mathrm{threshold}}$ is the per-unit active power threshold of the $i$-th line, and $O$ is the index set of the overloaded lines.
An end condition for a reconstruction round is defined. When the power system is restored to safety, i.e. no line is overloaded, the round succeeds and ends, and a large reward, e.g. +100, is obtained. If the power system has still not reached safety after multiple actions (exceeding the set maximum number of steps), the round fails and ends, and a large penalty, e.g. -100, is given.
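A minimal sketch of this round-ending rule; the +/-100 magnitudes come from the text, while the function name and the step budget of 50 are assumptions.

```python
def episode_status(overloads, step_count: int, max_steps: int = 50):
    """Round-ending rule: success (+100) once no line is overloaded,
    failure (-100) once the step budget is exhausted.

    overloads -- per-line overload amounts in per unit (<= 0 means safe)
    Returns (done, terminal_reward)."""
    if all(o <= 0.0 for o in overloads):
        return True, 100.0     # grid restored to safe operation
    if step_count >= max_steps:
        return True, -100.0    # reconstruction failed within the budget
    return False, 0.0

assert episode_status([-0.1, 0.0], step_count=7) == (True, 100.0)
assert episode_status([0.3, 0.0], step_count=50) == (True, -100.0)
```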
2. The method for constructing the power system operating state data set in step 2 comprises the following steps:
(1) A topology model and a power flow calculation model of the power grid are established according to the grid structure supervised by the agents.
(2) A time-varying model of the active power of each power plant and load in the grid is established using historical and forecast data from the real grid.
(3) Random network attacks are designed. After the power grid runs safely and stably, a line is randomly disconnected (simulating accidents that may occur in the grid, such as cable burnout or man-made damage), so that the created contingency is handed to the agents to resolve.
3. The method for constructing the deep neural network model in step 3 is as follows:
Each agent comprises two recurrent neural networks (RNNs), corresponding to the environmental actions and the communication actions respectively. The RNN for environmental actions takes as input the agent's own observation at the current moment, the messages from the other agents at the previous moment, the agent's own environmental action at the previous moment and the agent's individual index, and outputs the Q function over the agent's environmental actions at the current moment, from which the environmental action is selected. The RNN for communication actions takes as input the agent's current observation, the messages from the other agents at the previous moment, the agent's own communication action at the previous moment and the agent's individual index, and outputs the Q function over the agent's communication actions, from which the communication action is selected. Each RNN is composed of a GRU layer, a batch normalization (BN) layer, a ReLU activation layer and a fully connected layer.
The RIAL architecture is shown in fig. 2, where $i$ denotes an agent, $i'$ denotes the agents other than $i$, $o_t^i$ denotes the observation of agent $i$ at time $t$, $m_{t-1}^{i'}$ denotes the communication actions received from the other agents at time $t-1$, $a$ is the environmental action, and $Q$ is the value function.
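A sketch of this per-agent architecture in PyTorch. Only the dual env/comm heads, the inputs listed above and the GRU, BN, ReLU, fully connected sequence come from the text; the layer sizes, one-hot encodings and class names are assumptions.

```python
import torch
import torch.nn as nn

class RIALHead(nn.Module):
    """One of an agent's two recurrent networks: the input gathers the
    current observation, the messages from the other agents at t-1, the
    agent's own previous action (one-hot) and its index (one-hot); the
    layers follow the GRU -> BN -> ReLU -> fully-connected sequence."""
    def __init__(self, obs_dim, msg_dim, n_actions, n_agents, hidden=64):
        super().__init__()
        in_dim = obs_dim + msg_dim + n_actions + n_agents
        self.gru = nn.GRUCell(in_dim, hidden)
        self.bn = nn.BatchNorm1d(hidden)
        self.fc = nn.Linear(hidden, n_actions)   # one Q-value per action

    def forward(self, obs, msg_prev, act_prev, agent_id, h):
        x = torch.cat([obs, msg_prev, act_prev, agent_id], dim=-1)
        h = self.gru(x, h)
        return self.fc(torch.relu(self.bn(h))), h

class RIALAgent(nn.Module):
    """One head outputs Q over environmental actions (line/busbar
    switching), the other Q over communication actions."""
    def __init__(self, obs_dim, msg_dim, n_env_a, n_comm_a, n_agents):
        super().__init__()
        self.env_head = RIALHead(obs_dim, msg_dim, n_env_a, n_agents)
        self.comm_head = RIALHead(obs_dim, msg_dim, n_comm_a, n_agents)

# Batch of two observations (BatchNorm1d needs batch > 1 in training mode):
agent = RIALAgent(obs_dim=8, msg_dim=2, n_env_a=5, n_comm_a=4, n_agents=3)
h = torch.zeros(2, 64)
q_env, h = agent.env_head(torch.randn(2, 8), torch.randn(2, 2),
                          torch.zeros(2, 5), torch.zeros(2, 3), h)
```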
4. The training method using the RIAL algorithm in step 3 is as follows:
All agents are trained simultaneously with a Deep Q-Network (DQN), with two modifications to standard DQN: first, no experience replay buffer is used; second, the environmental and communication actions taken by each agent are fed in as input at the next time step.
The multi-agent deep Q-learning comprises the following steps:
step 1: establish a simulation environment of the power system;
step 2: determine the state space, observation space, environmental action space and communication action space;
step 3: determine the neural network structure of each agent according to the RIAL architecture and initialize the neural network parameters;
step 4: initialize the environment, with a power system fault state input as the initial state;
step 5: at each step, all agents select their respective actions; on receiving the joint action, the environment transitions to a new state and generates a reward, and the agents' neural network parameters are updated according to this transition;
step 6: judge whether the environment has reached a convergence or divergence condition; if not, return to step 5, otherwise return to step 4.
The DQN training flow is shown in fig. 3. The multi-agent communication process is shown in fig. 4.
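The one-step message delay of fig. 4 amounts to simple routing bookkeeping: messages chosen at step t-1 form each agent's inbox at step t and never enter the environment or the reward. A minimal sketch with message vectors represented as plain lists:

```python
def route_messages(msgs_prev: list, n_agents: int) -> list:
    """Agent i's decision input at step t contains every other agent's
    communication action chosen at step t-1; messages bypass the
    environment and the reward entirely."""
    return [[m for j, m in enumerate(msgs_prev) if j != i]
            for i in range(n_agents)]

# Messages the three agents selected at step t-1 (2-dim vectors) ...
msgs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
# ... become each agent's inbox at step t:
inboxes = route_messages(msgs, n_agents=3)
assert inboxes[0] == [[1.0, 0.0], [0.5, 0.5]]
```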

Claims (1)

1. A smart grid partition network reconstruction method based on multi-agent reinforcement learning is characterized by comprising the following steps:
step 1: dividing the power grid into N areas according to its operation requirements, and constructing the basic elements of multi-agent reinforcement learning, including the environment, agents, states, observations, actions and reward functions;
step 1.1: constructing the power system simulation environment as the interaction environment of the agents, providing the various attributes and state values of the power grid as decision references for the agents; when the power system runs safely, i.e. no line is overloaded, the agents take no action; if and only if a line overload exists in the power system do the agents perform a series of consecutive decision behaviors to restore the power system to safe operation; at each step, the environment modifies the relevant grid parameters according to the actions of all agents and then updates the grid state through power flow calculation according to the time-varying laws of power plant and load power;
step 1.2: constructing N regional control agents; each agent acts as both decision maker and learner, interacting with the environment to gain experience and learning continuously to obtain an optimal strategy; each agent is responsible for supervising one region, and through cooperation the agents jointly learn an optimal global strategy;
step 1.3: constructing a global state space; the state reflects the operating condition of the power system at a given moment; the grid topology and the active power of the power plants, loads and transmission lines are taken as the current system features;
step 1.4: constructing an observation space for each agent; an observation reflects the operating condition of the regional grid that the agent can observe at a given moment; the regional grid topology and the active power of its power plants, loads and transmission lines are taken as observables;
step 1.5: constructing an environmental action space for each agent; the environmental actions of each agent affect the environment and the team reward; an environmental action executes one of two operations: switching a line in or out, or switching the busbar to which a substation element is connected; when the grid operates safely, the environmental action is to keep the status quo; once a line limit violation is found, the grid topology is changed to restore grid security; in accordance with the operating limits of a real grid, operations on the same line or distribution substation must be separated by at least 3 steps, one step corresponding to 5 minutes in the real grid;
step 1.6: constructing a communication action space for each agent; the communication actions of each agent are received by the other agents at the next moment and serve as a basis for their decisions, but do not directly affect the environment or the reward; a communication action is a multidimensional vector whose dimension is determined by the communication capacity and communication requirements between agents in the actual application scenario;
step 1.7: the reward function covers two cases: a reward based on the line overload amount during the reconstruction process, and a reward based on whether the system has been restored to safe operation at the end of the reconstruction round;
in the reconstruction process, the reward function based on the line overload amount is the sum of the per-unit overload amounts of all overloaded lines at the current moment,

$$ r = \sum_{i \in O} \left( P_i^{\mathrm{actual}} - P_i^{\mathrm{threshold}} \right) $$

wherein $P_i^{\mathrm{actual}}$ is the per-unit actual active power of the $i$-th line, $P_i^{\mathrm{threshold}}$ is the per-unit active power threshold of the $i$-th line, and $O$ is the index set of the overloaded lines;
step 2: operating a power system simulation environment, and creating an initial operating state data set of the power system;
step 2.1: establishing a topology model and a power flow calculation model of the power grid according to the grid structure supervised by the agents;
step 2.2: establishing a time-varying model of the active power of each power plant and load in the grid by using historical and forecast data from the real grid;
step 2.3: designing random network attacks; randomly disconnecting a line after the power grid runs safely and stably, so that the created contingency is handed to the agents to resolve;
step 3: constructing a deep neural network model, and training decision-making agents by applying reinforced inter-agent learning;
all agents are trained simultaneously using deep Q-network learning, with two modifications to the deep Q-network: first, no experience replay buffer is used; second, the environmental and communication actions taken by each agent are fed in as input at the next time step;
the multi-agent deep Q-network learning comprises the following steps:
step 3.1: establishing a simulation environment of the power system;
step 3.2: determining a state space, an observation space, an environment action space and a communication action space;
step 3.3: determining a neural network structure of the intelligent agent according to the RIAL architecture and initializing neural network parameters;
step 3.4: initializing an environment, and inputting a fault state of a power system as an initial state;
step 3.5: at each step, all agents select their respective actions; on receiving the joint action, the environment transitions to a new state and generates a reward, and the agents' neural network parameters are updated according to this transition;
step 3.6: judging whether the environment has reached a convergence or divergence condition; if not, returning to step 3.5, otherwise returning to step 3.4;
step 4: providing a power grid reconstruction strategy by using the trained agents.
CN202111364422.3A 2021-11-17 2021-11-17 Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method Active CN114123178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111364422.3A CN114123178B (en) 2021-11-17 2021-11-17 Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111364422.3A CN114123178B (en) 2021-11-17 2021-11-17 Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method

Publications (2)

Publication Number Publication Date
CN114123178A CN114123178A (en) 2022-03-01
CN114123178B (en) 2023-12-19

Family

ID=80396390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111364422.3A Active CN114123178B (en) 2021-11-17 2021-11-17 Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method

Country Status (1)

Country Link
CN (1) CN114123178B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662982B (en) * 2022-04-15 2023-07-14 四川大学 Multistage dynamic reconstruction method for urban power distribution network based on machine learning
CN114925850B (en) * 2022-05-11 2024-02-20 华东师范大学 Deep reinforcement learning countermeasure defense method for disturbance rewards

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110945542A (en) * 2018-06-29 2020-03-31 东莞理工学院 Multi-agent deep reinforcement learning agent method based on smart power grid
CN112186799A (en) * 2020-09-22 2021-01-05 中国电力科学研究院有限公司 Distributed energy system autonomous control method and system based on deep reinforcement learning
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
CN112927505A (en) * 2021-01-28 2021-06-08 哈尔滨工程大学 Signal lamp self-adaptive control method based on multi-agent deep reinforcement learning in Internet of vehicles environment
CN113097994A (en) * 2021-03-15 2021-07-09 国网浙江省电力有限公司 Power grid operation mode adjusting method and device based on multiple reinforcement learning agents
CN113363998A (en) * 2021-06-21 2021-09-07 东南大学 Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN113452026A (en) * 2021-06-29 2021-09-28 华中科技大学 Intelligent training method, evaluation method and system for weak evaluation of power system
WO2023093537A1 (en) * 2021-11-26 2023-06-01 南京邮电大学 Multi-end collaborative voltage treatment method and system for power distribution network with high-penetration-rate photovoltaic access, and storage medium
WO2023109699A1 (en) * 2021-12-17 2023-06-22 深圳先进技术研究院 Multi-agent communication learning method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110945542A (en) * 2018-06-29 2020-03-31 东莞理工学院 Multi-agent deep reinforcement learning agent method based on smart power grid
CN112186799A (en) * 2020-09-22 2021-01-05 中国电力科学研究院有限公司 Distributed energy system autonomous control method and system based on deep reinforcement learning
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
CN112927505A (en) * 2021-01-28 2021-06-08 哈尔滨工程大学 Signal lamp self-adaptive control method based on multi-agent deep reinforcement learning in Internet of vehicles environment
CN113097994A (en) * 2021-03-15 2021-07-09 国网浙江省电力有限公司 Power grid operation mode adjusting method and device based on multiple reinforcement learning agents
CN113363998A (en) * 2021-06-21 2021-09-07 东南大学 Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN113452026A (en) * 2021-06-29 2021-09-28 华中科技大学 Intelligent training method, evaluation method and system for weak evaluation of power system
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
WO2023093537A1 (en) * 2021-11-26 2023-06-01 南京邮电大学 Multi-end collaborative voltage treatment method and system for power distribution network with high-penetration-rate photovoltaic access, and storage medium
WO2023109699A1 (en) * 2021-12-17 2023-06-22 深圳先进技术研究院 Multi-agent communication learning method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on group adversarial strategies based on deep reinforcement learning; Liu Qiang; Jiang Feng; Intelligent Computer and Applications (No. 05); full text *
Research on distribution network reconfiguration technology for ship medium-voltage DC power systems; Liu Sheng; Wang Tianqi; Zhang Lanyong; Ship Science and Technology (No. 01); full text *

Also Published As

Publication number Publication date
CN114123178A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN114123178B (en) Multi-agent reinforcement learning-based intelligent power grid partition network reconstruction method
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
CN114217524B (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN104934968A (en) Multi-agent based distribution network disaster responding recovery coordinate control method and multi-agent based distribution network disaster responding recovery coordinate control device
CN110336270B (en) Updating method of transient stability prediction model of power system
CN112701681B (en) Power grid accidental fault safety regulation and control strategy generation method based on reinforcement learning
CN116454926B (en) Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network
CN114666204B (en) Fault root cause positioning method and system based on causal reinforcement learning
CN113761791A (en) Power system automatic operation method and device based on physical information and deep reinforcement learning
CN112327098A (en) Power distribution network fault section positioning method based on low-voltage distribution network comprehensive monitoring unit
Kodama et al. Multi‐agent‐based autonomous power distribution network restoration using contract net protocol
CN108270216A (en) A kind of Complicated Distribution Network fault recovery system and method for considering multiple target
CN115133540B (en) Model-free real-time voltage control method for power distribution network
CN116151562A (en) Mobile emergency vehicle scheduling and power distribution network toughness improving method based on graphic neural network reinforcement learning
CN108521345A (en) A kind of information physical collaboration countermeasure for the isolated island micro-capacitance sensor considering communication disruption
CN114417710A (en) Overload dynamic decision generation method and related device for power transmission network
Yang et al. Control method of power grid topology structure based on reinforcement learning
Zhang et al. Reinforcement Learning based Optimization of Line Switching off during Cascading failures in Power Grids
CN117791560A (en) Active power distribution network elastic self-healing method considering dynamic micro-grid and controller
CN113725853B (en) Power grid topology control method and system based on active person in-loop reinforcement learning
CN113837654B (en) Multi-objective-oriented smart grid hierarchical scheduling method
CN117914001B (en) Power system, fault studying and judging method, device, equipment and medium
CN115660324B (en) Power grid multi-section out-of-limit regulation and control method and system based on graph reinforcement learning
CN117526309A (en) Power distribution network recovery method and device, electronic equipment and storage medium
CN117057623A (en) Comprehensive power grid safety optimization scheduling method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant