CN117833353A - Simulation training method, device and equipment for power grid active control intelligent agent

Simulation training method, device and equipment for power grid active control intelligent agent

Info

Publication number
CN117833353A
Authority
CN
China
Prior art keywords
state data
power grid
initial
active control
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311630737.7A
Other languages
Chinese (zh)
Inventor
周毅
沈维健
周良才
陈清
高佳宁
范栋琦
闪鑫
王波
王天禄
骆玮
徐峰
徐希
李雷
郑义明
孙小磊
刘理达
孙志豪
余飞翔
陆廷骧
吴自博
张楷
杨永瑞
夏正国
周志涛
李林鑫
滕书宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Branch Of State Grid Corp ltd
NARI Nanjing Control System Co Ltd
Original Assignee
East China Branch Of State Grid Corp ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Branch Of State Grid Corp ltd, NARI Nanjing Control System Co Ltd filed Critical East China Branch Of State Grid Corp ltd
Priority to CN202311630737.7A
Publication of CN117833353A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a simulation training method, device and equipment for a power grid active control agent, relates to the technical field of power system simulation, and can solve the problem that an agent trained by deep reinforcement learning cannot be applied to an actual power grid model topology. The method comprises the following steps: acquiring an agent constructed based on deep reinforcement learning; constructing a power grid simulation environment based on an actual power grid model topology and historical operation data; acquiring at least one set of initial state data of power grid equipment in the power grid simulation environment; training the agent to generate an initial active control action strategy; if the initial active control action strategy meets a preset check condition, updating the initial state data according to the strategy to obtain the next state data; calculating an initial action reward value from the initial state data and the next state data; and continuing to train the agent with the next state data and the initial action reward value until a preset stopping condition is reached, thereby obtaining the trained agent.

Description

Simulation training method, device and equipment for power grid active control intelligent agent
Technical Field
The invention relates to the technical field of power system simulation, and in particular to a simulation training method, device and equipment for a power grid active control agent.
Background
In recent years, with the rapid development of big data analytics and new-generation artificial intelligence technology, using data-driven artificial intelligence to address the new problems and challenges facing the power grid has become a new means and method in the construction of modern power systems. Deep reinforcement learning, in particular, plays an important role in decision support for power grid dispatching.
To study the application of deep reinforcement learning to power grid dispatching, laboratory-level simulation environments have been built on IEEE benchmark topologies to verify its effectiveness in grid regulation and control. However, the characteristics of an actual power grid differ markedly from the IEEE benchmark topologies, so an agent trained in this way cannot adapt to a grid model topology that is inconsistent with its training environment, which makes it difficult to deploy deep reinforcement learning in a real system. How to train an agent that meets actual power grid dispatching requirements is therefore a pressing problem.
Disclosure of Invention
In view of the above, the invention provides a simulation training method, device and equipment for a power grid active control agent, which can solve the technical problem that an agent trained by deep reinforcement learning cannot be applied to an actual power grid model topology.
According to a first aspect of the present invention, there is provided a simulation training method of an active control agent for a power grid, the method comprising:
acquiring an agent constructed based on deep reinforcement learning, constructing a power grid simulation environment based on an actual power grid model topology and historical operation data, acquiring at least one set of initial state data of power grid equipment in the power grid simulation environment, training the agent with one set of initial state data, and generating an initial active control action strategy for the current moment;
judging whether the initial active control action strategy meets a preset check condition, and if so, updating the initial state data according to the initial active control action strategy to obtain the next state data for the next moment;
and calculating an initial action reward value from the initial state data and the next state data, and continuing to train the agent with the next state data and the initial action reward value until a preset stopping condition is reached, thereby obtaining the trained agent.
Preferably, the method further comprises:
and if the initial active control action strategy does not meet the preset check condition, training the intelligent agent again by using the initial state data until the initial active control action strategy meets the preset check condition.
Preferably, the judging whether the initial active control action strategy meets a preset check condition includes:
judging whether the length of the action list in the initial active control action strategy is equal to the number of units in the current environment; and
judging whether each unit's current active output, after the active adjustment amount in the initial active control action strategy is superimposed on it, remains within the unit's output limits.
Preferably, the method further comprises:
and setting the preset stopping condition, wherein the preset stopping condition includes the number of sets of initial state data and the number of training iterations per set.
Preferably, after the trained agent is obtained, the method further comprises:
acquiring state data to be adjusted of the power grid equipment;
inputting the state data to be adjusted into the trained intelligent agent to generate an optimal active control action strategy;
and updating the state data to be adjusted according to the optimal active control action strategy.
Preferably, the acquiring at least one set of initial state data of the power grid equipment includes:
and acquiring different sets of initial state data of the power grid equipment corresponding to different preset moments according to the actual power grid model topological structure and the historical operation data.
According to a second aspect of the present invention, there is provided a simulation training apparatus for a power grid active control agent, the apparatus comprising:
an acquisition module, configured to acquire an agent constructed based on deep reinforcement learning, construct a power grid simulation environment based on an actual power grid model topology and historical operation data, acquire at least one set of initial state data of power grid equipment in the power grid simulation environment, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment;
a judging module, configured to judge whether the initial active control action strategy meets a preset check condition, and if so, update the initial state data according to the initial active control action strategy to obtain the next state data for the next moment;
and a first training module, configured to calculate an initial action reward value from the initial state data and the next state data, and continue to train the agent with the next state data and the initial action reward value until a preset stopping condition is reached, thereby obtaining the trained agent.
Preferably, the apparatus further comprises:
and the second training module is used for training the intelligent agent again by using the initial state data if the initial active control action strategy does not meet the preset check condition until the initial active control action strategy meets the preset check condition.
According to a third aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for simulation training of an active control agent of a power grid.
According to a fourth aspect of the present application, there is provided a computer device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, where the processor implements the simulation training method of the power grid active control agent when executing the program.
By means of the above technical scheme, the simulation training method, device and equipment for a power grid active control agent provided by the invention first acquire an agent constructed based on deep reinforcement learning, construct a power grid simulation environment based on an actual power grid model topology and historical operation data, acquire at least one set of initial state data of power grid equipment in the power grid simulation environment, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment. It is then judged whether the initial active control action strategy meets a preset check condition; if so, the initial state data is updated according to the initial active control action strategy to obtain the next state data for the next moment. Finally, an initial action reward value is calculated from the initial state data and the next state data, and the agent continues to be trained with the next state data and the initial action reward value until a preset stopping condition is reached, yielding the trained agent. In this technical scheme, on the one hand, the simulation training environment is built from the actual power grid model topology and historical operation data, so the actual power grid data (that is, the at least one set of initial state data of the power grid equipment) is obtained from that topology and those data; on the other hand, the grid state data is updated by executing the active control action strategies output by the agent, realizing interaction between the agent and the simulation environment. The state data and action reward values calculated in the invention therefore conform to actual power grid characteristics, and iteratively training the agent with them enables the trained agent to meet actual power grid dispatching requirements, giving it strong adaptability in actual grid applications. In the prior art, by contrast, a laboratory-level simulation environment is built on an IEEE benchmark topology; because that topology differs from an actual power grid, an agent trained in such an environment cannot adapt to a topology inconsistent with its training environment, cannot meet actual power grid dispatching requirements, and adapts poorly in actual grid applications.
The foregoing is merely an overview of the technical scheme of the invention. To make the technical means of the invention clearer, so that it can be implemented in accordance with the description, and to make the above and other objects, features and advantages of the invention more readily apparent, specific embodiments of the invention are set forth below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention; they do not constitute an undue limitation of the present application. In the drawings:
fig. 1 shows a schematic flow chart of a simulation training method for a power grid active control agent according to an embodiment of the invention;
fig. 2 shows a schematic flow chart of another simulation training method for a power grid active control agent according to an embodiment of the invention;
fig. 3 shows a schematic structural diagram of a simulation training device for a power grid active control agent according to an embodiment of the invention;
fig. 4 shows a schematic structural diagram of another simulation training device for a power grid active control agent according to an embodiment of the invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
This embodiment provides a simulation training method for a power grid active control agent; as shown in fig. 1, the method comprises the following steps:
101. Acquire an agent constructed based on deep reinforcement learning, construct a power grid simulation environment based on an actual power grid model topology and historical operation data, acquire at least one set of initial state data of power grid equipment in the power grid simulation environment, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment.
It should be noted that deep reinforcement learning describes and solves the problem of an agent learning a strategy that maximizes return, or achieves a specific goal, while interacting with an environment. The agent learns by trial and error: it obtains rewards through interaction with the environment and aims to maximize the cumulative reward. In deep reinforcement learning, the environment evaluates how good the agent's actions are rather than telling the agent how to produce correct actions; in this way, the agent gains knowledge in an act-and-evaluate loop and improves its action plan to suit the environment. Learning is thus treated as an exploratory evaluation process: the agent selects an action, the environment's state changes upon receiving it, and a reward or penalty signal is generated and fed back to the agent, which selects the next action based on that signal and the environment's current state, preferring actions that have been positively reinforced (rewarded). The selected action affects not only the immediate reward value but also the environment's state at the next moment, and hence the final reward. In other words, training the agent means learning what action strategy to output so that the initial state can be adjusted by that strategy toward a specific target state.
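The interaction loop just described can be made concrete with a short sketch. The following Python fragment is only a minimal illustration of the act-evaluate-feedback cycle; the env and agent objects and their reset, step, select_action and learn methods are hypothetical placeholders, not interfaces defined by the invention.

```python
# Minimal sketch of the agent-environment loop described above (names are assumed).
def run_episode(env, agent, max_steps):
    state = env.reset()                        # start from a set of initial state data
    for _ in range(max_steps):
        action = agent.select_action(state)    # the agent outputs an action strategy
        next_state, reward = env.step(action)  # the environment changes state and
                                               # evaluates the action with a reward
        agent.learn(state, action, reward, next_state)  # feedback reinforces actions
        state = next_state                     # that move toward the target state
```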
Note that the agent constructed based on deep reinforcement learning has not yet been trained at this point.
Constructing the power grid simulation environment based on the actual power grid model topology and historical operation data means storing the actual power grid model topology and historical operation data in an environment, taking that environment as the initial power grid simulation environment, setting initial values for the simulation environment parameters, and setting the simulation environment state space. In the resulting power grid simulation environment, at least one set of initial state data of the power grid equipment can be obtained from the actual power grid model topology and the historical operation data, and the initial active control action strategy generated by the agent can be executed in simulation on the actual power grid model topology, so that the strategy updates the initial state data of the power grid equipment to the next state data.
The actual power grid model topology includes: for each line, at least one of voltage class, thermal stability limit and power limit; for each bus, at least one of voltage class, upper voltage limit, lower voltage limit and the station to which it belongs; for each transformer, at least one of voltage class, thermal stability limit, power limit and the station to which it belongs; as well as the line-bus connection relationships, the transformer-bus connection relationships, and the section composition and power limits. The historical operation data includes: for each line, at least one of number, name, active value, reactive value and current value; for each main transformer, at least one of number, name, active value, reactive value and current value; and for each bus, at least one of number, name, active value and reactive value. The power grid simulation environment parameters include the number of decimal digits retained for the current moment and the allowed precision of action strategies. The power grid simulation environment state space includes: the current moment, the current step count, the active and reactive power of each unit/line/bus, the convergence status of the line power flow, the on/off state of each unit, and the bus voltage amplitudes and phase angles.
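To make the enumeration above easier to follow, the sketch below lays the simulation environment state space out as a Python dataclass. The field names and types are assumptions chosen to mirror the items listed; the invention does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field

@dataclass
class GridEnvState:
    """Assumed layout of the power grid simulation environment state space."""
    current_time: str                                # current moment
    step_count: int                                  # current step count
    unit_p: list = field(default_factory=list)       # active power of each unit
    unit_q: list = field(default_factory=list)       # reactive power of each unit
    line_p: list = field(default_factory=list)       # active power of each line
    line_q: list = field(default_factory=list)       # reactive power of each line
    bus_v_mag: list = field(default_factory=list)    # bus voltage amplitudes
    bus_v_angle: list = field(default_factory=list)  # bus voltage phase angles
    unit_on: list = field(default_factory=list)      # on/off state of each unit
    flow_converged: bool = True                      # line power flow convergence
```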
As for obtaining at least one set of initial state data of the power grid equipment from the actual power grid model topology and the historical operation data: the topology consists of nodes and the connections between them, which together constitute the power grid equipment, and the historical operation data is the historical operation data of that equipment. Several preset moments can therefore be selected, and the topology together with the historical operation data at each preset moment corresponds to one set of initial state data of the power grid equipment, thereby yielding at least one set of initial state data.
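Under the assumption that the historical operation data is indexed by timestamp, the selection of preset moments can be sketched as follows; the record field names reuse the GridEnvState sketch above and are likewise hypothetical.

```python
# Assumed sketch: one preset moment -> one set of initial state data.
def build_initial_states(history, preset_times):
    """history maps a timestamp to the equipment measurements at that moment."""
    initial_states = []
    for t in preset_times:
        record = history[t]            # line/transformer/bus values at moment t
        initial_states.append(GridEnvState(
            current_time=t,
            step_count=0,
            unit_p=record["unit_p"],
            line_p=record["line_p"],
            bus_v_mag=record["bus_v_mag"],
        ))
    return initial_states
```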
102. Judge whether the initial active control action strategy meets a preset check condition, and if so, update the initial state data according to the initial active control action strategy to obtain the next state data for the next moment.
103. Calculate an initial action reward value from the initial state data and the next state data, and continue to train the agent with the next state data and the initial action reward value until a preset stopping condition is reached, thereby obtaining the trained agent.
For steps 101-103 of this embodiment: one preset moment is taken as the current moment and the corresponding set of initial state data is input into the agent; after training on that set is completed, the other sets are trained in turn until all sets have been used. The agent needs to reach a specific target state through the action strategies it outputs. When a set of initial state data is input, the agent, seeking that target state, outputs an initial active control action strategy (initially at random); after the strategy is executed, the next state data for the next moment is obtained. Ideally the next state data would be the target state, but in general it is not, so the initial active control action strategy output by the agent is evaluated, for example on whether the next state data has moved closer to the target state than the initial state data. The evaluation is measured by the initial action reward value, which is fed back to the agent; the agent receives this feedback together with the next state data and continues training on that basis.
In summary, the training process of the agent on the initial state data is described in detail below:
Multiple sets of initial state data are acquired for different preset moments (one preset moment corresponds to one set). One preset moment is taken as the current moment t and the corresponding set of initial state data is input into the agent, which outputs an initial active control action strategy (initially at random). If the initial active control action strategy meets the preset check condition, it is executed: the power grid equipment is updated from the initial state data at moment t to the next state data at moment t+1, and the initial action reward value is calculated from the state data at t and at t+1. The next state data and the initial action reward value are then input into the agent for continued training; the agent outputs a second active control action strategy, and if it meets the preset check condition it is executed, updating the equipment from the state data at t+1 to the state data at t+2, with the second action reward value calculated from the state data at t+1 and t+2, and so on until the preset number of training iterations for that set is reached. Similarly, another preset moment is then taken as the current moment and the corresponding set of initial state data is input into the agent; the training procedure is the same and is not repeated here. When all sets of initial state data have been trained, the agent obtained through this iterative training process is the trained agent.
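The iteration just described, one set of initial state data per preset moment with repeated strategy generation, checking, state update and reward feedback, can be summarized in the following sketch. check_action, grid_step and compute_reward are placeholders standing in for the corresponding steps of the embodiment, not code disclosed by the invention; the retraining branch follows option 1 of step 204 below (retry on the same set).

```python
# Assumed training-loop sketch: outer loop over sets, inner loop over moments t, t+1, ...
def train(agent, initial_state_sets, iters_per_set):
    for initial_state in initial_state_sets:       # one preset moment per set
        state = initial_state
        for _ in range(iters_per_set):
            action = agent.select_action(state)    # active control action strategy
            if not check_action(action, state):    # preset check condition
                continue                           # retrain on the same state data
            next_state = grid_step(state, action)  # t -> t+1 state update
            reward = compute_reward(state, next_state)
            agent.learn(state, action, reward, next_state)
            state = next_state
    return agent                                   # the trained agent
```

The loop bounds also encode the preset stopping condition of step 205 below: the number of sets of initial state data and the number of training iterations per set.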
With the simulation training method provided by this embodiment, the training environment is built from the actual power grid model topology and historical operation data, so the initial state data comes from the actual grid, and the agent interacts with the environment by executing the active control action strategies it outputs. The state data and action reward values obtained in this way conform to actual power grid characteristics, and iteratively training the agent on them yields a trained agent adapted to actual power grid dispatching requirements, with strong adaptability in actual grid applications; an agent trained in a laboratory-level simulation environment built on an IEEE benchmark topology, by contrast, cannot adapt to a topology inconsistent with its training environment.
Further, as a refinement and extension of the specific implementation of the foregoing embodiment, and to fully describe the specific implementation process, this embodiment provides another simulation training method for a power grid active control agent, as shown in fig. 2, the method comprising:
201. Construct a power grid simulation environment based on an actual power grid model topology and historical operation data, and, in the power grid simulation environment, acquire different sets of initial state data of the power grid equipment corresponding to different preset moments according to the topology and the historical operation data.
202. Acquire an agent constructed based on deep reinforcement learning, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment.
The specific implementation of steps 201 and 202 is the same as that of step 101 and is not repeated here.
203. Judge whether the initial active control action strategy meets a preset check condition, and if so, update the initial state data according to the initial active control action strategy to obtain the next state data for the next moment.
The action strategy consists of a unit active power adjustment list ΔP_g and an adjustable load adjustment list ΔP_l; specifically, ΔP_g = [ΔP_g1, ΔP_g2, …, ΔP_gn] and ΔP_l = [ΔP_l1, ΔP_l2, …, ΔP_lm], where n is the number of units and m is the number of adjustable loads.
In the application scenario of this embodiment, an active control action strategy means that the generated action strategy performs active power control.
For this embodiment, as one implementation, judging whether the initial active control action strategy meets a preset check condition includes: checking the dimension of the agent's action list, that is, judging whether the action list length in the initial active control action strategy equals the number of units in the current environment; and checking the adjustment value limits, that is, judging whether each unit's current active output, after the active adjustment amount is superimposed on it, remains within the unit's output limits.
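Below is a standalone sketch of these two checks, assuming the unit portion of the action list is the ΔP_g vector described above and that the units' current outputs and limits are available as equal-length lists. All names are illustrative, and this is one possible form of the check_action placeholder used in the earlier training-loop sketch.

```python
# Assumed sketch of the preset check condition (names are illustrative).
def check_action(dp_units, n_units, unit_p_now, unit_p_min, unit_p_max):
    # 1) Dimension check: the unit action list must match the number of units
    #    in the current environment.
    if len(dp_units) != n_units:
        return False
    # 2) Limit check: each unit's current active output plus its adjustment
    #    must remain within that unit's output limits.
    for p_now, dp, p_min, p_max in zip(unit_p_now, dp_units, unit_p_min, unit_p_max):
        if not (p_min <= p_now + dp <= p_max):
            return False
    return True
```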
204. If the initial active control action strategy does not meet the preset check condition, train the agent again with the initial state data until the initial active control action strategy meets the preset check condition.
For this embodiment, the retraining policy applied when the preset check condition is not satisfied may be preset, and may be either (1) retraining without switching to another set of initial state data, or (2) switching to another set of initial state data. Retraining the agent with the initial state data then means retraining the agent according to this preset retraining policy and the initial state data.
Specifically, there are multiple sets of initial state data. Suppose an agent trained with a certain set produces an initial active control action strategy that does not meet the preset check condition. As one implementation, since the action strategy output by the agent for a given input is random, a strategy that fails the check may be followed, on retraining with the same set of data, by one that passes; there is thus no need to switch to another set, and the agent is retrained with the same set of initial state data. As another implementation, to improve the efficiency of action strategy generation, another set of initial state data is used when the initial active control action strategy does not meet the preset check condition.
It should be noted that whether an active control action strategy satisfies the preset check condition only needs to be determined before that strategy is used to update the state data.
205. Set the preset stopping condition, where the preset stopping condition includes the number of sets of initial state data and the number of training iterations per set.
For example, if there are 5 sets of initial state data and each set is trained 100 times, then once every set has completed its 100 training iterations the preset stopping condition is reached, training stops, and the trained agent is obtained.
206. Calculate an initial action reward value from the initial state data and the next state data, and continue to train the agent with the next state data and the initial action reward value until the preset stopping condition is reached, thereby obtaining the trained agent.
It should be noted that after a set of initial state data is trained for the first time, the corresponding initial action reward value is calculated from the initial state data (moment t) and the next state data (moment t+1); after the second training iteration, the corresponding second action reward value is calculated from the state data at moment t+1 and the state data at moment t+2, and so on. The quantities entering the reward calculation are as follows: C_sys is an index measuring the degree of system power flow imbalance, where P_i^or and P_i^ex are the active power measured at the sending end and the receiving end of the i-th line and K is the number of system lines; L is the number of stability sections of the system, P_inter_i is the power flow of the i-th section, and P̄_inter_i is the stability limit of the i-th section; D_of is the system section term, a sum of squares over the sections; D_p is the system power imbalance, where n is the number of generators and m is the number of adjustable loads; P_loss is the system power shortage; E_1 is a positive offset coefficient, used to make the reward positive when the corresponding condition is satisfied; and E_2-E_4 are positive constants that adjust the ratios between the respective components and limit the range of reward values.
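The closed-form expression of the reward is not reproduced in the text above, so the sketch below is purely an assumed illustration: it combines the defined quantities as a positive offset E_1 minus weighted penalty terms, with E_2-E_4 as the weighting constants. Only the symbol meanings come from the passage above; the combination form, the use of absolute values, and the default weights are assumptions.

```python
# Assumed reward sketch; the patent's actual formula is not reproduced in this text.
def compute_reward(line_p_or, line_p_ex, section_flows, section_limits,
                   p_loss, E1=1.0, E2=0.1, E3=0.1, E4=0.1):
    # C_sys: degree of system power flow imbalance, from the active power at the
    # sending and receiving ends of each of the K lines.
    c_sys = sum(abs(p_or - p_ex) for p_or, p_ex in zip(line_p_or, line_p_ex))
    # D_of: sum of squares over the L stability sections (flow vs. stability limit).
    d_of = sum((p / p_lim) ** 2 for p, p_lim in zip(section_flows, section_limits))
    # D_p: system power imbalance, represented here by the power shortage P_loss.
    d_p = abs(p_loss)
    return E1 - E2 * c_sys - E3 * d_of - E4 * d_p
```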
207. Acquire the state data to be adjusted of the power grid equipment, input it into the trained agent to generate an optimal active control action strategy, and update the state data to be adjusted according to the optimal active control action strategy.
It should be noted that in steps 201-206 training is completed automatically in the power grid simulation environment, so the trained agent is more accurate in actual power grid applications. In this embodiment, after the trained agent is obtained, the state data to be adjusted is input; the trained agent can output the optimal active control action strategy for that input, and executing this strategy updates the state data to be adjusted so that the adjusted state data reaches the target state.
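As a final sketch, step 207 might look as follows in code; apply_strategy is a hypothetical stand-in for executing the optimal active control action strategy on the grid equipment.

```python
# Assumed sketch of applying the trained agent (step 207).
def adjust(trained_agent, state_to_adjust):
    strategy = trained_agent.select_action(state_to_adjust)    # optimal action strategy
    adjusted_state = apply_strategy(state_to_adjust, strategy) # execute the strategy
    return adjusted_state   # state data updated toward the target state
```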
As with the first embodiment, the training environment of this embodiment is built from the actual power grid model topology and historical operation data, and the agent interacts with it by executing the active control action strategies it outputs; the resulting state data and action reward values conform to actual power grid characteristics, so iterative training yields a trained agent adapted to actual power grid dispatching requirements, unlike agents trained in laboratory-level environments built on IEEE benchmark topologies.
Further, as a specific implementation of the methods shown in fig. 1 and fig. 2, an embodiment of the invention provides a simulation training device for a power grid active control agent, as shown in fig. 3, where the device includes an acquisition module 31, a judging module 32 and a first training module 33;
the acquisition module 31 is configured to acquire an agent constructed based on deep reinforcement learning, construct a power grid simulation environment based on an actual power grid model topology and historical operation data, acquire at least one set of initial state data of power grid equipment in the power grid simulation environment, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment;
the judging module 32 is configured to judge whether the initial active control action strategy meets a preset check condition, and if so, update the initial state data according to the initial active control action strategy to obtain the next state data for the next moment;
the first training module 33 is configured to calculate an initial action reward value from the initial state data and the next state data, and continue to train the agent with the next state data and the initial action reward value until a preset stopping condition is reached, thereby obtaining the trained agent.
In a specific application scenario, as shown in fig. 4, the device further includes a second training module 34, configured to retrain the agent with the initial state data if the initial active control action strategy does not meet the preset check condition, until the initial active control action strategy meets the preset check condition.
Correspondingly, to judge whether the initial active control action strategy meets a preset check condition, the judging module 32 may specifically be configured to judge whether the action list length in the initial active control action strategy equals the number of units in the current environment, and to judge whether each unit's current active output, after the active adjustment amount is superimposed on it, remains within the unit's output limits.
In a specific application scenario, as shown in fig. 4, the device further includes a setting module 35, configured to set the preset stopping condition, where the preset stopping condition includes the number of sets of initial state data and the number of training iterations per set.
In a specific application scenario, as shown in fig. 4, the device further includes an application module 36, configured to acquire the state data to be adjusted of the power grid equipment, input the state data to be adjusted into the trained agent to generate an optimal active control action strategy, and update the state data to be adjusted according to the optimal active control action strategy.
Correspondingly, to acquire at least one set of initial state data of the power grid equipment, the acquisition module 31 may specifically be further configured to acquire different sets of initial state data of the power grid equipment corresponding to different preset moments according to the actual power grid model topology and the historical operation data.
It should be noted that, for other descriptions of the functional units of the simulation training device for a power grid active control agent provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 and fig. 2, which are not repeated here.
Based on the methods shown in fig. 1 and fig. 2, this embodiment correspondingly further provides a storage medium, which may be volatile or non-volatile, having a computer program stored thereon; when executed by a processor, the program implements the simulation training method for a power grid active control agent shown in fig. 1 and fig. 2.
Based on such an understanding, the technical scheme of the invention may be embodied in the form of a software product, which may be stored in a storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method of each implementation scenario of the invention.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, and to achieve the above object, this embodiment further provides a computer device comprising a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the simulation training method for a power grid active control agent shown in fig. 1 and fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card-reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Wi-Fi interface), and the like.
Those skilled in the art will appreciate that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the execution of the information processing program and other software and/or programs. The network communication module is used for communication among the components within the storage medium, as well as with other hardware and software in the information processing entity device.
From the above description of the embodiments, it will be apparent to those skilled in the art that the invention may be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware.
Those skilled in the art will appreciate that the drawings are merely schematic illustrations of preferred implementation scenarios, and that the modules or flows in the drawings are not necessarily required to practice the invention. Those skilled in the art will also appreciate that the modules of a device in an implementation scenario may be distributed in the device of that scenario as described, or may be located, with corresponding changes, in one or more devices different from that scenario; the modules of an implementation scenario may be combined into one module or further split into a plurality of sub-modules.
The above sequence numbers are merely for description and do not indicate any ranking of the implementation scenarios. The foregoing disclosure is merely illustrative of some embodiments of the invention, and the invention is not limited thereto; modifications may be made by those skilled in the art without departing from the scope of the invention.

Claims (10)

1. A simulation training method for a power grid active control intelligent agent, characterized by comprising the following steps:
acquiring an agent constructed based on deep reinforcement learning, constructing a power grid simulation environment based on an actual power grid model topological structure and historical operation data, acquiring at least one set of initial state data of power grid equipment in the power grid simulation environment, training the agent by using one set of initial state data, and generating an initial active control action strategy at the current moment;
judging whether the initial active control action strategy meets a preset check condition, if so, updating the initial state data according to the initial active control action strategy to obtain next state data at the next moment;
and calculating an initial action reward value according to the initial state data and the next state data, and continuing to train the intelligent agent by using the next state data and the initial action reward value until a preset stopping condition is reached, so as to obtain the trained intelligent agent.
2. The method according to claim 1, wherein the method further comprises:
and if the initial active control action strategy does not meet the preset check condition, training the intelligent agent again by using the initial state data until the initial active control action strategy meets the preset check condition.
3. The method according to claim 1 or 2, wherein said judging whether the initial active control action strategy meets a preset check condition comprises:
judging whether the length of the action list in the initial active control action strategy is equal to the number of units in the current environment; and
judging whether each unit's current active output, after the active adjustment amount in the initial active control action strategy is superimposed on it, remains within the unit's output limits.
4. The method according to claim 1, wherein the method further comprises:
and setting the preset stopping condition, wherein the preset stopping condition comprises the number of sets of the initial state data and the number of training iterations per set.
5. The method of claim 1, wherein after the trained agent is obtained, the method further comprises:
acquiring state data to be adjusted of the power grid equipment;
inputting the state data to be adjusted into the trained intelligent agent to generate an optimal active control action strategy;
and updating the state data to be adjusted according to the optimal active control action strategy.
6. The method of claim 1, wherein the acquiring at least one set of initial state data of the power grid equipment comprises:
and acquiring different sets of initial state data of the power grid equipment corresponding to different preset moments according to the actual power grid model topological structure and the historical operation data.
7. A simulation training apparatus for an active control agent of a power grid, the apparatus comprising:
an acquisition module, configured to acquire an agent constructed based on deep reinforcement learning, construct a power grid simulation environment based on an actual power grid model topological structure and historical operation data, acquire at least one set of initial state data of power grid equipment in the power grid simulation environment, train the agent with one set of initial state data, and generate an initial active control action strategy for the current moment;
a judging module, configured to judge whether the initial active control action strategy meets a preset check condition, and if so, update the initial state data according to the initial active control action strategy to obtain the next state data for the next moment;
and a first training module, configured to calculate an initial action reward value according to the initial state data and the next state data, and continue to train the intelligent agent by using the next state data and the initial action reward value until a preset stopping condition is reached, so as to obtain the trained intelligent agent.
8. The apparatus of claim 7, wherein the apparatus further comprises:
and a second training module, configured to train the intelligent agent again by using the initial state data if the initial active control action strategy does not meet the preset check condition, until the initial active control action strategy meets the preset check condition.
9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the simulation training method for a power grid active control agent according to any one of claims 1 to 6.
10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the simulation training method for a power grid active control agent according to any one of claims 1 to 6 when executing the computer program.
CN202311630737.7A 2023-11-30 2023-11-30 Simulation training method, device and equipment for power grid active control intelligent agent Pending CN117833353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311630737.7A CN117833353A (en) 2023-11-30 2023-11-30 Simulation training method, device and equipment for power grid active control intelligent agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311630737.7A CN117833353A (en) 2023-11-30 2023-11-30 Simulation training method, device and equipment for power grid active control intelligent agent

Publications (1)

Publication Number Publication Date
CN117833353A 2024-04-05

Family

ID=90504875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311630737.7A Pending CN117833353A (en) 2023-11-30 2023-11-30 Simulation training method, device and equipment for power grid active control intelligent agent

Country Status (1)

Country Link
CN (1) CN117833353A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112615379A (en) * 2020-12-10 2021-04-06 浙江大学 Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
US20210356923A1 (en) * 2020-05-15 2021-11-18 Tsinghua University Power grid reactive voltage control method based on two-stage deep reinforcement learning
CN113964884A (en) * 2021-11-17 2022-01-21 国家电网有限公司华东分部 Power grid active frequency regulation and control method based on deep reinforcement learning
WO2022077693A1 (en) * 2020-10-15 2022-04-21 中国科学院深圳先进技术研究院 Load prediction model training method and apparatus, storage medium, and device
CN115293052A (en) * 2022-09-01 2022-11-04 国家电网有限公司华北分部 Power system active power flow online optimization control method, storage medium and device
CN115833147A (en) * 2022-12-12 2023-03-21 广东电网有限责任公司 Reactive voltage optimization method, device, equipment and medium based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination