CN106296044B

CN106296044B - Power system risk scheduling method and system

Info

Publication number: CN106296044B
Application number: CN201610882652.1A
Authority: CN
Inventors: 郭晓斌; 许爱东; 简淦杨; 魏文潇; 占恺峤; 史训涛; 谭勤学; 吴俊阳; 韩传家; 余涛
Original assignee: South China University of Technology SCUT; CSG Electric Power Research Institute
Current assignee: South China University of Technology SCUT; CSG Electric Power Research Institute
Priority date: 2016-10-08
Filing date: 2016-10-08
Publication date: 2023-08-25
Anticipated expiration: 2036-10-08
Also published as: CN106296044A

Abstract

The invention relates to a power system risk scheduling method and system. Architecture data and new task load section data of a power system are acquired; according to these data, a preset initial knowledge matrix is iteratively updated through a bacterial foraging reinforcement learning algorithm to obtain corresponding risk scheduling objective function values and an updated knowledge matrix. Online optimization of the new task is then performed according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and the risk scheduling optimization result is obtained and output. The optimal knowledge matrix of the source task serves as the initial matrix of the new task, which realizes knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem; a fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale complex risk scheduling.

Description

Power system risk scheduling method and system

Technical Field

The invention relates to the technical field of power grids, in particular to a power system risk scheduling method and system.

Background

In recent years, with the development of regional power grid interconnection and high-voltage, long-distance, large-capacity power transmission, the safe and stable operation of power systems has faced more serious challenges. To better balance system safety against economic benefit and to strengthen the ability of scheduling operation to withstand operating risk, power system risk theory has been introduced into generation optimization, and a great deal of research on risk scheduling has been carried out.

Traditional power system risk scheduling methods apply intelligent algorithms such as the genetic algorithm (GA), quantum genetic algorithm (QGA), artificial bee colony (ABC) and particle swarm optimization (PSO) to the individual optimization problems of the power system. However, these algorithms optimize similar tasks in isolation: they cannot effectively retain experience and knowledge from past tasks, lack self-learning capability, and must be reinitialized for every new task. Optimization efficiency is therefore low, and the algorithms are difficult to adapt to the rapid optimization of large-scale complex risk scheduling.

Disclosure of Invention

In view of the above, it is necessary to provide a power system risk scheduling method and system that can adapt to the rapid optimization of large-scale complex risk scheduling.

A power system risk scheduling method, comprising the steps of:

acquiring architecture data and new task load section data of a power system;

iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix, wherein the initial knowledge matrix is the optimal knowledge matrix of a source task;

and performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtaining and outputting a risk scheduling optimization result.

A power system risk scheduling system, comprising:

a task data acquisition module, configured to acquire architecture data and new task load section data of a power system;

a knowledge matrix updating module, configured to iteratively update a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix, wherein the initial knowledge matrix is the optimal knowledge matrix of a source task;

and a risk scheduling optimization module, configured to perform online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and to obtain and output a risk scheduling optimization result.

In the above power system risk scheduling method and system, the architecture data and the new task load section data of the power system are acquired, and according to these data a preset initial knowledge matrix is iteratively updated through a bacterial foraging reinforcement learning algorithm to obtain corresponding risk scheduling objective function values and an updated knowledge matrix. Online optimization of the new task is performed according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and the risk scheduling optimization result is obtained and output. The optimal knowledge matrix of the source task serves as the initial matrix of the new task, which realizes knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem; a fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale complex risk scheduling.

Drawings

FIG. 1 is a flow chart of a power system risk scheduling method according to an embodiment;

FIG. 2 is a schematic diagram of knowledge acquisition of a bacterial foraging reinforcement learning algorithm based on knowledge migration in one embodiment;

FIG. 3 is a schematic diagram of dimension reduction based on knowledge extension in one embodiment;

FIG. 4 is a diagram illustrating knowledge migration in one embodiment;

FIG. 5 is a topology of a test system in one embodiment;

FIG. 6 is a block diagram of a power system risk scheduling system in one embodiment.

Detailed Description

In one embodiment, a power system risk scheduling method, as shown in FIG. 1, includes the following steps:

Step S120: acquiring the architecture data and the new task load section data of the power system.

The architecture data of the power system may include bus nodes, transmission lines, transformers, generators, and the like. The new task load section data includes one or more load sections, each of which is treated as a new task. The architecture data and the new task load section data are acquired for use in the subsequent risk scheduling optimization.

Step S130: iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.

The initial knowledge matrix is the optimal knowledge matrix of the source task. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration. Action selection is executed by combining the random search mode of the bacterial colony with a probability-space action selection strategy, so that the new task is optimized online by the knowledge-migration-based bacterial foraging reinforcement learning algorithm (Transfer Bacteria Foraging Optimization, TBFO).

The specific type of the initial knowledge matrix is not unique; in this embodiment, the initial knowledge matrix is a Q matrix. In the Q-learning algorithm, the element Q(s, a) of the Q matrix represents the expected cumulative reward of selecting action a in state s. The matrix records the knowledge by which the agent maps states to actions. The Q matrix is used as the knowledge matrix that records the optimization information of the colony. By analyzing the similarity among different optimization tasks, the knowledge matrices of the source tasks are used to form the initial knowledge matrix of the new task, and online dynamic optimization of the tasks of different time sections is realized through knowledge migration, which ensures the reliability of the optimization.

In the TBFO algorithm, the bacterial colony obtains from the initial knowledge matrix an action strategy for a specific environmental state, and uses the feedback obtained from repeated trials to update the original knowledge, forming an inherent response to that state so as to maximize the energy accumulated by the colony during foraging.

In one embodiment, step S130 includes steps 131 through 136.

Step 131: controlling the bacteria to perform the chemotaxis, migration and replication operations under the guidance of the initial knowledge matrix, according to the architecture data and the new task load section data.

Under the guidance of the initial knowledge matrix, the bacteria acquire knowledge through the chemotaxis, migration and replication operations. In the TBFO algorithm, all bacteria search the foraging area based on the initial knowledge matrix and feed the resulting rewards back to the knowledge matrix. As shown in FIG. 2, TBFO divides the bacteria into two states, chemotaxis and migration, according to the operation being performed. In a single iteration of the algorithm, each state is assigned to a certain proportion of the bacteria; after the two groups have executed their respective operations, the energy values of all bacteria are calculated and ranked, and the replication operation is carried out so as to maximize the energy accumulated by the colony during foraging. In the next iteration, the bacterial states are reassigned according to the energy values of the previous iteration: bacteria with higher energy values stay in their current region and perform the chemotaxis operation, while bacteria with lower energy values perform the migration operation.

Specifically, based on the energy value ranking, the dominant individuals of the colony are placed in the chemotaxis state and continue to carry out local search. Their chemotaxis behavior can be represented by the following formula:

where $\theta^{i}(j, k, l)$ is the position of bacterium i after the j-th, k-th and l-th operations, and $\Delta$ is a unit vector in the random direction determined after tumbling.
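As an illustration of the chemotaxis move described here, the following minimal sketch assumes the standard BFO update, in which the new position equals the old position plus the step size times the unit direction $\Delta$. The function names, the Gaussian sampling of the direction and the numeric values are assumptions for illustration and are not taken from the patent.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a random direction and normalise it to unit length."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def chemotaxis_step(theta, step, rng):
    """Move one bacterium by `step` along a random unit direction (assumed form)."""
    delta = random_unit_vector(len(theta), rng)
    return [t + step * d for t, d in zip(theta, delta)]


rng = random.Random(0)
theta = [0.5, 1.0, 1.5]          # hypothetical position of one bacterium
new_theta = chemotaxis_step(theta, step=0.1, rng=rng)

# The displacement length equals the step size because delta has unit length.
moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_theta, theta)))
```

Because the direction is normalised, the step size alone controls how far a bacterium moves, which is what allows a decreasing step size to shift the search from coarse to fine.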

$C_k(i)$ may be either a fixed or a variable step size. In this embodiment, $C_k(i)$ is a nonlinearly decreasing inertial step size, updated as follows:

where $C_k(i)$ is the inertial step size at the k-th iteration, $C_0$ is the initial swim step size, $C_e$ is the final swim step size, and cly is the maximum number of iterations.

For bacteria in the migration state, when the migration probability $P_{ed}$ is met, the bacterium makes a roulette-wheel selection according to the action probability matrix; otherwise it migrates by taking the action corresponding to the largest knowledge element (greedy strategy):

where the superscript i denotes the i-th controllable variable, which corresponds to the i-th sub-knowledge matrix, $i \in M$, with M the set of controllable variables; the superscript j denotes the j-th bacterium, $j \in N$, with N the set of bacteria; $P_{ed}$ is the migration probability; r is a random number between 0 and 1; and $a_s$ is the action selected over the global range according to the probability matrix $P^{i}$. When the migration condition is satisfied, the bacterium performs pseudo-random roulette-wheel selection on the action probability matrix $P^{i}$, which is updated as follows:

where $\beta$ is a difference coefficient that amplifies the differences among the elements of the matrix $Q^{i}$, and $e^{i}$ is an intermediate calculation matrix.
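The migration rule can be sketched as follows. This is a minimal illustration, not the patented formula: it assumes that roulette-wheel selection is triggered when the random number r is smaller than $P_{ed}$, and it takes a row of the probability matrix as given rather than deriving it from $Q^{i}$ with $\beta$. All names and values are hypothetical.

```python
import random


def roulette_select(probabilities, rng):
    """Pick an index with probability proportional to its weight."""
    threshold = rng.random() * sum(probabilities)
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point rounding


def select_action(q_row, p_row, p_ed, rng):
    """Roulette selection when r < p_ed (assumed direction), greedy otherwise."""
    r = rng.random()
    if r < p_ed:
        return roulette_select(p_row, rng)
    return max(range(len(q_row)), key=lambda a: q_row[a])


rng = random.Random(1)
q_row = [0.2, 0.9, 0.4]   # knowledge elements for one state (hypothetical)
p_row = [0.1, 0.7, 0.2]   # matching action probabilities (hypothetical)

greedy_action = select_action(q_row, p_row, p_ed=0.0, rng=rng)
explored = [select_action(q_row, p_row, p_ed=1.0, rng=rng) for _ in range(200)]
```

With `p_ed=0.0` the rule is purely greedy, while `p_ed=1.0` always samples from the probability row, so the parameter sets the balance between exploiting stored knowledge and exploring the action space.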

In one embodiment, a crossover process is introduced into the replication operation as follows:

$$\theta^{\,i+S/2}(j,k,l) = r\,\theta^{\,i}(j,k,l) + (1-r)\,\theta^{\,i+S/2}(j,k,l)$$

where S is the number of bacteria, $i \in [1, S/2]$, and r is a random number in [0, 1].
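A minimal sketch of this crossover is given below. It assumes an even population size and that a larger energy value marks a better bacterium, so that after ranking, each individual in the weaker half is blended with its counterpart in the stronger half. The helper name and the sample data are hypothetical.

```python
import random


def replicate_with_crossover(positions, energies, rng):
    """Rank by energy, keep the better half, rebuild the worse half by crossover."""
    order = sorted(range(len(positions)), key=lambda i: energies[i], reverse=True)
    ranked = [positions[i] for i in order]
    half = len(ranked) // 2
    for i in range(half):
        r = rng.random()
        strong, weak = ranked[i], ranked[i + half]
        # theta^{i+S/2} <- r * theta^{i} + (1 - r) * theta^{i+S/2}
        ranked[i + half] = [r * s + (1.0 - r) * w for s, w in zip(strong, weak)]
    return ranked


rng = random.Random(2)
positions = [[0.0], [10.0], [4.0], [6.0]]
energies = [1.0, 4.0, 2.0, 3.0]
result = replicate_with_crossover(positions, energies, rng)
```

Each rebuilt position lies on the segment between the weak individual and its stronger partner, so the weaker half is pulled toward promising regions without discarding its own location entirely.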

Step 132: calculating the power flow of the power system in the base state and under preset faults according to the results of the chemotaxis, migration and replication operations of the bacteria.

After the chemotaxis, migration and replication operations are finished, the power flow of the power system in the base state and under the preset faults is calculated from the corresponding results. The base state means that the system has no fault; the specific type of preset fault is not unique.

Step 133: calculating the risk scheduling objective function value from the power flow of the power system in the base state and under the preset faults.

In the TBFO algorithm, the immediate reward reflects the direction of optimization, and the colony obtains an optimal strategy by iteratively optimizing the knowledge matrix so as to maximize the cumulative reward. In the risk scheduling mathematical model, the objective function is the inverse of the algorithm's reward function, and optimization is expected to minimize the objective function. In this embodiment, the reward function is designed as follows:

where $F_C$ is the fuel cost described by a nonlinear function; $I_R$ is the system security risk index described by a nonlinear utility function; $C_V$ is the degree of total constraint violation of the system in the base state; $c_1$ and $c_2$ match the orders of magnitude of the fuel cost and the risk index, respectively; and $\omega_1$ and $\omega_2$ reflect the emphasis placed on the corresponding objectives.

Step 134: reassigning the bacterial states according to the risk scheduling objective function value, once that value has been calculated.

Step 135: iteratively updating the initial knowledge matrix according to the reassigned bacterial states to obtain an updated knowledge matrix. In one embodiment, step 135 includes step 11 and step 12.

Step 11: reducing the dimension of the initial knowledge matrix to obtain a plurality of sub-knowledge matrices.

As shown in FIG. 3, to effectively address the "curse of dimensionality", dimension reduction is performed by knowledge extension: the initial knowledge matrix Q is divided into a plurality of sub-knowledge matrices $Q^{i}$ in one-to-one correspondence with the variables. The variables are linked through the knowledge matrices, and the elements of adjacent matrices are related knowledge; that is, the action space $A_i$ of $x_i$ is the state space $S_{i+1}$ of $x_{i+1}$. Only after the action of variable $x_i$ has been determined can the action of $x_{i+1}$ be selected on the basis of that result. A chain extension is thus formed among related knowledge, which realizes the decomposition and dimension reduction of the knowledge matrix.
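The chained structure can be illustrated with a short sketch in which each sub-knowledge matrix is indexed as `q[state][action]` and the action chosen for one variable becomes the state used to read the next matrix. Greedy selection, the function name and the small matrices are assumptions made for illustration only.

```python
def greedy_chain(sub_matrices, initial_state):
    """Select one action per variable; each action is the state of the next variable."""
    state = initial_state
    actions = []
    for q in sub_matrices:
        row = q[state]
        action = max(range(len(row)), key=lambda a: row[a])
        actions.append(action)
        state = action  # A_i doubles as S_{i+1}
    return actions


# Hypothetical sub-knowledge matrices for three chained variables.
q1 = [[0.1, 0.8], [0.5, 0.2]]                # 2 states x 2 actions
q2 = [[0.3, 0.1, 0.6], [0.9, 0.2, 0.4]]      # 2 states x 3 actions
q3 = [[0.7, 0.3], [0.2, 0.5], [0.1, 0.4]]    # 3 states x 2 actions

chain = greedy_chain([q1, q2, q3], initial_state=0)
```

In this example the three small matrices hold 4 + 6 + 6 = 16 elements, whereas a single table over all joint action combinations would grow multiplicatively with the number of variables, which is the motivation for the decomposition.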

Step 12: updating the plurality of sub-knowledge matrices according to the reassigned bacterial states, and obtaining the updated knowledge matrix from the updated sub-knowledge matrices.

The colony acts as a multi-agent system that cooperatively updates the knowledge matrix: all bacteria share one knowledge matrix, and several knowledge elements can be updated simultaneously in a single iteration, which greatly improves optimization efficiency. The reward of each agent is evaluated after every trial-and-error exploration. With colony cooperation introduced, the sub-knowledge matrix $Q^{i}$ is updated as follows:

where $R(s^{ij}_{k}, s^{ij}_{k+1}, a^{ij}_{k})$ is the reward obtained at the k-th iteration when action $a_k$ is selected in state $s_k$ and the state transitions to $s_{k+1}$; $\alpha$ is the learning factor and $\gamma$ is the discount factor.
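A minimal sketch of a cooperative update is shown below. It assumes the standard Q-learning temporal-difference rule applied once per bacterium to a shared matrix, and it simplifies the chained sub-matrices to a single square matrix so that the next state can be looked up in the same table. The exact update used in the patent may differ; names and numbers are hypothetical.

```python
def cooperative_q_update(q, experiences, alpha, gamma):
    """Apply one Q-learning step per bacterium to a shared knowledge matrix."""
    for state, action, reward, next_state in experiences:
        best_next = max(q[next_state])
        td_error = reward + gamma * best_next - q[state][action]
        q[state][action] += alpha * td_error
    return q


# Shared matrix q[state][action] and one experience tuple per bacterium.
q = [[0.0, 0.0], [0.0, 1.0]]
experiences = [
    (0, 1, 2.0, 1),   # bacterium 1: state 0, action 1, reward 2.0, next state 1
    (1, 0, 0.5, 0),   # bacterium 2: state 1, action 0, reward 0.5, next state 0
]
cooperative_q_update(q, experiences, alpha=0.5, gamma=0.9)
```

Two different elements are updated in one pass here, which mirrors the point that a shared matrix lets the colony refresh several knowledge elements per iteration instead of one.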

In another embodiment, step 135 includes steps 21 to 23.

Step 21: calculating the active power deviation between each source task in the initial knowledge matrix and the new task according to the reassigned bacterial states.

The active power deviation is defined as the similarity between a source task and the new task, and the active power demand is divided, from small to large, into a plurality of load sections:

$$[P_{Ds1}, P_{Ds2}),\ [P_{Ds2}, P_{Ds3}),\ \ldots,\ [P_{Ds(i-1)}, P_{Dsi}),\ \ldots,\ [P_{Ds(n-1)}, P_{Dsn})$$

Step 22: sorting the source tasks from large to small according to the active power deviation, and selecting a preset number of source tasks. The specific value of the preset number is not unique; in this embodiment it is two.

Step 23: updating the initial knowledge matrix according to the selected source tasks to obtain an updated knowledge matrix.

Taking a matrix update with two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first, and the initial knowledge matrix is then updated according to these coefficients to obtain the knowledge matrix of the new task.

Specifically, assume that the active power demand of new task x is $P_{Dx}$, and that $P_{Di}$ and $P_{Dk}$ are the loads of the two source-task sections closest to task x, satisfying $P_{Di} < P_{Dx} < P_{Dk}$. The contribution coefficients $\eta_1$ and $\eta_2$ of the two source tasks $P_{Di}$ and $P_{Dk}$ to transfer learning can then be calculated by the following formula:

Using linear migration, the knowledge matrix of the new task x can be obtained:

Because only knowledge highly similar to the new task is used, namely the source-task section information closest to the load demand of the new task, negative interference of invalid knowledge with the learning quality and speed of the new task is avoided during migration, and calculation accuracy is improved.
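One plausible reading of linear migration is ordinary linear interpolation between the two bracketing source tasks. The sketch below assumes exactly that: the weights are proportional to how close the new load is to each source load and sum to one. This is an assumption, not the patent's formula, and the function names and load values are hypothetical.

```python
def contribution_coefficients(p_di, p_dx, p_dk):
    """Linear-interpolation weights for the two bracketing source tasks (assumed)."""
    span = p_dk - p_di
    eta1 = (p_dk - p_dx) / span   # weight of the lower-load source task
    eta2 = (p_dx - p_di) / span   # weight of the higher-load source task
    return eta1, eta2


def migrate_knowledge(q_i, q_k, eta1, eta2):
    """Blend two source knowledge matrices element by element."""
    return [
        [eta1 * a + eta2 * b for a, b in zip(row_i, row_k)]
        for row_i, row_k in zip(q_i, q_k)
    ]


eta1, eta2 = contribution_coefficients(p_di=2000.0, p_dx=2250.0, p_dk=3000.0)
q_x = migrate_knowledge([[1.0, 2.0]], [[5.0, 6.0]], eta1, eta2)
```

Under this assumption a new task whose load equals one of the source loads would simply inherit that source task's matrix, and intermediate loads receive a proportionally mixed starting point.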

It will be understood that, in one embodiment, the initial knowledge matrix may first be reduced in dimension to obtain a plurality of sub-knowledge matrices, which are then updated using knowledge highly similar to the new task to obtain the updated knowledge matrix.

Step 136: judging whether the iterative update meets a preset condition.

The specific type of the preset condition is not unique. In this embodiment, the preset condition is $k > k_{\max}$ or a convergence condition on the knowledge matrix, where $k_{\max}$ is the preset maximum number of iterations and the convergence condition uses the 2-norm of the knowledge matrix to reflect the degree of deviation of the knowledge matrix between two successive iterations.

If the iterative update does not meet the preset condition, the updated knowledge matrix is taken as the initial knowledge matrix and the method returns to step 131 to update the knowledge matrix again; if it does, the iterative update ends, and the knowledge matrix finally obtained is used as the optimization matrix required for optimizing the new task.
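The stopping test can be sketched as below. The sketch assumes that convergence means the deviation between successive matrices falling below a threshold `eps`, which is a hypothetical parameter, and it uses the Frobenius norm of the difference as a simple stand-in for the matrix 2-norm so that only the standard library is needed.

```python
import math


def should_stop(q_prev, q_new, k, k_max, eps):
    """Stop when the iteration budget is exceeded or the matrix has settled."""
    deviation = math.sqrt(
        sum(
            (a - b) ** 2
            for row_prev, row_new in zip(q_prev, q_new)
            for a, b in zip(row_prev, row_new)
        )
    )
    return k > k_max or deviation < eps


still_changing = should_stop([[0.0, 0.0]], [[3.0, 4.0]], k=1, k_max=10, eps=1e-3)
converged = should_stop([[3.0, 4.0]], [[3.0, 4.0]], k=1, k_max=10, eps=1e-3)
out_of_budget = should_stop([[0.0, 0.0]], [[3.0, 4.0]], k=11, k_max=10, eps=1e-3)
```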

Step S140: performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtaining and outputting a risk scheduling optimization result.

After the iterative update of the initial knowledge matrix is finished, the updated knowledge matrix corresponding to the minimum risk scheduling objective function value is used as the optimization matrix to optimize the new task online, and the risk scheduling optimization result is obtained and output. The specific manner of outputting the result is not unique: it may be output to a memory for storage or to a display for viewing.

Furthermore, in one embodiment, the power system risk scheduling method further comprises step 110 before step S130.

Step 110: receiving source tasks for training to obtain an optimal knowledge matrix, which serves as the initial knowledge matrix.

Step 110 may be performed before or after step S120. In a pre-learning stage, the TBFO algorithm executes a series of source tasks to obtain the optimal knowledge matrix and to mine initial knowledge from it in preparation for related new tasks in the future. As shown in FIG. 4, based on the similarity between the source task and the new task, the relevant initial knowledge from the source task is used in online optimization: the optimal knowledge matrix $Q_S$ of the source task is migrated to form the initial knowledge matrix $Q_N$ of the new task.

To facilitate a better understanding of the power system risk scheduling method described above, a detailed explanation is given below in connection with a specific embodiment.

A certain reliability test system is taken as the simulation object for risk scheduling. The base capacity of the system is 100 MVA; the system has 24 bus nodes, 34 transmission lines/transformers and 32 generators, and its topology is shown in FIG. 5. Of the 10 generator nodes, node 21, which has the largest single-unit capacity, is defined as the balance (slack) node of the whole system, and the remaining 9 nodes are PV (voltage-controlled) nodes.

To test the adaptability of the algorithm to optimization at different load levels, this embodiment carries out a risk scheduling optimization simulation over 96 sections. A typical daily load curve is selected and divided in time order into one section every fifteen minutes, giving sections 1 to 96.

Based on the platform, the risk scheduling optimization is carried out according to the following steps.

(1) The active power output $P_G$ of the generators at the PV nodes is selected as the control variable. The action variable space A ($A_{PG1}, A_{PG2}, \ldots, A_{PGi}$) corresponds one-to-one with the space of the control variables, where i is the total number of units at the PV nodes. The action space of each variable is the state space of the next variable. The sub-knowledge matrices corresponding to the state-action spaces of the variables are $Q^{PG1}, Q^{PG2}, \ldots, Q^{PGi}$. Under the guidance of the knowledge matrix, the bacteria acquire knowledge through the chemotaxis, migration and replication operations.

(2) Calculating the risk scheduling objective function depends on nonlinear power flow calculation. If the standard BFO algorithm is used, let $N_{ed}$, $N_{re}$ and $N_c$ denote the numbers of migration, replication and chemotaxis operations, respectively, let $N_s$ be the maximum number of swim steps, and let the contingency set contain $N_p$ faults; the number of power flow calculations can then reach $N_{ed} N_{re} N_c N_s (N_p + 1)$, which makes the solution process extremely slow. By improving the search mode of the algorithm, the nested loops of the original algorithm are removed and efficiency is improved. The bacterial colony performs action selection by combining the random search mode of the BFO algorithm with a probability-space action selection strategy.

(3) Units with the same fuel cost coefficients at the same node are grouped into one control variable, so that the active power outputs of 31 units are grouped into 13 variables. The output of each unit serves as the state space of the next unit, and the state space of the first unit is the active power of the current section, which reduces the dimension of the knowledge matrix.

(4) In the TBFO algorithm, the immediate reward reflects the direction of optimization, and the colony obtains an optimal strategy by iteratively optimizing the knowledge matrix so as to maximize the cumulative reward.

(5) The active power deviation is defined as the similarity between the source task and the new task, and the active power demand is divided into a plurality of load sections from small to large.

To avoid negative interference of invalid knowledge with the learning quality and speed of the new task during migration, knowledge highly similar to the new task should be used as far as possible in the learning process. In this embodiment, only the information of the two source-task sections closest to the load demand of the new task is used for migration. Suppose the active power demand of new task x is $P_{Dx}$, and $P_{Di}$ and $P_{Dk}$ are the loads of the two source-task sections closest to task x, satisfying $P_{Di} < P_{Dx} < P_{Dk}$. The contribution coefficients of the two source tasks to transfer learning are obtained, and the knowledge matrix of the new task x is obtained by linear migration.

After the iterative update of the initial knowledge matrix is completed, the updated knowledge matrix corresponding to the minimum risk scheduling objective function value is used as the optimization matrix to optimize the new task online, and the risk scheduling optimization result is obtained and output.
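To give a feel for the nested-loop cost mentioned in step (2), the following sketch evaluates the expression $N_{ed} N_{re} N_c N_s (N_p + 1)$ for one set of parameter values. The values are hypothetical and chosen only to show the order of magnitude; they are not taken from the embodiment.

```python
def nested_power_flow_count(n_ed, n_re, n_c, n_s, n_p):
    """Upper bound on power flow runs for the nested standard-BFO loops."""
    # One base-state run plus one run per fault, inside every loop level.
    return n_ed * n_re * n_c * n_s * (n_p + 1)


# Hypothetical settings: 2 migrations, 4 replications, 50 chemotaxis steps,
# 4 swim steps and 10 preset faults.
count = nested_power_flow_count(n_ed=2, n_re=4, n_c=50, n_s=4, n_p=10)
```

Even these modest settings give 17600 power flow runs, which illustrates why removing the nested loops matters for online use.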

In the power system risk scheduling method described above, the optimal knowledge matrix of the source task is used as the initial matrix of the new task to realize knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem; a fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale complex risk scheduling.

In one embodiment, a power system risk scheduling system, as shown in FIG. 6, includes a task data acquisition module 120, a knowledge matrix updating module 130, and a risk scheduling optimization module 140.

The task data acquisition module 120 is configured to acquire the architecture data and the new task load section data of the power system. The architecture data of the power system may include bus nodes, transmission lines, transformers, generators, and the like, and the new task load section data includes one or more load sections. The architecture data and the new task load section data are acquired for use in the subsequent risk scheduling optimization.

The knowledge matrix updating module 130 is configured to iteratively update a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, so as to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.

The initial knowledge matrix is the optimal knowledge matrix of the source task. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration. Action selection is executed by combining the random search mode of the bacterial colony in the bacterial foraging optimization algorithm with a probability-space action selection strategy, so that the new task is optimized online by the TBFO algorithm.

The specific type of the initial knowledge matrix is not unique; in this embodiment, the initial knowledge matrix is a Q matrix. The Q matrix is used as the knowledge matrix that records the optimization information of the colony. By analyzing the similarity among different optimization tasks, the knowledge matrices of the source tasks are used to form the initial knowledge matrix of the new task, and online dynamic optimization of the tasks of different time sections is realized through knowledge migration, which ensures the reliability of the optimization.

In one embodiment, the knowledge matrix update module 130 includes a first processing unit, a second processing unit, a third processing unit, a fourth processing unit, a fifth processing unit, and a sixth processing unit.

The first processing unit is used for controlling the bacteria to perform the chemotaxis, migration and replication operations under the guidance of the initial knowledge matrix, according to the architecture data and the new task load section data.

Under the guidance of the initial knowledge matrix, the bacteria acquire knowledge through the chemotaxis, migration and replication operations. Specifically, based on the energy value ranking, the dominant individuals of the colony are placed in the chemotaxis state and continue to carry out local search. Their chemotaxis behavior can be represented by the following formula:

When the migration condition is satisfied, the bacterium performs pseudo-random roulette-wheel selection of the action $a_s$ on the action probability matrix $P^{i}$, which is updated as follows:

In the replication operation, a crossover process is introduced:

$$\theta^{\,i+S/2}(j,k,l) = r\,\theta^{\,i}(j,k,l) + (1-r)\,\theta^{\,i+S/2}(j,k,l)$$

The second processing unit is used for calculating the power flow of the power system in the base state and under preset faults according to the results of the chemotaxis, migration and replication operations of the bacteria.

The third processing unit is used for calculating the risk scheduling objective function value from the power flow of the power system in the base state and under the preset faults.

The fourth processing unit is used for reassigning the bacterial states according to the risk scheduling objective function value, once that value has been calculated.

The fifth processing unit is used for iteratively updating the initial knowledge matrix according to the reassigned bacterial states to obtain an updated knowledge matrix.

In one embodiment, the fifth processing unit includes a dimension reduction unit and a matrix update unit.

The dimension reduction unit is used for reducing the dimension of the initial knowledge matrix to obtain a plurality of sub-knowledge matrices. Dimension reduction is performed by knowledge extension: the initial knowledge matrix Q is divided into a plurality of sub-knowledge matrices $Q^{i}$ in one-to-one correspondence with the variables.

The matrix updating unit is used for updating the plurality of sub-knowledge matrices according to the reassigned bacterial states, and for obtaining the updated knowledge matrix from the updated sub-knowledge matrices.

The colony acts as a multi-agent system that cooperatively updates the knowledge matrix: all bacteria share one knowledge matrix, and several knowledge elements can be updated simultaneously in a single iteration, which greatly improves optimization efficiency. The reward of each agent is evaluated after every trial-and-error exploration. With colony cooperation introduced, the sub-knowledge matrix $Q^{i}$ is updated as follows:

In another embodiment, the fifth processing unit includes a calculating unit, an extraction unit, and an updating unit.

The calculating unit is used for calculating, according to the reassigned bacterial state, the active power deviation between each source task in the initial knowledge matrix and the new task. The active power deviation is defined as the measure of similarity between a source task and the new task, and the active power demand is divided, from small to large, into multiple load sections:

The extraction unit is used for sorting the source tasks from large to small according to the active power deviation and acquiring a preset number of source tasks. The preset number is not limited to a single value; in this embodiment, it is two.
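
The selection step can be sketched as below. This is only an illustration under assumptions: the deviation is taken to be the absolute difference between the active power demand of a source load section and that of the new task, and the most similar source tasks are taken to be those with the smallest deviation. The section names and demand values are invented for the example.

```python
def select_source_tasks(source_demands, new_demand, preset_number=2):
    """Return the preset number of source tasks closest to the new task.

    source_demands -- dict mapping a source task name to its active power demand
    new_demand     -- active power demand of the new task
    Assumption: similarity is ranked by absolute active power deviation,
    smallest deviation first.
    """
    deviations = {name: abs(p - new_demand) for name, p in source_demands.items()}
    ranked = sorted(deviations, key=deviations.get)
    return [(name, deviations[name]) for name in ranked[:preset_number]]


# Hypothetical load sections (demand values are illustrative, in MW).
source_demands = {
    "section_1": 180.0,
    "section_2": 220.0,
    "section_3": 260.0,
    "section_4": 300.0,
}
selected = select_source_tasks(source_demands, new_demand=235.0)
print(selected)  # [('section_2', 15.0), ('section_3', 25.0)]
```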

The updating unit is used for updating the initial knowledge matrix according to the acquired source task to obtain an updated knowledge matrix.

Taking matrix updating with two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first, and the initial knowledge matrix is then updated according to these coefficients to obtain the knowledge matrix of the new task. The contribution coefficients $\eta_1$ and $\eta_2$ of the two source tasks $P_{Di}$ and $P_{Dk}$ to transfer learning can be calculated by the following formula:
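
A minimal sketch of this transfer step follows. The weighting rule is an assumption, not the patent's formula: each coefficient is taken to be proportional to the other task's deviation, so that the closer source task contributes more and the two coefficients sum to one. The function names and the small example matrices are hypothetical.

```python
def contribution_coefficients(dev_i, dev_k):
    """Return (eta_1, eta_2) for two source tasks.

    Assumption: inverse-deviation weighting normalized to sum to 1, so the
    source task with the smaller deviation receives the larger coefficient.
    """
    total = dev_i + dev_k
    if total == 0:
        return 0.5, 0.5  # both source tasks match the new task exactly
    return dev_k / total, dev_i / total


def blend(q_i, q_k, eta_1, eta_2):
    """Element-wise weighted combination of two knowledge matrices."""
    return [
        [eta_1 * a + eta_2 * b for a, b in zip(row_i, row_k)]
        for row_i, row_k in zip(q_i, q_k)
    ]


eta_1, eta_2 = contribution_coefficients(15.0, 25.0)
q_new = blend([[1.0, 2.0]], [[3.0, 0.0]], eta_1, eta_2)
print(eta_1, eta_2)  # 0.625 0.375
print(q_new)         # [[1.75, 1.25]]
```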

It will be appreciated that, in one embodiment, the fifth processing unit may comprise a dimension reduction unit and a matrix updating unit, with the matrix updating unit in turn comprising a calculating unit, an extraction unit and an updating unit. The initial knowledge matrix is first reduced in dimension to obtain a plurality of sub-knowledge matrices, and the sub-knowledge matrices are then updated using knowledge from the source tasks that are highly similar to the new task to obtain the updated knowledge matrix.

The sixth processing unit is used for judging whether the iterative updating meets a preset condition and, when it does not, taking the updated knowledge matrix as the initial knowledge matrix and controlling the first processing unit to again perform the trend operation, migration operation and replication operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.

The preset condition is not limited to a single type. In this embodiment, the preset condition is $k > k_{\max}$ or [formula missing]. It is judged whether the iterative updating meets the preset condition: if not, the iterative updating is performed again with the updated knowledge matrix as the initial knowledge matrix; if so, the iterative updating ends and the finally obtained knowledge matrix is taken as the optimization matrix required for new task optimization.
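
The overall iteration can be sketched as a simple loop. This is an illustrative skeleton only: `step` is a hypothetical callback standing in for the work of the first to fifth processing units (bacterial operations, power flow calculation, objective evaluation, state reassignment, knowledge matrix update), and `extra_stop` is an optional hook for any additional stopping criterion, which is deliberately left unspecified here. The loop also tracks the knowledge matrix associated with the minimum objective value.

```python
import copy


def iterate_knowledge(q_initial, step, k_max, extra_stop=None):
    """Iterate until k > k_max (or extra_stop fires) and keep the best result.

    step(q)             -- hypothetical callback returning (objective_value, q_updated)
    extra_stop(q, q_up) -- optional predicate for an additional stopping rule
    Returns (best_value, best_q, k).
    """
    q = copy.deepcopy(q_initial)
    best_value, best_q = float("inf"), copy.deepcopy(q)
    k = 1
    while True:
        value, q_updated = step(q)
        if value < best_value:
            best_value, best_q = value, copy.deepcopy(q_updated)
        k += 1
        if k > k_max or (extra_stop is not None and extra_stop(q, q_updated)):
            return best_value, best_q, k
        q = q_updated  # the updated matrix becomes the next initial matrix


def toy_step(q):
    """Toy stand-in: the objective is (x - 3)^2 and the update moves x halfway to 3."""
    x = q[0]
    return (x - 3.0) ** 2, [x + 0.5 * (3.0 - x)]


best_value, best_q, k = iterate_knowledge([0.0], toy_step, k_max=5)
print(best_value, best_q, k)  # 0.03515625 [2.90625] 6
```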

The risk scheduling optimization module 140 is used for performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and for obtaining and outputting the risk scheduling optimization result.

Furthermore, in one embodiment, the power system risk scheduling system further comprises a matrix training module.

The matrix training module is used for receiving a source task for training and obtaining an optimal knowledge matrix as the initial knowledge matrix. It does so before the knowledge matrix updating module 130 iteratively updates the preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix. In the pre-learning stage, the TBFO algorithm performs a series of source tasks to obtain optimal knowledge matrices and mines initial knowledge from them, in preparation for related new tasks in the future.

In the power system risk scheduling system described above, the optimal knowledge matrix of the source tasks is used as the initial matrix of the new task to realize knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. A fast solving speed is maintained as the problem scale grows, so the system is suitable for rapid optimization of large-scale, complex risk scheduling.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this description.

The above examples represent only a few embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A power system risk scheduling method, characterized by comprising the following steps:

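acquiring architecture data and new task load section data of a power system;

iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix;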

performing online optimization of new tasks according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, obtaining a risk scheduling optimization result and outputting the risk scheduling optimization result;

wherein the step of iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix comprises the following steps:

controlling bacteria to perform a trend operation, a migration operation and a replication operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data;

calculating power flow values of the power system in the ground state and under preset faults according to the trend operation, the migration operation and the replication operation of the bacteria;

calculating a risk scheduling objective function value according to the power flow values of the power system in the ground state and under the preset faults;

reassigning the bacterial state according to the risk scheduling objective function value;

iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix;

judging whether the iterative update meets a preset condition or not;

if not, taking the updated knowledge matrix as the initial knowledge matrix, and returning to the step of controlling bacteria to perform the trend operation, the migration operation and the replication operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.

2. The power system risk scheduling method of claim 1, wherein the initial knowledge matrix is a Q matrix.

3. The power system risk scheduling method according to claim 1, wherein the step of iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix comprises the steps of:

performing dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices;

and updating the plurality of sub-knowledge matrices according to the reassigned bacterial state to obtain the updated knowledge matrix.

4. The power system risk scheduling method according to claim 1, wherein the step of iteratively updating a preset initial knowledge matrix by a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix is preceded by the step of:

receiving a source task for training, and obtaining an optimal knowledge matrix as the initial knowledge matrix.

5. A power system risk scheduling system, comprising:

the risk scheduling optimization module is used for performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and for obtaining and outputting a risk scheduling optimization result;

the knowledge matrix updating module comprises a first processing unit, a second processing unit, a third processing unit, a fourth processing unit, a fifth processing unit and a sixth processing unit;

the first processing unit is used for controlling bacteria to perform a trend operation, a migration operation and a replication operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data;

the second processing unit is used for calculating power flow values of the power system in the ground state and under preset faults according to the trend operation, the migration operation and the replication operation of the bacteria;

the third processing unit is used for calculating a risk scheduling objective function value according to the power flow values of the power system in the ground state and under the preset faults;

the fourth processing unit is used for reassigning the bacterial state according to the risk scheduling objective function value;

the fifth processing unit is used for iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix;

the sixth processing unit is used for judging whether the iterative updating meets a preset condition and, when the iterative updating does not meet the preset condition, taking the updated knowledge matrix as the initial knowledge matrix and controlling the first processing unit to again perform the trend operation, the migration operation and the replication operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.

6. The power system risk scheduling system of claim 5, wherein the initial knowledge matrix is a Q matrix.

7. The power system risk scheduling system of claim 5, wherein the fifth processing unit comprises:

a dimension reduction unit, used for performing dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices;

and a matrix updating unit, used for updating the plurality of sub-knowledge matrices according to the reassigned bacterial state to obtain the updated knowledge matrix.

8. The power system risk scheduling system according to claim 5, further comprising a matrix training module, wherein the matrix training module is used for receiving a source task for training and obtaining an optimal knowledge matrix as the initial knowledge matrix, before the knowledge matrix updating module iteratively updates the preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.