CN106296044B - Power system risk scheduling method and system - Google Patents

Power system risk scheduling method and system

Info

Publication number
CN106296044B
Authority
CN
China
Prior art keywords
matrix
knowledge matrix
knowledge
risk scheduling
power system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610882652.1A
Other languages
Chinese (zh)
Other versions
CN106296044A (en
Inventor
郭晓斌
许爱东
简淦杨
魏文潇
占恺峤
史训涛
谭勤学
吴俊阳
韩传家
余涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
CSG Electric Power Research Institute
Original Assignee
South China University of Technology SCUT
CSG Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, CSG Electric Power Research Institute filed Critical South China University of Technology SCUT
Priority to CN201610882652.1A priority Critical patent/CN106296044B/en
Publication of CN106296044A publication Critical patent/CN106296044A/en
Application granted granted Critical
Publication of CN106296044B publication Critical patent/CN106296044B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0635: Risk analysis of enterprise or organisation activities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a power system risk scheduling method and system. Architecture data of the power system and load section data of a new task are acquired. According to these data, a preset initial knowledge matrix is iteratively updated by a bacterial foraging reinforcement learning algorithm to obtain the corresponding risk scheduling objective function values and updated knowledge matrices. The new task is then optimized online according to the updated knowledge matrix that corresponds to the minimum risk scheduling objective function value, and the risk scheduling optimization result is obtained and output. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. A fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale, complex risk scheduling.

Description

Power system risk scheduling method and system
Technical Field
The invention relates to the technical field of power grids, in particular to a power system risk scheduling method and system.
Background
In recent years, with the development of regional power grid interconnection and high-voltage, long-distance, large-capacity power transmission, the safe and stable operation of power systems has faced increasingly serious challenges. To better balance system security against economic benefit and to strengthen the resistance of dispatch operations to operating risk, power system risk theory has been introduced into generation optimization, and risk scheduling has been studied extensively.
The traditional power system risk scheduling method applies intelligent algorithms such as the genetic algorithm (GA), the quantum genetic algorithm (QGA), the artificial bee colony algorithm (ABC) and particle swarm optimization (PSO) to the individual optimization problems of the power system. However, these algorithms optimize similar tasks in isolation: they cannot effectively retain the experience and knowledge of past tasks, they lack self-learning capability, and they must be reinitialized for every new task. Optimization efficiency is therefore low, and the algorithms are difficult to adapt to the rapid optimization of large-scale, complex risk scheduling.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a power system risk scheduling method and system that can adapt to the rapid optimization of large-scale complex risk scheduling.
A power system risk scheduling method, comprising the steps of:
acquiring architecture data and new task load section data of a power system;
iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix; the initial knowledge matrix is the optimal knowledge matrix in a source task;
and performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtaining and outputting a risk scheduling optimization result.
A power system risk scheduling system, comprising:
a task data acquisition module, configured to acquire architecture data and new task load section data of a power system;
a knowledge matrix updating module, configured to iteratively update a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix; the initial knowledge matrix is the optimal knowledge matrix in a source task;
and a risk scheduling optimization module, configured to perform online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and to obtain and output a risk scheduling optimization result.
According to the above power system risk scheduling method and system, the architecture data and the new task load section data of the power system are acquired, and a preset initial knowledge matrix is iteratively updated by a bacterial foraging reinforcement learning algorithm according to these data to obtain the corresponding risk scheduling objective function values and updated knowledge matrices. The new task is optimized online according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and the risk scheduling optimization result is obtained and output. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. A fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale, complex risk scheduling.
Drawings
FIG. 1 is a flow chart of a power system risk scheduling method according to an embodiment;
FIG. 2 is a schematic diagram of knowledge acquisition of a bacterial foraging reinforcement learning algorithm based on knowledge migration in one embodiment;
FIG. 3 is a schematic diagram of dimension reduction based on knowledge extension in one embodiment;
FIG. 4 is a diagram illustrating knowledge migration in one embodiment;
FIG. 5 is a topology of a test system in one embodiment;
FIG. 6 is a block diagram of a power system risk scheduling system according to an embodiment.
Detailed Description
In one embodiment, a power system risk scheduling method, as shown in fig. 1, includes the following steps:
Step S120: acquire the architecture data and the new task load section data of the power system.
The architecture data of the power system may include bus nodes, transmission lines, transformers, generators, and the like. The new task load section data includes one or more load sections, each of which is treated as a new task. These data are acquired for use in the subsequent risk scheduling optimization.
Step S130: according to the architecture data and the new task load section data, iteratively update a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.
The initial knowledge matrix is the optimal knowledge matrix in the source task. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration. Action selection combines the random search mode of the bacterial colony with a probability-space action selection strategy, so that the new task is optimized online by the bacterial foraging reinforcement learning algorithm based on knowledge migration (Transfer Bacteria Foraging Optimization, TBFO).
The specific type of the initial knowledge matrix is not unique; in this embodiment, the initial knowledge matrix is a Q matrix. In the Q-learning algorithm, the element Q(s, a) of the Q matrix represents the expected cumulative reward of selecting action a in state s. The matrix thus records the knowledge the agent has gained about mapping states to actions. The Q matrix is used as the knowledge matrix that records the optimization information of the group. By analyzing the similarity among different optimization tasks, the knowledge matrices of the source tasks are used to form the initial knowledge matrix of a new task, and tasks on different time sections are optimized online and dynamically by knowledge migration, which ensures the reliability of the optimization.
In the TBFO algorithm, a bacterial group obtains an action strategy aiming at a specific environmental state from an initial knowledge matrix, and the feedback information obtained from repeated experiments is used for updating the original knowledge to form an inherent reaction to the specific state so as to maximize the accumulated energy value in the foraging process of the bacterial group.
In one embodiment, step S130 includes steps 131 through 136.
Step 131: according to the architecture data and the new task load section data, control the bacteria to perform the chemotactic operation, the migration operation and the replicative operation under the guidance of the initial knowledge matrix.
Under the guidance of the initial knowledge matrix, the bacteria acquire knowledge through the chemotactic operation, the migration operation and the replicative operation. In the TBFO algorithm, all bacteria search the foraging area based on the initial knowledge matrix and feed the resulting rewards back to the knowledge matrix. As shown in fig. 2, TBFO divides the bacteria into two states, chemotactic and migrating, according to the operation being performed. Within a single iteration of the algorithm, each of the two states is assigned to a certain proportion of the bacteria. After the two groups have performed their respective operations, the energy values of all bacteria are calculated and sorted, and the replicative operation is performed, so that the cumulative energy obtained by the colony during foraging is maximized. In the next iteration, the bacterial states are reassigned according to the energy values of the previous iteration: bacteria with higher energy values stay in their current region and perform the chemotactic operation, while bacteria with lower energy values perform the migration operation.
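The state reassignment described above can be sketched as follows. This is a minimal illustration only: the function name `assign_states` and the parameter `chemotaxis_ratio` (the share of the colony kept in the chemotactic state) are assumptions, since the text only says that each state is given to "a certain proportion" of the bacteria.

```python
def assign_states(energies, chemotaxis_ratio=0.5):
    """Label each bacterium 'chemotactic' or 'migrating' by energy rank.

    energies: energy values from the previous iteration (higher is better).
    chemotaxis_ratio: hypothetical share of the colony kept chemotactic.
    """
    n = len(energies)
    n_chemo = int(round(n * chemotaxis_ratio))
    # Rank bacteria from highest to lowest energy.
    ranked = sorted(range(n), key=lambda idx: energies[idx], reverse=True)
    states = ["migrating"] * n
    for idx in ranked[:n_chemo]:
        # High-energy bacteria stay in place and search locally.
        states[idx] = "chemotactic"
    return states


print(assign_states([0.2, 0.9, 0.5, 0.1]))
```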
Specifically, based on the energy value ranking, the dominant individuals in the flora are placed in the chemotactic state and continue to carry out the local search. Their chemotactic behavior can be represented by the following formula:
where $\theta^i(j, k, l)$ is the position of bacterium i after the j-th chemotactic, k-th replicative and l-th migration operations, and $\Delta$ is a unit vector in the random direction chosen after tumbling.
$C_k(i)$ may be either a fixed step size or a variable step size. In this embodiment, $C_k(i)$ is a nonlinearly decreasing inertial step size, updated as follows:
where $C_k(i)$ is the inertial step size at the k-th iteration, $C_0$ is the initial swimming step size, $C_e$ is the final swimming step size, and cly is the maximum number of iteration steps.
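The formulas themselves are not reproduced in this text. As a rough sketch, a chemotactic move in standard bacterial foraging optimization adds a step of length $C_k(i)$ along a random unit direction to the current position. The decay law in `step_size` below (a quadratic decrease from `c0` to `ce`) is purely an assumption chosen to illustrate "nonlinearly decreasing"; it is not the patent's formula, and all names and default values are hypothetical.

```python
import math
import random


def step_size(k, k_max, c0=0.1, ce=0.01):
    """Hypothetical nonlinear schedule: quadratic decay from c0 to ce."""
    frac = min(max(k / k_max, 0.0), 1.0)
    return ce + (c0 - ce) * (1.0 - frac) ** 2


def chemotaxis_move(theta, k, k_max, rng=random):
    """Tumble to a random unit direction, then move one step of length C_k."""
    delta = [rng.uniform(-1.0, 1.0) for _ in theta]
    # Normalize so that only the direction of delta matters.
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    c = step_size(k, k_max)
    return [t + c * d / norm for t, d in zip(theta, delta)]
```

A large early step favors exploration, while the smaller late step refines the local search around high-energy positions.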
For a bacterium in the migrating state, when the migration probability $P_{ed}$ is met, the bacterium selects its action by roulette-wheel selection over the action probability matrix; otherwise, it migrates according to the action corresponding to the largest knowledge element (greedy strategy):
where the superscript i denotes the i-th controllable variable and corresponds to the i-th sub-knowledge matrix, $i \in M$, with M the set of controllable variables; the superscript j denotes the j-th bacterium, $j \in N$, with N the set of bacteria in the flora; $P_{ed}$ is the migration probability; r is a random number between 0 and 1; and $a_s$ is the action selected over the global range according to the probability matrix $P^i$. When the migration condition is satisfied, the bacterium performs pseudo-random roulette-wheel selection on the action probability matrix $P^i$, which is updated as follows:
where $\beta$ is a difference coefficient used to amplify the differences among the elements of the matrix $Q^i$, and $e^i$ is an intermediate calculation matrix.
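The selection rule can be illustrated with the sketch below. Two points are assumptions rather than statements of the patent: the migration condition is taken to be `r < p_ed`, and the probability update (not reproduced in this text) is replaced by a simple stand-in that raises each knowledge element to the power `beta` and normalizes. The function names are hypothetical.

```python
import random


def action_probabilities(q_row, beta=2.0):
    """Stand-in probability update: amplify differences among the Q
    elements with exponent beta, then normalize (assumes values >= 0)."""
    weights = [q ** beta for q in q_row]
    total = sum(weights)
    if total == 0:
        # No knowledge yet: fall back to a uniform distribution.
        return [1.0 / len(q_row)] * len(q_row)
    return [w / total for w in weights]


def select_action(q_row, p_ed=0.25, beta=2.0, rng=random):
    """Roulette-wheel selection with probability p_ed, otherwise greedy."""
    r = rng.random()
    if r < p_ed:
        probs = action_probabilities(q_row, beta)
        return rng.choices(range(len(q_row)), weights=probs, k=1)[0]
    # Greedy strategy: action with the largest knowledge element.
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With `p_ed = 0` the rule is purely greedy; raising `p_ed` trades exploitation of the stored knowledge for exploration of the action space.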
In one embodiment, a crossover process is introduced in the replicative operation in the following manner:
$\theta^{i+S/2}(j,k,l) = r\,\theta^{i}(j,k,l) + (1-r)\,\theta^{i+S/2}(j,k,l)$
where S is the number of bacteria, $i \in [1, S/2]$, and r is a random number in [0, 1].
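A minimal sketch of this crossover follows. It assumes, as in standard bacterial foraging reproduction, that the colony is first sorted by energy so that indices 1 to S/2 hold the better half; the text does not state the ordering explicitly, and the function name is hypothetical.

```python
import random


def reproduce_with_crossover(population, energies, rng=random):
    """Blend each bacterium of the weaker half with its partner in the
    better half: theta[i+S/2] = r*theta[i] + (1-r)*theta[i+S/2]."""
    s = len(population)
    half = s // 2
    # Assumed ordering: highest energy first.
    order = sorted(range(s), key=lambda idx: energies[idx], reverse=True)
    ranked = [list(population[idx]) for idx in order]
    for i in range(half):
        r = rng.random()  # random number in [0, 1)
        ranked[i + half] = [
            r * a + (1.0 - r) * b
            for a, b in zip(ranked[i], ranked[i + half])
        ]
    return ranked
```

Compared with plain replication, which overwrites the weaker half with copies of the better half, the blend keeps some diversity in the colony.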
Step 132: according to the chemotactic, migration and replicative operations of the bacteria, calculate the power flow of the power system in the base case and under the preset faults.
After the chemotactic, migration and replicative operations of the bacteria are finished, the power flow of the power system in the base case and under the preset faults is calculated from the corresponding results. The base case (ground state) means that the system has no fault; the specific type of the preset faults is not unique.
Step 133: calculate the risk scheduling objective function value from the power flow of the power system in the base case and under the preset faults.
In the TBFO algorithm, the immediate reward value reflects the direction of optimization, and the flora obtains an optimal strategy by iteratively optimizing the knowledge matrix so as to maximize the cumulative reward. In the risk scheduling mathematical model, the objective function is the inverse of the algorithm's reward function, and the optimization seeks to minimize the objective function. In this embodiment, the reward function is designed as follows:
where $F_C$ is the fuel cost, described by a nonlinear function, and $I_R$ is the system security risk index, described by a nonlinear utility function. $C_V$ is the degree of violation of the total constraints of the system in the base case (fault-free state); $c_1$ and $c_2$ are used to match the orders of magnitude of the fuel cost and the risk index; and $\omega_1$ and $\omega_2$ reflect the emphasis placed on the corresponding objectives.
Step 134: reassign the bacterial states according to the risk scheduling objective function value, once that value has been calculated.
Step 135: iteratively update the initial knowledge matrix according to the reassigned bacterial states to obtain an updated knowledge matrix. In one embodiment, step 135 includes step 11 and step 12.
Step 11: perform dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices.
As shown in FIG. 3, to effectively overcome the curse of dimensionality, dimension reduction is performed by knowledge extension: the initial knowledge matrix Q is divided into a plurality of sub-knowledge matrices $Q^i$, in one-to-one correspondence with the variables. The variables are linked through the knowledge matrices, and the elements of adjacent matrices are related knowledge; that is, the action space $A^i$ of $x_i$ is the state space $S^{i+1}$ of $x_{i+1}$. Only after the action of variable $x_i$ has been determined can the action of $x_{i+1}$ be selected on the basis of that result. This forms a chain extension among related knowledge and realizes the decomposition and dimension reduction of the knowledge matrix.
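The chain relation can be sketched as a walk through the sub-knowledge matrices in which the action chosen for one variable indexes the state row of the next. The greedy choice and the nested-list layout `sub_q[i][state][action]` are assumptions made for illustration only.

```python
def greedy_chain(sub_q, initial_state):
    """Pick one action per variable; each action becomes the next state.

    sub_q: list of sub-knowledge matrices, sub_q[i][state][action].
    initial_state: row index into the first sub-matrix.
    """
    state = initial_state
    actions = []
    for q in sub_q:
        row = q[state]
        action = max(range(len(row)), key=lambda a: row[a])
        actions.append(action)
        # The action space of x_i is the state space of x_(i+1).
        state = action
    return actions


sub_q = [
    [[0.1, 0.9], [0.5, 0.2]],            # Q^1: 2 states x 2 actions
    [[0.3, 0.1, 0.2], [0.0, 0.4, 0.8]],  # Q^2: 2 states x 3 actions
]
print(greedy_chain(sub_q, 0))  # [1, 2]
```

With n variables of m actions each, the chain stores on the order of n·m² elements instead of one joint table that grows as mⁿ.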
Step 12: update the plurality of sub-knowledge matrices according to the reassigned bacterial states, and obtain the updated knowledge matrix from the updated sub-knowledge matrices.
The flora acts as a multi-agent group that updates the knowledge matrix cooperatively. All bacteria share one knowledge matrix, and several knowledge elements can be updated simultaneously in a single iteration, which greatly improves the optimization efficiency. After each trial-and-error exploration, the reward of each individual is evaluated. With flora cooperation introduced, the sub-knowledge matrix $Q^i$ is updated as follows:
where $R(s_k^{ij}, s_{k+1}^{ij}, a_k^{ij})$ is the reward function value obtained at the k-th iteration by selecting action $a_k$ in state $s_k$ and transitioning to state $s_{k+1}$; $\alpha$ is the learning factor and $\gamma$ is the discount factor.
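The update formula is not reproduced in this text. The sketch below applies the standard Q-learning rule once per bacterium to a single shared sub-knowledge matrix, which is consistent with the symbols defined above (reward R, learning factor α, discount factor γ) but is an assumption about the exact form. The function name and the tuple layout of `transitions` are hypothetical.

```python
def cooperative_q_update(q, transitions, alpha=0.1, gamma=0.9):
    """Update one shared sub-knowledge matrix with every bacterium's step.

    q: list of lists, q[state][action], modified in place.
    transitions: one (state, action, reward, next_state) tuple per bacterium.
    """
    for s, a, reward, s_next in transitions:
        best_next = max(q[s_next])
        # Standard Q-learning rule (assumed form of the patent's update).
        q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
    return q


q = [[0.0, 0.0], [0.0, 0.0]]
cooperative_q_update(q, [(0, 1, 1.0, 1), (1, 0, 2.0, 0)])
```

Because every bacterium contributes a transition, several elements of the matrix change in one iteration, which is the source of the efficiency gain described above.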
In another embodiment, step 135 includes steps 21 to 23.
Step 21: calculate, according to the reassigned bacterial states, the active power deviation between each source task in the initial knowledge matrix and the new task.
The active power deviation is defined as the measure of similarity between a source task and the new task, and the active demand is divided from small to large into multiple load sections:
$[P_{Ds_1}, P_{Ds_2}), [P_{Ds_2}, P_{Ds_3}), \ldots, [P_{Ds_{i-1}}, P_{Ds_i}), \ldots, [P_{Ds_{n-1}}, P_{Ds_n})$
Step 22: sort the source tasks by similarity from large to small, that is, by active power deviation from small to large, and take a preset number of source tasks. The specific value of the preset number is not unique; in this embodiment, the preset number is two.
Step 23: update the initial knowledge matrix according to the selected source tasks to obtain an updated knowledge matrix.
Taking the matrix update from two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first, and the initial knowledge matrix is then updated according to these contribution coefficients to obtain the knowledge matrix of the new task.
Specifically, assume that the active demand of a new task x is $P_{Dx}$, and that $P_{Di}$ and $P_{Dk}$ are the loads of the two sections in the source tasks closest to task x, satisfying $P_{Di} < P_{Dx} < P_{Dk}$. The contribution coefficients $\eta_1$ and $\eta_2$ of the two source tasks to transfer learning can then be calculated by the following formula:
Using linear migration, the knowledge matrix of the new task x is obtained:
Knowledge that is highly similar to the new task is used: only the source task section information closest to the load demand of the new task is migrated. This avoids negative interference from invalid knowledge on the learning quality and rate of the new task during migration, and improves the calculation accuracy.
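The two formulas are not reproduced in this text. One natural reading of "linear migration" between the two nearest load sections is linear interpolation, sketched below. The coefficient expressions are an assumption, not the patent's stated formulas, and the function name and the dictionary layout of `source` are hypothetical.

```python
def transfer_knowledge(p_dx, source):
    """Blend the knowledge matrices of the two nearest source tasks.

    p_dx: active demand of the new task.
    source: dict mapping source-task active demand -> knowledge matrix
            (list of lists); must bracket p_dx on both sides.
    """
    lower = max(p for p in source if p < p_dx)  # P_Di
    upper = min(p for p in source if p > p_dx)  # P_Dk
    # Assumed linear-interpolation contribution coefficients.
    eta1 = (upper - p_dx) / (upper - lower)
    eta2 = (p_dx - lower) / (upper - lower)
    q_new = [
        [eta1 * a + eta2 * b for a, b in zip(row_lo, row_hi)]
        for row_lo, row_hi in zip(source[lower], source[upper])
    ]
    return eta1, eta2, q_new


source = {100.0: [[1.0, 2.0]], 200.0: [[3.0, 6.0]], 400.0: [[9.0, 9.0]]}
print(transfer_knowledge(150.0, source))  # (0.5, 0.5, [[2.0, 4.0]])
```

Under this reading the coefficients sum to one, and the closer source task receives the larger weight.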
It may be understood that in one embodiment, the initial knowledge matrix may be first dimension reduced to obtain a plurality of sub-knowledge matrices, and then the plurality of sub-knowledge matrices may be updated by using knowledge with high similarity to the new task to obtain an updated knowledge matrix.
Step 136: judge whether the iterative update meets a preset condition.
The specific type of the preset condition is not unique. In this embodiment, the preset condition is $k > k_{max}$, or the 2-norm of the difference between the knowledge matrices of two successive iterations falling below a preset threshold, where $k_{max}$ is the preset maximum number of iterations and the 2-norm reflects the degree of deviation of the knowledge matrix between the two iterations.
If the iterative update does not meet the preset condition, the updated knowledge matrix is taken as the initial knowledge matrix and the process returns to step 131 to update the knowledge matrix again. If it does, the iterative update ends, and the knowledge matrix finally obtained is used as the optimization matrix required for optimizing the new task.
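The stopping test can be sketched as follows. The threshold name `eps` and the default values are assumptions. The sketch also uses the Frobenius norm as a simple pure-Python stand-in for the matrix 2-norm; since the Frobenius norm is never smaller than the 2-norm, this check is slightly stricter than the one described.

```python
import math


def has_converged(q_prev, q_curr, k, k_max=100, eps=1e-3):
    """Stop when the iteration budget is exceeded or the matrix settles."""
    if k > k_max:
        return True
    # Frobenius norm of (q_curr - q_prev), used here in place of the 2-norm.
    diff_sq = sum(
        (a - b) ** 2
        for row_prev, row_curr in zip(q_prev, q_curr)
        for a, b in zip(row_prev, row_curr)
    )
    return math.sqrt(diff_sq) < eps
```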
Step S140: perform online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtain and output the risk scheduling optimization result.
After the iterative update of the initial knowledge matrix is finished, the updated knowledge matrix corresponding to the minimum risk scheduling objective function value is used as the optimization matrix to optimize the new task online, and the risk scheduling optimization result is obtained and output. The specific way of outputting the result is not unique: it can be output to a memory for storage or to a display for viewing.
Furthermore, in one embodiment, prior to step S130, the power system risk scheduling method further comprises step 110.
Step 110: receive source tasks for training to obtain an optimal knowledge matrix, which serves as the initial knowledge matrix.
Step 110 may be performed before or after step S120. In a pre-learning stage, the TBFO algorithm performs a series of source tasks to obtain the optimal knowledge matrix and to extract initial knowledge from it, in preparation for related new tasks in the future. As shown in FIG. 4, the relevant initial knowledge from the source tasks is used in online optimization: based on the similarity between the source task and the new task, the knowledge matrix $Q_S$ of the source task is migrated to form the initial knowledge matrix $Q_N$ of the new task.
In order to facilitate better understanding of the power system risk scheduling method described above, a detailed explanation will be given below in connection with specific embodiments.
A reliability test system is taken as the simulation object for risk scheduling. The reference capacity of the system is 100 MVA, and the system has 24 bus nodes, 34 transmission lines/transformers and 32 generators; its topology is shown in FIG. 5. Of the 10 generator nodes, generator node 21, which has the largest single-unit capacity, is defined as the balance node of the whole system, and the remaining 9 nodes are PV (voltage-controlled) nodes.
In order to test the adaptability of the algorithm to the optimization of different load levels, the embodiment carries out the risk scheduling optimization simulation of 96 sections. In this embodiment, a typical daily load curve is selected, and a section is divided every fifteen minutes according to the time sequence, to obtain sections 1 to 96.
Based on the platform, the risk scheduling optimization is carried out according to the following steps.
(1) The generator active outputs $P_G$ at the PV nodes are selected as the control variables. The action variable space $A(A_{PG1}, A_{PG2}, \ldots, A_{PGi})$ corresponds one-to-one to the space of the control variables, where i is the total number of units on the PV nodes. The action space of each variable is the state space of the next variable. The sub-knowledge matrices corresponding to the state-action spaces of the variables are $Q_{PG1}, Q_{PG2}, \ldots, Q_{PGi}$. Under the guidance of the knowledge matrix, the bacteria acquire knowledge through the chemotactic operation, the migration operation and the replicative operation.
(2) The calculation of the risk scheduling objective function depends on nonlinear power flow calculation. If the standard BFO algorithm is used, let $N_{ed}$, $N_{re}$ and $N_c$ denote the numbers of migration, replication and chemotactic operations respectively, let $N_s$ be the maximum number of swimming steps, and let the expected fault set contain $N_p$ faults. The number of power flow calculations can then reach $N_{ed} N_{re} N_c N_s (N_p + 1)$, which makes the solution process extremely slow. By improving the optimization scheme, the nested loops of the original algorithm are removed and the efficiency of the algorithm is improved. The bacterial colony performs action selection by combining the random search mode of the BFO algorithm with a probability-space action selection strategy.
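To give a feel for the size of this product, the short calculation below evaluates it for one set of parameter values. The numbers are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def nested_bfo_power_flows(n_ed, n_re, n_c, n_s, n_p):
    """Upper bound on power flow runs in standard nested-loop BFO:
    one base-case run plus n_p fault runs per innermost evaluation."""
    return n_ed * n_re * n_c * n_s * (n_p + 1)


# Hypothetical settings: 2 migrations, 4 replications, 50 chemotactic
# steps, 4 swimming steps, 10 expected faults.
print(nested_bfo_power_flows(2, 4, 50, 4, 10))  # 17600
```

Even these modest settings require 17,600 power flow runs per bacterium, which illustrates why removing the nested loops matters for online use.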
(3) Units on the same node with the same fuel cost coefficients are grouped into one control variable, so that the active outputs of the 31 units are grouped into 13 variables. The output of the preceding unit is taken as the state space of the following unit, and the state space of the first unit is the active power of the current section, which reduces the dimension of the knowledge matrix.
(4) In the TBFO algorithm, the immediate reward value reflects the direction of optimization, and the flora obtains an optimal strategy by iteratively optimizing the knowledge matrix so as to maximize the cumulative reward.
(5) The active power deviation is defined as the measure of similarity between a source task and the new task, and the active demand is divided from small to large into multiple load sections.
To avoid negative interference from invalid knowledge on the learning quality and rate of the new task during migration, knowledge that is highly similar to the new task should be used as much as possible in the learning process. In this embodiment, only the information of the two source task sections closest to the load demand of the new task is used for migration. Assume that the active demand of a new task x is $P_{Dx}$, and that $P_{Di}$ and $P_{Dk}$ are the loads of the two sections in the source tasks closest to task x, satisfying $P_{Di} < P_{Dx} < P_{Dk}$. The contribution coefficients of the two source tasks to transfer learning are obtained, and the knowledge matrix of the new task x is obtained by linear migration.
After the iterative update of the initial knowledge matrix is completed, the updated knowledge matrix corresponding to the minimum risk scheduling objective function value is used as the optimization matrix to optimize the new task online, and the risk scheduling optimization result is obtained and output.
According to the above power system risk scheduling method, the optimal knowledge matrix in the source task is used as the initial matrix of the new task to realize knowledge migration, and the new task is optimized online by bacterial foraging reinforcement learning based on that knowledge migration. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. A fast solving speed is maintained as the problem scale grows, so the method is suitable for the rapid optimization of large-scale, complex risk scheduling.
In one embodiment, a power system risk scheduling system, as shown in fig. 6, includes a task data acquisition module 120, a knowledge matrix update module 130, and a risk scheduling optimization module 140.
The task data acquisition module 120 is configured to acquire the architecture data of the power system and the new task load section data. The architecture data of the power system may include bus nodes, transmission lines, transformers, generators, and the like, and the new task load section data includes one or more load sections. These data are acquired for use in the subsequent risk scheduling optimization.
The knowledge matrix updating module 130 is configured to iteratively update a preset initial knowledge matrix according to the architecture data and the new task load section data by using a bacterial foraging reinforcement learning algorithm, so as to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.
The initial knowledge matrix is the optimal knowledge matrix in the source task. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge migration. Action selection is performed by combining the random colony search of the bacterial foraging optimization algorithm with a probability-space action selection strategy, so that the new task is optimized online by the TBFO algorithm.
The specific type of the initial knowledge matrix is not unique; in this embodiment, the initial knowledge matrix is a Q matrix. The Q matrix serves as the knowledge matrix that records the optimization information of the colony. By analyzing the similarity between different optimization tasks, the knowledge matrix of the source task is used to form the initial knowledge matrix of the new task, and online dynamic optimization of tasks at different time sections is realized through knowledge migration, which helps ensure the reliability of the optimization.
In one embodiment, the knowledge matrix update module 130 includes a first processing unit, a second processing unit, a third processing unit, a fourth processing unit, a fifth processing unit, and a sixth processing unit.
The first processing unit is used for controlling bacteria to perform trending operation, migration operation and replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.
Under the guidance of the initial knowledge matrix, the bacteria acquire knowledge through the trending (chemotaxis) operation, the migration operation and the replicative operation. Specifically, based on a ranking of energy values, the dominant individuals in the colony are placed in the trending state and continue to carry out local search. The trending behavior can be represented by the following formula:
$C_k(i)$ may be either a fixed step size or a variable step size. In this embodiment, $C_k(i)$ is a nonlinearly decreasing inertial step size, updated as follows:
For bacteria in the migration state, when the migration probability $P_{ed}$ is satisfied, the bacteria perform roulette-wheel selection according to the action probability matrix; otherwise, the bacteria migrate according to the action corresponding to the largest knowledge element (greedy strategy):
When the migration condition is satisfied, the bacteria perform pseudo-random roulette selection of the action $a_S$ according to the action probability matrix $P_i$; $P_i$ is updated as follows:
In one embodiment, a crossover process is introduced into the replicative operation in the following manner:
$\theta_{i+S/2}(j,k,l) = r\,\theta_{i}(j,k,l) + (1-r)\,\theta_{i+S/2}(j,k,l)$
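The migration selection rule and the crossover step of the replicative operation described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the flat-list representation of a bacterium's position, and the assumption that a higher energy value marks a better individual are all hypothetical choices made for illustration.

```python
import random


def select_action(knowledge_row, prob_row, p_ed, rng):
    """Pick an action index for one bacterium in the migration state.

    With probability p_ed the action is drawn by roulette-wheel selection
    from prob_row; otherwise the greedy action (largest knowledge element)
    is taken. All argument names are illustrative assumptions.
    """
    if rng.random() < p_ed:
        # Roulette wheel: sample an index proportionally to prob_row.
        threshold = rng.random() * sum(prob_row)
        cumulative = 0.0
        for action, p in enumerate(prob_row):
            cumulative += p
            if threshold < cumulative:
                return action
        return len(prob_row) - 1  # guard against floating-point round-off
    # Greedy strategy: index of the largest knowledge element.
    return max(range(len(knowledge_row)), key=knowledge_row.__getitem__)


def crossover_replication(positions, energies, r):
    """Apply theta_{i+S/2} = r*theta_i + (1-r)*theta_{i+S/2}.

    Bacteria are ranked by energy (assumed: higher is better); each
    individual in the weaker half is blended with its counterpart in the
    stronger half. Returns the ranked, updated population.
    """
    s = len(positions)
    half = s // 2
    order = sorted(range(s), key=lambda i: energies[i], reverse=True)
    ranked = [list(positions[i]) for i in order]
    for i in range(half):
        strong, weak = ranked[i], ranked[i + half]
        ranked[i + half] = [r * a + (1 - r) * b for a, b in zip(strong, weak)]
    return ranked
```

Setting `p_ed` to 0 reduces `select_action` to the pure greedy strategy, while `r = 0` in `crossover_replication` leaves the weaker half unchanged, which gives two simple sanity checks when experimenting with the parameters.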
The second processing unit is used for calculating the power flow values of the power system in the ground state and under preset faults according to the trending operation, the migration operation and the replicative operation of the bacteria.
After the trending operation, the migration operation and the replicative operation of the bacteria are finished, the power flow values of the power system in the ground state and under the preset faults are calculated from the corresponding results. The ground state means that the system has no fault, and the specific type of the preset fault is not unique.
The third processing unit is used for calculating the risk scheduling objective function value according to the power flow values of the power system in the ground state and under the preset faults.
The immediate reward value reflects the direction of optimization in the TBFO algorithm, and the colony obtains an optimal strategy by iteratively optimizing the knowledge matrix so as to obtain the maximum cumulative reward function value. In the risk scheduling mathematical model, the objective function is the inverse of the reward function of the algorithm, and the aim of the optimization is to minimize the objective function. In this embodiment, the reward function is designed as follows:
where $F_C$ is the fuel cost described by a nonlinear function; $I_R$ is the system security risk index described by a nonlinear utility function; $C_V$ is the degree of total constraint violation of the system in the ground state; $c_1$ and $c_2$ are used to match the orders of magnitude of the fuel cost and the risk index, respectively; and $\omega_1$ and $\omega_2$ reflect the degree of emphasis placed on the corresponding objectives.
The fourth processing unit is used for reassigning the bacterial state according to the risk scheduling objective function value. After the risk scheduling objective function value is calculated, the bacterial states are reassigned according to it.
The fifth processing unit is used for iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix.
In one embodiment, the fifth processing unit includes a dimension reduction unit and a matrix update unit.
The dimension reduction unit is used for performing dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices. The dimension reduction is carried out by knowledge extension: the initial knowledge matrix Q is divided into a plurality of sub-knowledge matrices $Q_i$ that correspond one-to-one with the variables.
The matrix updating unit is used for updating the plurality of sub-knowledge matrices according to the reassigned bacterial state to obtain an updated knowledge matrix. The sub-knowledge matrices are updated, and the updated knowledge matrix is obtained from the updated sub-knowledge matrices.
The bacterial colony acts as a group of multiple agents that cooperatively update the knowledge matrix. All bacteria share one knowledge matrix, so several knowledge elements can be updated simultaneously in a single iteration, which improves the optimization efficiency. The reward value of each agent is evaluated after each trial-and-error exploration. After colony cooperation is introduced, the sub-knowledge matrix $Q_i$ is updated as follows:
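The idea of many bacteria updating one shared knowledge matrix in a single iteration can be sketched in Python. Note that the rule used here is the standard tabular Q-learning update, adopted purely as an assumption; it is not claimed to be the update formula of this embodiment. The function name, the `(state, action, reward, next_state)` tuple layout, and the parameters `alpha` and `gamma` are likewise hypothetical.

```python
def cooperative_update(q, experiences, alpha=0.1, gamma=0.9):
    """Update one shared sub-knowledge matrix with every bacterium's experience.

    q           -- list of rows; q[state][action] is one knowledge element
    experiences -- one (state, action, reward, next_state) tuple per bacterium
    alpha       -- learning rate (assumed parameter)
    gamma       -- discount factor (assumed parameter)

    Uses the textbook Q-learning rule as a stand-in for the embodiment's
    own update formula.
    """
    for state, action, reward, next_state in experiences:
        best_next = max(q[next_state])
        td_error = reward + gamma * best_next - q[state][action]
        q[state][action] += alpha * td_error
    return q
```

Because every bacterium writes into the same table, one pass over `experiences` can change several knowledge elements, which is the property the paragraph above attributes to colony cooperation.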
In another embodiment, the fifth processing unit includes a calculating unit, an extraction unit, and an updating unit.
The calculating unit is used for calculating the active power deviation between each source task in the initial knowledge matrix and the new task according to the reassigned bacterial state. The active power deviation is defined as the measure of similarity between a source task and the new task, and the active power demand is divided, in ascending order, into multiple load sections:
$[P_{Ds1}, P_{Ds2}),\ [P_{Ds2}, P_{Ds3}),\ \ldots,\ [P_{Ds(i-1)}, P_{Dsi}),\ \ldots,\ [P_{Ds(n-1)}, P_{Dsn})$
The extraction unit is used for sorting the source tasks from large to small according to the active power deviation and acquiring a preset number of source tasks. The specific value of the preset number is not unique; in this embodiment, the preset number is two.
The updating unit is used for updating the initial knowledge matrix according to the acquired source task to obtain an updated knowledge matrix.
Taking matrix updating with two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first, and the initial knowledge matrix is then updated according to these contribution coefficients to obtain the knowledge matrix of the new task. The contribution coefficients $\eta_1$ and $\eta_2$ of the two source tasks $P_{Di}$ and $P_{Dk}$ to transfer learning can be calculated by the following formula:
Using linear migration, the knowledge matrix of new task x can be obtained:
By using knowledge that is highly similar to the new task, and migrating only the source task section information closest to the load demand of the new task, negative interference of invalid knowledge with the learning quality and speed of the new task during migration is avoided, and the calculation accuracy is improved.
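The selection of the two nearest source tasks and the linear migration step can be sketched in Python as follows. The weights used here are plain linear-interpolation weights, which is an assumption made for illustration; the embodiment's own formula for the contribution coefficients may differ. The function names and the list-of-rows matrix layout are also hypothetical.

```python
def nearest_source_tasks(source_loads, p_dx):
    """Return (P_Di, P_Dk): the closest source loads below and above p_dx."""
    lower = [p for p in source_loads if p < p_dx]
    upper = [p for p in source_loads if p > p_dx]
    if not lower or not upper:
        raise ValueError("p_dx must lie strictly between two source loads")
    return max(lower), min(upper)


def migrate_knowledge(q_i, q_k, p_di, p_dk, p_dx):
    """Blend two source knowledge matrices into an initial matrix for task x.

    Assumed contribution coefficients (linear interpolation):
        eta1 = (P_Dk - P_Dx) / (P_Dk - P_Di)
        eta2 = (P_Dx - P_Di) / (P_Dk - P_Di)
    so the source task whose load is closer to P_Dx contributes more.
    """
    eta1 = (p_dk - p_dx) / (p_dk - p_di)
    eta2 = (p_dx - p_di) / (p_dk - p_di)
    return [
        [eta1 * a + eta2 * b for a, b in zip(row_i, row_k)]
        for row_i, row_k in zip(q_i, q_k)
    ]
```

With these assumed weights, `eta1 + eta2 = 1`, and a new task whose load coincides with one source load would reproduce that source task's matrix exactly.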
It will be appreciated that, in one embodiment, the fifth processing unit may also include a dimension reduction unit and a matrix updating unit, with the matrix updating unit including a calculating unit, an extraction unit and an updating unit. In that case, dimension reduction is first performed on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices, and the sub-knowledge matrices are then updated using knowledge that is highly similar to the new task to obtain the updated knowledge matrix.
The sixth processing unit is used for judging whether the iterative update meets a preset condition. When the iterative update does not meet the preset condition, the updated knowledge matrix is taken as the initial knowledge matrix, and the first processing unit is controlled to again perform the trending operation, the migration operation and the replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.
The specific type of the preset condition is not unique. In this embodiment, the preset condition is $k > k_{\max}$ or [formula not reproduced]. Whether the iterative update meets the preset condition is judged; if not, the iterative update is performed again with the updated knowledge matrix as the initial knowledge matrix; if so, the iterative update ends, and the finally obtained knowledge matrix is taken as the optimization matrix required for optimizing the new task.
The risk scheduling optimization module 140 is configured to perform online optimization of a new task according to the updated knowledge matrix corresponding to the time when the risk scheduling objective function value is minimum, obtain a risk scheduling optimization result, and output the risk scheduling optimization result.
After the iteration updating of the initial knowledge matrix is finished, the updated knowledge matrix corresponding to the minimum risk scheduling objective function value is used as an optimization matrix to perform online optimization on the new task, and a risk scheduling optimization result is obtained and output. The specific mode of outputting the risk scheduling optimization result is not unique, and the risk scheduling optimization result can be output to a memory for storage or output to a display for display.
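The overall iteration, in which the updated knowledge matrix with the smallest objective function value is retained, can be outlined with a short Python sketch. The callable `step` is a hypothetical stand-in for one full pass through the first to fifth processing units, and only the $k > k_{\max}$ stopping condition is modeled; any additional stopping criterion is left out.

```python
import copy


def iterate_knowledge(initial_q, step, k_max):
    """Run up to k_max update passes and keep the best knowledge matrix.

    initial_q -- initial knowledge matrix (e.g. migrated from source tasks)
    step      -- hypothetical callable: step(q) -> (updated_q, objective_value)
    k_max     -- maximum number of iterations (stop once k > k_max)

    Returns the matrix that produced the minimum objective value, together
    with that value.
    """
    q = copy.deepcopy(initial_q)
    best_value, best_q = float("inf"), copy.deepcopy(q)
    k = 1
    while k <= k_max:
        q, value = step(q)
        if value < best_value:
            # Snapshot the matrix so later passes cannot overwrite it.
            best_value, best_q = value, copy.deepcopy(q)
        k += 1
    return best_q, best_value
```

Keeping a deep copy of the best matrix matters when `step` modifies its argument in place; without it, the stored "best" matrix would silently follow later updates.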
Furthermore, in one embodiment, the power system risk scheduling system further comprises a matrix training module.
The matrix training module is configured to receive source tasks for training and obtain an optimal knowledge matrix as the initial knowledge matrix, before the knowledge matrix updating module 130 iteratively updates the preset initial knowledge matrix according to the architecture data and the new task load section data by using the bacterial foraging reinforcement learning algorithm to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix. In the pre-learning stage, the TBFO algorithm executes a series of source tasks to obtain the optimal knowledge matrix and extract initial knowledge from it, in preparation for related new tasks in the future.
According to the power system risk scheduling system, the optimal knowledge matrix in the source task is used as the initial matrix of the new task to realize knowledge migration, and the new task is optimized on line by utilizing bacterial foraging reinforcement learning based on the knowledge migration. The speed of online learning is greatly improved through transfer learning, the online dynamic optimization of the risk scheduling problem is realized, and when the problem scale is further enlarged, the faster solving speed can be ensured, so that the method can be suitable for the rapid optimization of large-scale complex risk scheduling.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples express only several embodiments of the invention. Their description is specific and detailed, but is not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (8)

1. A power system risk scheduling method, characterized by comprising the following steps:
acquiring architecture data and new task load section data of a power system;
iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix, wherein the initial knowledge matrix is an optimal knowledge matrix in a source task;
performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtaining and outputting a risk scheduling optimization result;
wherein the step of iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix comprises the following steps:
controlling bacteria to perform a trending operation, a migration operation and a replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data;
calculating power flow values of the power system in the ground state and under preset faults according to the trending operation, the migration operation and the replicative operation of the bacteria;
calculating a risk scheduling objective function value according to the power flow values of the power system in the ground state and under the preset faults;
reassigning the bacterial state according to the risk scheduling objective function value;
iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix;
judging whether the iterative update meets a preset condition; and
if not, taking the updated knowledge matrix as the initial knowledge matrix, and returning to the step of controlling bacteria to perform a trending operation, a migration operation and a replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.
2. The power system risk scheduling method of claim 1, wherein the initial knowledge matrix is a Q matrix.
3. The power system risk scheduling method according to claim 1, wherein the step of iteratively updating the initial knowledge matrix according to the reassigned bacterial status to obtain an updated knowledge matrix comprises the steps of:
performing dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices; and
updating the plurality of sub-knowledge matrices according to the reassigned bacterial state to obtain an updated knowledge matrix.
4. The power system risk scheduling method according to claim 1, wherein the step of iteratively updating a preset initial knowledge matrix by a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix is preceded by the step of:
receiving a source task for training to obtain an optimal knowledge matrix as the initial knowledge matrix.
5. A power system risk scheduling system, comprising:
a task data acquisition module, used for acquiring architecture data of the power system and new task load section data;
a knowledge matrix updating module, used for iteratively updating a preset initial knowledge matrix through a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix, wherein the initial knowledge matrix is an optimal knowledge matrix in a source task;
a risk scheduling optimization module, used for performing online optimization of the new task according to the updated knowledge matrix corresponding to the minimum risk scheduling objective function value, and obtaining and outputting a risk scheduling optimization result;
wherein the knowledge matrix updating module comprises a first processing unit, a second processing unit, a third processing unit, a fourth processing unit, a fifth processing unit and a sixth processing unit;
the first processing unit is used for controlling bacteria to perform a trending operation, a migration operation and a replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data;
the second processing unit is used for calculating power flow values of the power system in the ground state and under preset faults according to the trending operation, the migration operation and the replicative operation of the bacteria;
the third processing unit is used for calculating a risk scheduling objective function value according to the power flow values of the power system in the ground state and under the preset faults;
the fourth processing unit is used for reassigning the bacterial state according to the risk scheduling objective function value;
the fifth processing unit is used for iteratively updating the initial knowledge matrix according to the reassigned bacterial state to obtain an updated knowledge matrix; and
the sixth processing unit is used for judging whether the iterative update meets a preset condition and, when the iterative update does not meet the preset condition, taking the updated knowledge matrix as the initial knowledge matrix and controlling the first processing unit to again perform the trending operation, the migration operation and the replicative operation under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data.
6. The power system risk scheduling system of claim 5, wherein the initial knowledge matrix is a Q matrix.
7. The power system risk scheduling system of claim 5, wherein the fifth processing unit comprises:
a dimension reduction unit, used for performing dimension reduction on the initial knowledge matrix to obtain a plurality of sub-knowledge matrices; and
a matrix updating unit, used for updating the plurality of sub-knowledge matrices according to the reassigned bacterial state to obtain an updated knowledge matrix.
8. The power system risk scheduling system according to claim 5, further comprising a matrix training module, wherein the matrix training module is configured to receive a source task for training and obtain an optimal knowledge matrix as the initial knowledge matrix, before the knowledge matrix updating module iteratively updates the preset initial knowledge matrix according to the architecture data and the new task load section data by using the bacterial foraging reinforcement learning algorithm to obtain a corresponding risk scheduling objective function value and an updated knowledge matrix.
CN201610882652.1A 2016-10-08 2016-10-08 Power system risk scheduling method and system Active CN106296044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610882652.1A CN106296044B (en) 2016-10-08 2016-10-08 Power system risk scheduling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610882652.1A CN106296044B (en) 2016-10-08 2016-10-08 Power system risk scheduling method and system

Publications (2)

Publication Number Publication Date
CN106296044A CN106296044A (en) 2017-01-04
CN106296044B true CN106296044B (en) 2023-08-25

Family

ID=57717240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610882652.1A Active CN106296044B (en) 2016-10-08 2016-10-08 Power system risk scheduling method and system

Country Status (1)

Country Link
CN (1) CN106296044B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106526432B (en) * 2017-01-06 2019-05-10 中国南方电网有限责任公司电网技术研究中心 A kind of fault location algorithm and device based on BFOA
CN108549907B (en) * 2018-04-11 2021-11-16 武汉大学 Data verification method based on multi-source transfer learning
CN108734419B (en) * 2018-06-15 2021-07-06 大连理工大学 Blast furnace gas scheduling system modeling method based on knowledge migration
CN109460890B (en) * 2018-09-21 2021-08-06 浙江大学 Intelligent self-healing method based on reinforcement learning and control performance monitoring
CN109873406B (en) * 2019-03-28 2019-11-22 华中科技大学 A kind of electric system weakness route discrimination method
CN110048461B (en) * 2019-05-16 2021-07-02 广东电网有限责任公司 Multi-virtual power plant decentralized self-discipline optimization method
JP7242508B2 (en) * 2019-10-29 2023-03-20 株式会社東芝 Information processing device, information processing method, and program
CN111626539B (en) * 2020-03-03 2023-06-16 中国南方电网有限责任公司 Q reinforcement learning-based power grid operation section dynamic generation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103199544A (en) * 2013-03-26 2013-07-10 上海理工大学 Reactive power optimization method of electrical power system
WO2014090037A1 (en) * 2012-12-10 2014-06-19 中兴通讯股份有限公司 Task scheduling method and system in cloud computing
CN105023056A (en) * 2015-06-26 2015-11-04 华南理工大学 Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning
CN105373183A (en) * 2015-10-20 2016-03-02 同济大学 Method for tracking whole-situation maximum power point in photovoltaic array

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014090037A1 (en) * 2012-12-10 2014-06-19 中兴通讯股份有限公司 Task scheduling method and system in cloud computing
CN103199544A (en) * 2013-03-26 2013-07-10 上海理工大学 Reactive power optimization method of electrical power system
CN105023056A (en) * 2015-06-26 2015-11-04 华南理工大学 Power grid optimal carbon energy composite flow obtaining method based on swarm intelligence reinforcement learning
CN105373183A (en) * 2015-10-20 2016-03-02 同济大学 Method for tracking whole-situation maximum power point in photovoltaic array

Also Published As

Publication number Publication date
CN106296044A (en) 2017-01-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20210810

Address after: 510663 3 building, 3, 4, 5 and J1 building, 11 building, No. 11, Ke Xiang Road, Luogang District Science City, Guangzhou, Guangdong.

Applicant after: ELECTRIC POWER Research Institute CHINA SOUTHERN POWER GRID

Applicant after: SOUTH CHINA University OF TECHNOLOGY

Address before: 510080 water Donggang 8, Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong.

Applicant before: ELECTRIC POWER Research Institute CHINA SOUTHERN POWER GRID

Applicant before: POWER GRID TECHNOLOGY RESEARCH CENTER. CHINA SOUTHERN POWER GRID

Applicant before: SOUTH CHINA University OF TECHNOLOGY

GR01 Patent grant
GR01 Patent grant