CN111914361B - Wind turbine blade rapid design optimization method based on reinforcement learning - Google Patents


Info

Publication number
CN111914361B
CN111914361B
Authority
CN
China
Prior art keywords
tad
model
reinforcement learning
blade
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010676474.3A
Other languages
Chinese (zh)
Other versions
CN111914361A (en)
Inventor
贾良跃
郝佳
王国新
阎艳
子曌
朱志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010676474.3A
Publication of CN111914361A
Application granted
Publication of CN111914361B
Current legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/17Mechanical parametric or variational design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/28Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/08Fluids
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14Force analysis or force optimisation, e.g. static or dynamic forces
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/70Wind energy
    • Y02E10/72Wind turbines with rotation axis in wind direction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Fluid Mechanics (AREA)
  • Algebra (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Wind Motors (AREA)

Abstract

The invention discloses a reinforcement-learning-based rapid design optimization method for wind turbine blades. Built on reinforcement learning, the method provides directional guidance during blade TAD optimization, drives the blade model to evolve toward higher energy capture, and greatly improves optimization efficiency. Moreover, because a reinforcement learning model is reusable, the trained optimization model can be reused across different wind speeds, so the search for the optimal blade TAD at a new wind speed no longer has to start from zero: the optimization model trained at the original wind speed serves as the initial model and is then fine-tuned to the new wind speed environment, which greatly shortens the training time of the optimization model and speeds up blade TAD optimization.

Description

Wind turbine blade rapid design optimization method based on reinforcement learning
Technical Field
The invention relates to the technical field of wind turbine blade design optimization, in particular to a reinforcement-learning-based rapid design optimization method for wind turbine blades.
Background
The growing demand for energy, the rapid depletion of fossil fuel reserves, and persistent calls for environmental protection have driven the rapid development of alternative renewable energy sources. Wind energy is a major renewable, clean energy source and is widely used worldwide because it is easy to capture and abundant. To capture more energy and reduce the cost per unit of wind energy captured, conventional commercial wind turbines optimize and adjust the twist angles of their blades so that the energy captured by each blade section is locally optimal and the energy captured by the whole blade is globally optimal. With almost no added cost, this greatly improves the efficiency and total amount of wind energy capture. To achieve this capability, the core problem is how to effectively identify, in real time, the optimal blade twist angle distribution (TAD) in a complicated and variable natural wind environment, so that the control system can adjust the TAD to achieve the optimal wind energy harvest. The established approach to identifying the optimal blade TAD combines an optimization method with a simulator and gradually adjusts the blade TAD in an automated trial-and-error/evaluation loop until the requirements are met.
Since blades are typically deployed in dynamic wind environments where wind speed varies widely, finding the optimal TAD for different wind speeds is crucial in the design process. To this end, traditional methods combine evolutionary algorithms (EAs) with blade element momentum (BEM) theory or computational fluid dynamics (CFD). However, such methods can only find the optimal TAD for a given fixed wind speed. When the wind speed changes, the optimization model must be retrained, which greatly increases the design time of the blade TAD and prolongs the blade design cycle.
Therefore, a new solution is needed that improves the efficiency of the optimal-TAD search while preserving its accuracy, enabling rapid design of the wind turbine blade twist angle.
Disclosure of Invention
In view of the above, the invention provides a reinforcement-learning-based rapid design optimization method for wind turbine blades. It adopts an offline-training/online-application mode to realize the optimal TAD search quickly, and it fuses aerodynamic performance with expert experience to guide the TAD optimization, achieving directional exploration during optimization, avoiding a large number of random searches in the later stages, and greatly improving the efficiency and accuracy of optimization model training.
The invention discloses a method for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, which comprises the following steps:
step 1, constructing a TAD calculation model and an environment model;
the TAD calculation model calculates the optimal TAD according to the wind speed and the blade structure;
the environment model carries out the aerodynamic performance analysis of the blade according to TAD generated by the TAD calculation model;
step 2, training the TAD calculation model by a reinforcement learning method to obtain a trained TAD calculation model; wherein a long-term reward mechanism is adopted, with the reward functions

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

where R(τ_i) is the cumulative reward of the i-th learning trajectory τ_i; T is the total number of steps; γ is the discount rate; r_t' is the instant reward of step t'; and R_t is the long-term reward of step t. The instant reward r_t' mainly combines the aerodynamic performance obtained from the environment model with expert experience;
and step 3, using the trained TAD calculation model to search for and output the optimal TAD in real time according to the current wind speed and the blade structure.
Preferably, the environment model uses the wind energy capture coefficient as the aerodynamic performance analysis result.
Preferably, the instant reward is obtained as follows: it is calculated whether the current TAD follows a monotonic decrease; if not, the instant reward is r_t' = -10u, where u denotes the number of violations of the monotonic-decrease constraint; if it does, the instant reward is r_t' = 10C_p - 10N, where C_p is the wind energy capture coefficient and N is the number of twist angles outside the allowed range.
Preferably, the environment model is implemented using computational fluid dynamics, blade element momentum theory, or a surrogate model.
Preferably, the environment model adopts an artificial neural network surrogate model.
Preferably, the environment model is a 4-layer artificial neural network trained by backpropagation; the inputs of the artificial neural network are the TAD and the wind speed V_w, and the output is the network-estimated wind energy capture coefficient C_p.
Preferably, the TAD calculation model employs an optimization model or an agent.
Preferably, an Actor-Critic learning method is adopted to construct and train the agent; the agent comprises an action executor and a state evaluator; wherein the long-term reward value generated by the reward function is translated by the "state evaluator" into an internal reward value of the "agent", which is then provided to the internal "action executor" to guide the generation of a new TAD.
Preferably, both the action executor and the state evaluator are parameterized by neural networks.
Advantageous effects:
the invention provides a method (RL-TAD) for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, which is based on the reinforcement learning method, provides directional guidance in the process of TAD optimization of the blade, promotes a blade model to evolve towards a larger energy obtaining direction, and greatly improves the optimization efficiency. Meanwhile, due to the reusability of the reinforcement learning method, the trained optimization model can be continuously reused under different wind speeds, and the searching process of the optimal TAD of the blade under different wind speeds can get rid of the embarrassment from 0. The optimization model trained under the original wind speed is used as an initial model, and then the optimization model is adjusted to adapt to a new wind speed environment, so that the training time of the optimization model is greatly shortened, and the TAD optimization speed of the blade is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a blade aerodynamic assessment model based on an artificial neural network.
FIG. 3 is an organizational structure of a reward function.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a reinforcement-learning-based rapid design optimization method for wind turbine blades, i.e., a reinforcement-learning-based blade design optimization framework. The method consists of two processes: an "offline model training stage" and an "online model application stage". The offline model training stage trains the TAD calculation model, whose main function is to receive an arbitrary wind speed and a blade structure as inputs and output the corresponding optimal TAD. Then, in the online model application stage, the trained TAD calculation model quickly completes the optimal TAD search for a specific wind speed and a fixed blade structure. The optimization of the TAD calculation model is guided: it proceeds according to the aerodynamic performance from the environment model and expert experience, realizing directional exploration during optimization, avoiding a large number of random searches in the later stages, and greatly improving the efficiency and accuracy of optimization model training. Meanwhile, the reinforcement learning method is reusable and supports offline training with online application: the reinforcement learning model trained in the offline stage can be stored for use in the online stage, the extremely time-consuming model training is completed offline, and online use of the model involves only a very short computation, so the method meets the rapid-design requirement for the optimal TAD of wind turbine blades.
The flow of the invention is shown in FIG. 1 and specifically comprises the following steps:
Step 1: the overall model training and use begins with the "offline model training stage". The goal of this stage is to produce a trained TAD calculation model that can find the optimal TAD at any wind speed. The stage consists of three core components: the "TAD calculation model", the "environment model", and the "reward function". First, the "TAD calculation model" receives the parameters of a fixed blade structure and outputs the optimal twist angle distribution (TAD) of that blade structure at any wind speed. Then, the "environment model" receives the TAD generated by the "TAD calculation model" and analyzes the aerodynamic performance of the blade; many parameters can represent aerodynamic performance, such as the cost of energy, annual energy production, moment of inertia, thrust, and torque. The present embodiment adopts the representative wind energy capture coefficient (C_p) as the core performance measure for blade evaluation. Finally, the "reward function" fuses the aerodynamic performance analysis result (in this embodiment, the wind energy capture coefficient C_p) with expert experience in blade design, and continuously guides the training direction of the "TAD calculation model": if the current TAD has a better aerodynamic performance result (i.e., a better C_p value) and satisfies expert experience, the current TAD calculation model is rewarded (i.e., the current training direction continues); otherwise it is penalized (i.e., the training direction changes). This reward-and-penalty system trains the TAD calculation model continuously until the convergence requirements are met. The stage specifically comprises the following sub-steps:
Step 1.1, construct the "environment model"
The main function of the environment model is to analyze the aerodynamic performance of the blade; this embodiment selects the wind energy capture coefficient (C_p) of the wind turbine blade for performance evaluation. Typically, computational fluid dynamics (CFD), blade element momentum (BEM) theory, or a surrogate model serves as the "environment model" to compute the aerodynamic performance of a wind turbine blade. Considering computational timeliness, this embodiment adopts a surrogate model, which has a great advantage in computation time and enables a fast search for the optimal TAD. Among the various surrogate models, this embodiment adopts an artificial neural network (ANN) as the surrogate for the "environment model" to evaluate the wind energy capture coefficient (C_p) of the wind turbine blade.
An artificial neural network is a computational model inspired by the structure and function of biological neural networks; in this embodiment a 4-layer ANN is used and the network is trained with the backpropagation algorithm. The inputs of the ANN are the TAD and the wind speed (V_w); the output is the network-estimated wind energy capture coefficient (C_p). The detailed parameters of the network are shown in Table 1.
TABLE 1 Neural network hyperparameters [table reproduced only as an image in the original; values not recoverable]
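For concreteness, the following is a minimal sketch of such an ANN surrogate in PyTorch. The layer widths, activation function, and learning rate are assumed placeholders, since the hyperparameters of Table 1 survive only as an image in the source:

    import torch
    import torch.nn as nn

    class CpSurrogate(nn.Module):
        # 4-layer MLP: input layer (n twist angles + wind speed), two hidden layers, output C_p
        def __init__(self, n_sections=10, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_sections + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, tad, v_w):
            # tad: (batch, n_sections) twist angles; v_w: (batch, 1) wind speed
            return self.net(torch.cat([tad, v_w], dim=-1))

    model = CpSurrogate()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    # backpropagation training on (TAD, V_w) -> C_p samples generated offline, e.g. by BEM or CFD:
    # for tad_b, vw_b, cp_b in loader:
    #     optimizer.zero_grad()
    #     loss_fn(model(tad_b, vw_b), cp_b).backward()
    #     optimizer.step()

Once trained, the surrogate replaces the expensive BEM/CFD call inside the reinforcement learning loop, which is what makes the many reward evaluations affordable.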
Step 1.2, construct the "reward function"
The reward function integrates the output of the environment model with professional experience in blade design, and then gives reward and penalty information to guide the training direction of the TAD calculation model. Not only does the current reward value influence the TAD optimization process; future reward values also influence the generation of the next-generation TAD, which conforms to the core idea of a Markov decision process. Therefore, to provide more effective and reasonable guidance, the invention proposes a long-term reward mechanism that takes the impact of future reward values on the current TAD optimization into account; it comprises three instant rewards based on blade design experience.
TAD optimization is a process of continuously adjusting an initial TAD; this adjustment process can be regarded as an optimization trajectory τ_i, and each trajectory τ_i consists of T steps. R(τ_i) denotes the cumulative reward of trajectory τ_i and evaluates the optimization performance of the trajectory: the higher the value, the better the TAD optimization. Once a satisfactory optimized trajectory τ_i with the highest reward is found, the TAD optimization process is considered complete. R_t is the cumulative reward of the intermediate step t. Within trajectory τ_i, the TAD optimization decision at each step follows the highest-R_t principle, i.e., each TAD optimization step should yield the highest cumulative reward. The reward formulas are

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

where r_t' is the instant reward of step t' and γ is the discount rate. The factor γ^{t'-t} discounts every future step t' back to the current step t; since γ ∈ [0,1], the farther a future step lies from the current step t, the smaller its discount factor and the smaller its influence on the current reward. In the present invention, the cumulative reward R_t serves as the long-term reward.
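As an illustration of the long-term reward above, the following sketch computes R_t for every step of one trajectory from its instant rewards, via the recursion R_t = r_t + γR_{t+1}; the discount rate value 0.99 is an assumed placeholder:

    def long_term_rewards(instant_rewards, gamma=0.99):
        # instant_rewards: [r_1, ..., r_T] for one optimization trajectory
        T = len(instant_rewards)
        R = [0.0] * T
        running = 0.0
        for t in reversed(range(T)):      # R_t = r_t + gamma * R_{t+1}
            running = instant_rewards[t] + gamma * running
            R[t] = running
        return R                          # R[0] is the discounted return of the whole trajectory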
The instant reward r_t' is mainly formed by combining the aerodynamic performance obtained from the environment model with expert experience. The present embodiment illustrates the construction of the instant reward using the wind energy capture coefficient as an example.

Wind energy capture coefficient reward:

Since the blade design optimization in the present invention aims to obtain a higher wind energy capture coefficient (C_p), a larger C_p value is given a higher reward. The wind energy capture coefficient reward function is

$$r_1 = 10\, C_p$$
Engineering experience rewards:

In the field of blade TAD design, there are two general pieces of engineering experience about the form of the TAD: 1) the twist angle decreases monotonically; 2) the twist angle is limited to a maximum/minimum range. Based on these two pieces of engineering experience, we propose two engineering-experience rewards. Furthermore, to represent the continuous TAD variable, we use discrete twist angles at n blade cross sections:

$$X=[Tw_1, Tw_2, \ldots, Tw_i, \ldots, Tw_{n-1}, Tw_n],\quad i=1,2,\ldots,n$$

$$L < Tw_i < U$$

where Tw_i is the twist angle of the i-th blade cross section, and L and U are the lower and upper limits of the twist angle, respectively.
A) Monotonic decrease:

For wind turbine blade design, to obtain a good C_p, the twist angle distribution (TAD) of the blade needs to follow the monotonic-decrease engineering experience, i.e., the twist angle decreases monotonically from blade root to blade tip. Thus, for any consecutive blade cross sections with 1 ≤ i < i+1 ≤ n, if the inequality Tw_i ≥ Tw_{i+1} holds, no penalty is given (reward 0). However, once Tw_i < Tw_{i+1} occurs, a penalty of -10 is applied. The monotonic-decrease reward function is

$$r_2 = -10\, u$$

where u denotes the number of violations of the monotonic-decrease constraint.
B) Range constraint:

After a literature review of blade TAD optimization, we found that blade twist angles almost always lie in the range [-5, +45] degrees, so this twist-angle range constraint participates as another piece of engineering experience in the construction of the reward function. The range-constraint reward function is

$$r_3 = -10\, N$$

where N is the number of twist angles outside the allowed range.
Fusing the wind energy capture coefficient reward with the engineering experience rewards:

Based on TAD search knowledge and a large number of experiments, we found that the instant-reward structure shown in FIG. 3 achieves the best learning efficiency for the RL-TAD model. After a TAD is input, the "monotonic decrease" reward (r_2) is computed first. If the monotonic-decrease principle is not satisfied, the instant reward is set directly to r_2. If it is satisfied, the "monotonic decrease" reward is set to 0, and the "range constraint" reward and the "wind energy capture coefficient" reward are computed and summed to form the final instant reward r. This structure makes monotonic decrease the primary condition of RL-TAD model training; the "range constraint" and the "wind energy capture coefficient" objective are considered only after the "monotonic decrease" constraint is satisfied.
Step 1.3, construct the "TAD calculation model"

The function of the "TAD calculation model" is as follows: given a blade of fixed structure, find the twist angle distribution (TAD) of that blade structure at any wind speed. Optimization models may be used, such as gradient-based optimization models (Newton's method, steepest descent, batch gradient descent) or evolutionary-algorithm-based optimization models (genetic algorithm, particle swarm optimization, differential evolution), as well as an "agent". Considering the reusability of an "agent" and its ability to adapt to a changing environment, this embodiment uses the "agent" for the TAD calculation. In general, methods such as Q-learning, Deep Q-Network, Policy Gradient, Actor-Critic, and Deep Deterministic Policy Gradient can be used to construct and train the "agent". This embodiment constructs and trains the agent with the Actor-Critic learning method, which trains efficiently and quickly. The "agent" architecture consists of two basic components: the "action executor" (Actor) and the "state evaluator" (Critic). The long-term reward value produced by the reward function is translated by the "state evaluator" into an internal reward value of the "agent", which is then provided to the internal "action executor" to guide the generation of a new TAD. In this embodiment, both the "action executor" and the "state evaluator" are parameterized by neural networks.
The ultimate goal of the "agent" is always to obtain the highest reward return; therefore, the main task is to define an objective function that collects long-term reward values. We construct the objective function and its gradient as follows:

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} R_t \log \pi_\theta(a_t\mid s_t)$$

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} R_t\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

where θ is the weight parameter of the "action executor" (Actor) neural network. In the present invention, the state s_t at step t is the wind speed V_w together with the TAD, i.e., s_t = (V_w, TAD); the action a_t denotes the modification applied to the TAD, i.e., a = ΔTAD. π_θ(a_t|s_t) is the action-execution policy function defined by θ: for the TAD at a specific wind speed, it gives the probability of applying the modification a_t in order to increase the reward value; the higher the probability, the more likely action a_t is executed. R_t denotes the long-term reward value obtained by performing action a_t at step t, so Σ_{t=1}^{T} R_t is the sum of the reward values of one optimization trajectory from step 1 to the final step T. Meanwhile, to improve the accuracy and effectiveness of the reward estimate, the calculation is aggregated over m optimization trajectories and the expectation of the reward is taken, yielding the final objective function J(θ). Finally, the gradient of J(θ) is used to train the action-execution policy function π_θ(·).
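In code, the sampled objective corresponds to a loss that weights each step's log-probability by its long-term reward. A minimal sketch, assuming each trajectory is a list of (log π_θ(a_t|s_t), R_t) pairs recorded while the actor network rolls out:

    def policy_gradient_loss(trajectories):
        # minimizing the negative of J(theta) maximizes the expected long-term reward;
        # autograd of this loss yields the gradient formula given above
        loss = 0.0
        for traj in trajectories:            # m optimization trajectories
            for log_prob, R_t in traj:       # log pi_theta(a_t|s_t) as a torch scalar
                loss = loss - log_prob * R_t
        return loss / len(trajectories)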
To further refine the objective function, the reward value R_t of each step needs to be adjusted. First, an action-value function is introduced for the reward of each step:

$$Q(s_t,a_t)=\mathbb{E}\left[R_t \mid s_t, a_t\right]$$

Q(s_t, a_t) denotes the expected reward of performing the specific action a_t in state s_t. At the same time, a state-value function is introduced:

$$V(s_t)=\mathbb{E}_{a_t\sim\pi_\theta}\left[Q(s_t,a_t)\right]$$

V(s_t) denotes the expected reward of performing all possible actions a_t in state s_t. The objective function gradient becomes

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} \big(Q(s_t,a_t)-V(s_t)\big)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

Q(s_t, a_t) - V(s_t) is the difference between the reward obtained by performing the specific action a_t in state s_t and the average reward of performing all actions: if Q(s_t, a_t) > V(s_t), performing action a_t brings a positive relative reward, and vice versa. Experiments show that this relative reward function achieves a better model-training effect. Since Q(s_t, a_t) = r_{t+1} + γV(s_{t+1}), the objective function gradient becomes

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} \big(r_{t+1}+\gamma V(s_{t+1})-V(s_t)\big)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$
In the objective function, the action-execution policy π_θ(a_t|s_t) is represented by the "action executor" neural network, and the value function V(s_t) is represented by the "state evaluator" neural network. The "action executor" network parameters are trained from the objective-function gradient; the parameter update formula is

$$\theta \leftarrow \theta + \alpha\, \delta_{TD}(t)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

The "state evaluator" network parameters are updated from the difference between the realized reward value r_{t+1} + γV(s_{t+1}) and the estimated reward value, δ_TD(t) = r_{t+1} + γV(s_{t+1}) - V(s_t); the parameter update formula is

$$\omega \leftarrow \omega + \beta\, \delta_{TD}(t)\, \nabla_\omega V_\omega(s_t)$$

where θ and ω are the parameters of the two networks, and α and β are their respective learning rates.
TABLE 2 Neural network hyperparameters in the Actor-Critic (A-C) learning model [table reproduced only as an image in the original; values not recoverable]
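A minimal sketch of one Actor-Critic update implementing the two parameter-update rules above, written for a single transition; the assumption that the actor returns a torch probability distribution over ΔTAD is a simplification, not the patent's exact implementation:

    import torch

    def ac_update(actor, critic, actor_opt, critic_opt,
                  s_t, a_t, r_next, s_next, gamma=0.99):
        # "state evaluator" (critic): value estimates of the current and next state
        v_t = critic(s_t)
        v_next = critic(s_next).detach()
        td_error = r_next + gamma * v_next - v_t       # delta_TD(t)

        # critic update: regress V(s_t) toward r_{t+1} + gamma * V(s_{t+1})
        critic_opt.zero_grad()
        td_error.pow(2).mean().backward()
        critic_opt.step()

        # "action executor" (actor): raise the log-probability of actions with a
        # positive TD error (relative reward), lower it otherwise
        adv = td_error.detach().item()
        dist = actor(s_t)                              # e.g. a torch.distributions.Normal over Delta-TAD
        actor_opt.zero_grad()
        (-dist.log_prob(a_t).sum() * adv).backward()
        actor_opt.step()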
It should be noted that the "agent" is composed of neural networks, and its network parameters are continuously updated by incremental training: after the "agent" has been trained at a certain wind speed, the new "agent" for a new wind speed inherits the network parameters of the old "agent".
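A minimal sketch of this warm start; the checkpoint file name is hypothetical:

    # after training at the original wind speed:
    torch.save({"actor": actor.state_dict(), "critic": critic.state_dict()}, "agent_original_wind.pt")

    # for a new wind speed, the new agent inherits the old parameters and is then fine-tuned:
    ckpt = torch.load("agent_original_wind.pt")
    actor.load_state_dict(ckpt["actor"])
    critic.load_state_dict(ckpt["critic"])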
Step 2: based on the trained "agent", a TAD generator for online application is generated. For any random wind speed and a fixed wind turbine structure, the TAD generator searches for the optimal TAD in real time and outputs it quickly.
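A minimal sketch of that online use, under the conventions above (state = (V_w, TAD), action = ΔTAD); make_state is a hypothetical encoding helper and the step count is a placeholder:

    def search_optimal_tad(actor, v_w, tad_init, steps=50):
        # roll the trained policy forward from an initial TAD at the given wind speed
        tad = tad_init                     # torch tensor of initial twist angles
        for _ in range(steps):
            s_t = make_state(v_w, tad)     # hypothetical helper building the state tensor
            delta = actor(s_t).mean        # deterministic output: mean of the action distribution
            tad = tad + delta.detach()
        return tad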
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, characterized by comprising the following steps:
step 1, constructing a TAD calculation model and an environment model;
the TAD calculation model computes the optimal TAD from the wind speed and the blade structure;
the environment model analyzes the aerodynamic performance of the blade according to the TAD generated by the TAD calculation model;
step 2, training the TAD calculation model by a reinforcement learning method to obtain a trained TAD calculation model; wherein a long-term reward mechanism is adopted to train the TAD calculation model in the reinforcement learning method, the reward functions in the long-term reward mechanism being

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

wherein R(τ_i) is the cumulative reward of the i-th learning trajectory τ_i; T is the total number of steps; γ is the discount rate; r_t' is the instant reward of step t'; R_t is the long-term reward of step t; and the instant reward r_t' is obtained by combining the aerodynamic performance derived from the environment model with expert experience;
and step 3, using the trained TAD calculation model to search for and output the optimal TAD in real time according to the current wind speed and the blade structure.
2. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1, wherein the environment model adopts the wind energy capture coefficient as the aerodynamic performance analysis result.
3. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 2, wherein the instant reward is obtained as follows: it is calculated whether the current TAD follows a monotonic decrease; if not, the instant reward is r_t' = -10u, where u denotes the number of violations of the monotonic-decrease constraint; if it does, the instant reward is r_t' = 10C_p - 10N, where C_p is the wind energy capture coefficient and N is the number of twist angles outside the allowed range.
4. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1 or 2, wherein the environment model is implemented using computational fluid dynamics, blade element momentum theory, or a surrogate model.
5. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 4, wherein the environment model adopts an artificial neural network surrogate model.
6. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 5, wherein the environment model is a 4-layer artificial neural network trained by backpropagation; the inputs of the artificial neural network are the TAD and the wind speed V_w, and the output is the network-estimated wind energy capture coefficient C_p.
7. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1, wherein the TAD calculation model adopts an optimization model or an agent.
8. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 7, wherein an Actor-Critic learning method is adopted to construct and train the agent; the agent comprises an action executor and a state evaluator; wherein the long-term reward value generated by the reward function is translated by the state evaluator into an internal reward value of the agent, which is then provided to the internal action executor to guide the generation of a new TAD.
9. The method as claimed in claim 8, wherein both the action executor and the state evaluator are parameterized by neural networks.
CN202010676474.3A 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning Active CN111914361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010676474.3A CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010676474.3A CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111914361A (en) 2020-11-10
CN111914361B (en) 2023-03-31

Family

ID=73280328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010676474.3A Active CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111914361B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821903B (en) * 2021-07-09 2024-02-06 腾讯科技(深圳)有限公司 Temperature control method and equipment, modularized data center and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103649528A (en) * 2011-05-19 2014-03-19 米塔科技有限公司 Method of wind turbine yaw angle control and wind turbine
CN111241752A (en) * 2020-01-16 2020-06-05 北京航空航天大学 Centrifugal impeller comprehensive optimization method based on digital twinning and reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9310165B2 (en) * 2002-05-18 2016-04-12 John Curtis Bell Projectile sighting and launching control system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103649528A (en) * 2011-05-19 2014-03-19 米塔科技有限公司 Method of wind turbine yaw angle control and wind turbine
CN111241752A (en) * 2020-01-16 2020-06-05 北京航空航天大学 Centrifugal impeller comprehensive optimization method based on digital twinning and reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Evolutionary Level Set Method and Gaussian Mixture Model Based Target Shape Design Optimization Problem; Liangyue Jia et al.; vol. 7, pp. 104096-104107; 2019-08-13 *
基于空气动力学的风力机优化设计 (Optimization Design of Wind Turbines Based on Aerodynamics); 梁孟; 电子技术与软件工程 (Electronic Technology & Software Engineering); 2019-03-15, No. 6; p. 205 *

Also Published As

Publication number Publication date
CN111914361A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN112668235B (en) Robot control method based on off-line model pre-training learning DDPG algorithm
CN108133258B (en) Hybrid global optimization method
CN110806759A (en) Aircraft route tracking method based on deep reinforcement learning
Qu et al. An improved TLBO based memetic algorithm for aerodynamic shape optimization
CN104102769B (en) Artificial intelligence-based method for establishing real time part level model of turbo shaft engine
CN103729695A (en) Short-term power load forecasting method based on particle swarm and BP neural network
CN110674965A (en) Multi-time step wind power prediction method based on dynamic feature selection
Jia et al. A reinforcement learning based blade twist angle distribution searching method for optimizing wind turbine energy power
CN114362175B (en) Wind power prediction method and system based on depth certainty strategy gradient algorithm
Baheri et al. Altitude optimization of airborne wind energy systems: A Bayesian optimization approach
CN111914361B (en) Wind turbine blade rapid design optimization method based on reinforcement learning
CN115409645A (en) Comprehensive energy system energy management method based on improved deep reinforcement learning
Hein et al. Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming
CN113156900A (en) Machining deformation control method based on meta reinforcement learning
Khalil et al. A novel cascade-loop controller for load frequency control of isolated microgrid via dandelion optimizer
CN111832911A (en) Underwater combat effectiveness evaluation method based on neural network algorithm
CN114139778A (en) Wind turbine generator power prediction modeling method and device
Tong et al. Enhancing rolling horizon evolution with policy and value networks
CN113033012A (en) Hierarchical data-driven wind power plant generated power optimization scheme
CN110598911B (en) Wind speed prediction method for wind turbine of wind power plant
CN111563614A (en) Load prediction method based on adaptive neural network and TLBO algorithm
CN116663637A (en) Multi-level agent synchronous nesting training method
CN115864409A (en) Power grid section power adjustment strategy based on deep reinforcement learning
CN116484675A (en) Crack propagation life prediction method and system for ship engine blade
Zhang et al. Gliding control of underwater gliding snake-like robot based on reinforcement learning

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant