CN111914361B - Wind turbine blade rapid design optimization method based on reinforcement learning - Google Patents


Info

Publication number
CN111914361B
CN111914361B
Authority
CN
China
Prior art keywords
tad
model
reinforcement learning
blade
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010676474.3A
Other languages
Chinese (zh)
Other versions
CN111914361A (en)
Inventor
贾良跃
郝佳
王国新
阎艳
子曌
朱志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010676474.3A
Publication of CN111914361A
Application granted
Publication of CN111914361B
Current legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/17Mechanical parametric or variational design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/28Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/08Fluids
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14Force analysis or force optimisation, e.g. static or dynamic forces
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/70Wind energy
    • Y02E10/72Wind turbines with rotation axis in wind direction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Fluid Mechanics (AREA)
  • Algebra (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Wind Motors (AREA)

Abstract

The invention discloses a reinforcement-learning-based rapid design optimization method for wind turbine blades. Built on reinforcement learning, the method provides directional guidance during blade TAD optimization, drives the blade model to evolve toward higher energy capture, and greatly improves optimization efficiency. Moreover, because a reinforcement learning model is reusable, the trained optimization model can be reused across different wind speeds, so the search for the optimal blade TAD at a new wind speed no longer has to start from zero: the optimization model trained at the original wind speed serves as the initial model and is then fine-tuned to the new wind speed environment, which greatly shortens the training time of the optimization model and speeds up blade TAD optimization.

Description

Wind turbine blade rapid design optimization method based on reinforcement learning
Technical Field
The invention relates to the technical field of wind turbine blade design optimization, in particular to a reinforcement-learning-based rapid design optimization method for wind turbine blades.
Background
The growing demand for energy, the rapid depletion of fossil fuel reserves, and persistent calls for environmental protection have driven the rapid development of alternative renewable energy sources. Wind energy is a major renewable, clean energy source and is widely used worldwide because it is easy to capture and abundant. To capture more energy and reduce the cost per unit of wind energy captured, conventional commercial wind turbines optimize and adjust the twist angles of their blades so that the energy captured by each blade section is locally optimal and the energy captured by the whole blade is globally optimal. With almost no added cost, this greatly improves the efficiency and total amount of wind energy capture. To achieve this capability, the core problem is how to effectively identify, in real time, the optimal blade twist angle distribution (TAD) in a complicated and variable natural wind environment, so that the control system can adjust the TAD to achieve the optimal wind energy harvest. The established approach to identifying the optimal blade TAD combines an optimization method with a simulator and gradually adjusts the blade TAD in an automated trial-and-error/evaluation loop until the requirements are met.
Since blades are typically deployed in dynamic wind environments where wind speed varies widely, finding the optimal TAD for different wind speeds is crucial in the design process. To this end, traditional methods combine evolutionary algorithms (EAs) with blade element momentum (BEM) theory or computational fluid dynamics (CFD). However, such methods can only find the optimal TAD for a given fixed wind speed. When the wind speed changes, the optimization model must be retrained, which greatly increases the design time of the blade TAD and prolongs the blade design cycle.
Therefore, a new solution is needed that improves the efficiency of the optimal-TAD search while preserving its accuracy, enabling rapid design of the wind turbine blade twist angle.
Disclosure of Invention
In view of the above, the invention provides a reinforcement-learning-based rapid design optimization method for wind turbine blades. It adopts an offline-training/online-application mode to realize the optimal TAD search quickly, and it fuses aerodynamic performance with expert experience to guide the TAD optimization, achieving directional exploration during optimization, avoiding a large number of random searches in the later stages, and greatly improving the efficiency and accuracy of optimization model training.
The invention discloses a method for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, which comprises the following steps:
step 1, constructing a TAD calculation model and an environment model;
the TAD calculation model calculates the optimal TAD according to the wind speed and the blade structure;
the environment model carries out the aerodynamic performance analysis of the blade according to TAD generated by the TAD calculation model;
step 2, training the TAD calculation model by a reinforcement learning method to obtain a trained TAD calculation model; wherein a long-term reward mechanism is adopted, with the reward functions

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

where R(τ_i) is the cumulative reward of the i-th learning trajectory τ_i; T is the total number of steps; γ is the discount rate; r_t' is the instant reward of step t'; and R_t is the long-term reward of step t. The instant reward r_t' mainly combines the aerodynamic performance obtained from the environment model with expert experience;
and step 3, using the trained TAD calculation model to search for and output the optimal TAD in real time according to the current wind speed and the blade structure.
Preferably, the environment model uses the wind energy capture coefficient as the aerodynamic performance analysis result.
Preferably, the instant reward is obtained as follows: it is calculated whether the current TAD follows a monotonic decrease; if not, the instant reward is r_t' = -10u, where u denotes the number of violations of the monotonic-decrease constraint; if it does, the instant reward is r_t' = 10C_p - 10N, where C_p is the wind energy capture coefficient and N is the number of twist angles outside the allowed range.
Preferably, the environment model is implemented using computational fluid dynamics, blade element momentum theory, or a surrogate model.
Preferably, the environment model adopts an artificial neural network surrogate model.
Preferably, the environment model is a 4-layer artificial neural network trained by backpropagation; the inputs of the artificial neural network are the TAD and the wind speed V_w, and the output is the network-estimated wind energy capture coefficient C_p.
Preferably, the TAD calculation model employs an optimization model or an agent.
Preferably, an Actor-Critic learning method is adopted to construct and train the agent; the agent comprises an action executor and a state evaluator; wherein the long-term reward value generated by the reward function is translated by the "state evaluator" into an internal reward value of the "agent", which is then provided to the internal "action executor" to guide the generation of a new TAD.
Preferably, both the action executor and the state evaluator are parameterized by neural networks.
Advantageous effects:
the invention provides a method (RL-TAD) for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, which is based on the reinforcement learning method, provides directional guidance in the process of TAD optimization of the blade, promotes a blade model to evolve towards a larger energy obtaining direction, and greatly improves the optimization efficiency. Meanwhile, due to the reusability of the reinforcement learning method, the trained optimization model can be continuously reused under different wind speeds, and the searching process of the optimal TAD of the blade under different wind speeds can get rid of the embarrassment from 0. The optimization model trained under the original wind speed is used as an initial model, and then the optimization model is adjusted to adapt to a new wind speed environment, so that the training time of the optimization model is greatly shortened, and the TAD optimization speed of the blade is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a blade aerodynamic assessment model based on an artificial neural network.
FIG. 3 is an organizational structure of a reward function.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a reinforcement-learning-based rapid design optimization method for wind turbine blades, i.e., a reinforcement-learning-based blade design optimization framework. The method consists of two processes: an "offline model training stage" and an "online model application stage". The offline model training stage trains the TAD calculation model, whose main function is to receive an arbitrary wind speed and a blade structure as inputs and output the corresponding optimal TAD. Then, in the online model application stage, the trained TAD calculation model quickly completes the optimal TAD search for a specific wind speed and a fixed blade structure. The optimization of the TAD calculation model is guided: it proceeds according to the aerodynamic performance from the environment model and expert experience, realizing directional exploration during optimization, avoiding a large number of random searches in the later stages, and greatly improving the efficiency and accuracy of optimization model training. Meanwhile, the reinforcement learning method is reusable and supports offline training with online application: the reinforcement learning model trained in the offline stage can be stored for use in the online stage, the extremely time-consuming model training is completed offline, and online use of the model involves only a very short computation, so the method meets the rapid-design requirement for the optimal TAD of wind turbine blades.
The flow of the invention is shown in FIG. 1 and specifically comprises the following steps:
Step 1: the overall model training and use begins with the "offline model training stage". The goal of this stage is to produce a trained TAD calculation model that can find the optimal TAD at any wind speed. The stage consists of three core components: the "TAD calculation model", the "environment model", and the "reward function". First, the "TAD calculation model" receives the parameters of a fixed blade structure and outputs the optimal twist angle distribution (TAD) of that blade structure at any wind speed. Then, the "environment model" receives the TAD generated by the "TAD calculation model" and analyzes the aerodynamic performance of the blade; many parameters can represent aerodynamic performance, such as the cost of energy, annual energy production, moment of inertia, thrust, and torque. The present embodiment adopts the representative wind energy capture coefficient (C_p) as the core performance measure for blade evaluation. Finally, the "reward function" fuses the aerodynamic performance analysis result (in this embodiment, the wind energy capture coefficient C_p) with expert experience in blade design, and continuously guides the training direction of the "TAD calculation model": if the current TAD has a better aerodynamic performance result (i.e., a better C_p value) and satisfies expert experience, the current TAD calculation model is rewarded (i.e., the current training direction continues); otherwise it is penalized (i.e., the training direction changes). This reward-and-penalty system trains the TAD calculation model continuously until the convergence requirements are met. The stage specifically comprises the following sub-steps:
Step 1.1, construct the "environment model"
The main function of the environment model is to analyze the aerodynamic performance of the blade; this embodiment selects the wind energy capture coefficient (C_p) of the wind turbine blade for performance evaluation. Typically, computational fluid dynamics (CFD), blade element momentum (BEM) theory, or a surrogate model serves as the "environment model" to compute the aerodynamic performance of a wind turbine blade. Considering computational timeliness, this embodiment adopts a surrogate model, which has a great advantage in computation time and enables a fast search for the optimal TAD. Among the various surrogate models, this embodiment adopts an artificial neural network (ANN) as the surrogate for the "environment model" to evaluate the wind energy capture coefficient (C_p) of the wind turbine blade.
An artificial neural network is a computational model inspired by the structure and function of biological neural networks; in this embodiment a 4-layer ANN is used and the network is trained with the backpropagation algorithm. The inputs of the ANN are the TAD and the wind speed (V_w); the output is the network-estimated wind energy capture coefficient (C_p). The detailed parameters of the network are shown in Table 1.
TABLE 1 Neural network hyperparameters [table reproduced only as an image in the original; values not recoverable]
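For concreteness, the following is a minimal sketch of such an ANN surrogate in PyTorch. The layer widths, activation function, and learning rate are assumed placeholders, since the hyperparameters of Table 1 survive only as an image in the source:

    import torch
    import torch.nn as nn

    class CpSurrogate(nn.Module):
        # 4-layer MLP: input layer (n twist angles + wind speed), two hidden layers, output C_p
        def __init__(self, n_sections=10, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_sections + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, tad, v_w):
            # tad: (batch, n_sections) twist angles; v_w: (batch, 1) wind speed
            return self.net(torch.cat([tad, v_w], dim=-1))

    model = CpSurrogate()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    # backpropagation training on (TAD, V_w) -> C_p samples generated offline, e.g. by BEM or CFD:
    # for tad_b, vw_b, cp_b in loader:
    #     optimizer.zero_grad()
    #     loss_fn(model(tad_b, vw_b), cp_b).backward()
    #     optimizer.step()

Once trained, the surrogate replaces the expensive BEM/CFD call inside the reinforcement learning loop, which is what makes the many reward evaluations affordable.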
Step 1.2, construct the "reward function"
The reward function integrates the output of the environment model with professional experience in blade design, and then gives reward and penalty information to guide the training direction of the TAD calculation model. Not only does the current reward value influence the TAD optimization process; future reward values also influence the generation of the next-generation TAD, which conforms to the core idea of a Markov decision process. Therefore, to provide more effective and reasonable guidance, the invention proposes a long-term reward mechanism that takes the impact of future reward values on the current TAD optimization into account; it comprises three instant rewards based on blade design experience.
TAD optimization is a process of continuously adjusting an initial TAD; this adjustment process can be regarded as an optimization trajectory τ_i, and each trajectory τ_i consists of T steps. R(τ_i) denotes the cumulative reward of trajectory τ_i and evaluates the optimization performance of the trajectory: the higher the value, the better the TAD optimization. Once a satisfactory optimized trajectory τ_i with the highest reward is found, the TAD optimization process is considered complete. R_t is the cumulative reward of the intermediate step t. Within trajectory τ_i, the TAD optimization decision at each step follows the highest-R_t principle, i.e., each TAD optimization step should yield the highest cumulative reward. The reward formulas are

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

where r_t' is the instant reward of step t' and γ is the discount rate. The factor γ^{t'-t} discounts every future step t' back to the current step t; since γ ∈ [0,1], the farther a future step lies from the current step t, the smaller its discount factor and the smaller its influence on the current reward. In the present invention, the cumulative reward R_t serves as the long-term reward.
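As an illustration of the long-term reward above, the following sketch computes R_t for every step of one trajectory from its instant rewards, via the recursion R_t = r_t + γR_{t+1}; the discount rate value 0.99 is an assumed placeholder:

    def long_term_rewards(instant_rewards, gamma=0.99):
        # instant_rewards: [r_1, ..., r_T] for one optimization trajectory
        T = len(instant_rewards)
        R = [0.0] * T
        running = 0.0
        for t in reversed(range(T)):      # R_t = r_t + gamma * R_{t+1}
            running = instant_rewards[t] + gamma * running
            R[t] = running
        return R                          # R[0] is the discounted return of the whole trajectory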
The instant reward r_t' is mainly formed by combining the aerodynamic performance obtained from the environment model with expert experience. The present embodiment illustrates the construction of the instant reward using the wind energy capture coefficient as an example.

Wind energy capture coefficient reward:

Since the blade design optimization in the present invention aims to obtain a higher wind energy capture coefficient (C_p), a larger C_p value is given a higher reward. The wind energy capture coefficient reward function is

$$r_1 = 10\, C_p$$
Engineering experience rewards:

In the field of blade TAD design, there are two general pieces of engineering experience about the form of the TAD: 1) the twist angle decreases monotonically; 2) the twist angle is limited to a maximum/minimum range. Based on these two pieces of engineering experience, we propose two engineering-experience rewards. Furthermore, to represent the continuous TAD variable, we use discrete twist angles at n blade cross sections:

$$X=[Tw_1, Tw_2, \ldots, Tw_i, \ldots, Tw_{n-1}, Tw_n],\quad i=1,2,\ldots,n$$

$$L < Tw_i < U$$

where Tw_i is the twist angle of the i-th blade cross section, and L and U are the lower and upper limits of the twist angle, respectively.
A) Monotonic decrease:

For wind turbine blade design, to obtain a good C_p, the twist angle distribution (TAD) of the blade needs to follow the monotonic-decrease engineering experience, i.e., the twist angle decreases monotonically from blade root to blade tip. Thus, for any consecutive blade cross sections with 1 ≤ i < i+1 ≤ n, if the inequality Tw_i ≥ Tw_{i+1} holds, no penalty is given (reward 0). However, once Tw_i < Tw_{i+1} occurs, a penalty of -10 is applied. The monotonic-decrease reward function is

$$r_2 = -10\, u$$

where u denotes the number of violations of the monotonic-decrease constraint.
B) Range constraint:

After a literature review of blade TAD optimization, we found that blade twist angles almost always lie in the range [-5, +45] degrees, so this twist-angle range constraint participates as another piece of engineering experience in the construction of the reward function. The range-constraint reward function is

$$r_3 = -10\, N$$

where N is the number of twist angles outside the allowed range.
Fusing the wind energy capture coefficient reward with the engineering experience rewards:

Based on TAD search knowledge and a large number of experiments, we found that the instant-reward structure shown in FIG. 3 achieves the best learning efficiency for the RL-TAD model. After a TAD is input, the "monotonic decrease" reward (r_2) is computed first. If the monotonic-decrease principle is not satisfied, the instant reward is set directly to r_2. If it is satisfied, the "monotonic decrease" reward is set to 0, and the "range constraint" reward and the "wind energy capture coefficient" reward are computed and summed to form the final instant reward r. This structure makes monotonic decrease the primary condition of RL-TAD model training; the "range constraint" and the "wind energy capture coefficient" objective are considered only after the "monotonic decrease" constraint is satisfied.
Step 1.3, construct the "TAD calculation model"

The function of the "TAD calculation model" is as follows: given a blade of fixed structure, find the twist angle distribution (TAD) of that blade structure at any wind speed. Optimization models may be used, such as gradient-based optimization models (Newton's method, steepest descent, batch gradient descent) or evolutionary-algorithm-based optimization models (genetic algorithm, particle swarm optimization, differential evolution), as well as an "agent". Considering the reusability of an "agent" and its ability to adapt to a changing environment, this embodiment uses the "agent" for the TAD calculation. In general, methods such as Q-learning, Deep Q-Network, Policy Gradient, Actor-Critic, and Deep Deterministic Policy Gradient can be used to construct and train the "agent". This embodiment constructs and trains the agent with the Actor-Critic learning method, which trains efficiently and quickly. The "agent" architecture consists of two basic components: the "action executor" (Actor) and the "state evaluator" (Critic). The long-term reward value produced by the reward function is translated by the "state evaluator" into an internal reward value of the "agent", which is then provided to the internal "action executor" to guide the generation of a new TAD. In this embodiment, both the "action executor" and the "state evaluator" are parameterized by neural networks.
The ultimate goal of the "agent" is always to obtain the highest reward return; therefore, the main task is to define an objective function that collects long-term reward values. We construct the objective function and its gradient as follows:

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} R_t \log \pi_\theta(a_t\mid s_t)$$

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} R_t\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

where θ is the weight parameter of the "action executor" (Actor) neural network. In the present invention, the state s_t at step t is the wind speed V_w together with the TAD, i.e., s_t = (V_w, TAD); the action a_t denotes the modification applied to the TAD, i.e., a = ΔTAD. π_θ(a_t|s_t) is the action-execution policy function defined by θ: for the TAD at a specific wind speed, it gives the probability of applying the modification a_t in order to increase the reward value; the higher the probability, the more likely action a_t is executed. R_t denotes the long-term reward value obtained by performing action a_t at step t, so Σ_{t=1}^{T} R_t is the sum of the reward values of one optimization trajectory from step 1 to the final step T. Meanwhile, to improve the accuracy and effectiveness of the reward estimate, the calculation is aggregated over m optimization trajectories and the expectation of the reward is taken, yielding the final objective function J(θ). Finally, the gradient of J(θ) is used to train the action-execution policy function π_θ(·).
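In code, the sampled objective corresponds to a loss that weights each step's log-probability by its long-term reward. A minimal sketch, assuming each trajectory is a list of (log π_θ(a_t|s_t), R_t) pairs recorded while the actor network rolls out:

    def policy_gradient_loss(trajectories):
        # minimizing the negative of J(theta) maximizes the expected long-term reward;
        # autograd of this loss yields the gradient formula given above
        loss = 0.0
        for traj in trajectories:            # m optimization trajectories
            for log_prob, R_t in traj:       # log pi_theta(a_t|s_t) as a torch scalar
                loss = loss - log_prob * R_t
        return loss / len(trajectories)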
To further refine the objective function, the reward value R_t of each step needs to be adjusted. First, an action-value function is introduced for the reward of each step:

$$Q(s_t,a_t)=\mathbb{E}\left[R_t \mid s_t, a_t\right]$$

Q(s_t, a_t) denotes the expected reward of performing the specific action a_t in state s_t. At the same time, a state-value function is introduced:

$$V(s_t)=\mathbb{E}_{a_t\sim\pi_\theta}\left[Q(s_t,a_t)\right]$$

V(s_t) denotes the expected reward of performing all possible actions a_t in state s_t. The objective function gradient becomes

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} \big(Q(s_t,a_t)-V(s_t)\big)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

Q(s_t, a_t) - V(s_t) is the difference between the reward obtained by performing the specific action a_t in state s_t and the average reward of performing all actions: if Q(s_t, a_t) > V(s_t), performing action a_t brings a positive relative reward, and vice versa. Experiments show that this relative reward function achieves a better model-training effect. Since Q(s_t, a_t) = r_{t+1} + γV(s_{t+1}), the objective function gradient becomes

$$\nabla_\theta J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{t=1}^{T} \big(r_{t+1}+\gamma V(s_{t+1})-V(s_t)\big)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$
In the objective function, the action-execution policy π_θ(a_t|s_t) is represented by the "action executor" neural network, and the value function V(s_t) is represented by the "state evaluator" neural network. The "action executor" network parameters are trained from the objective-function gradient; the parameter update formula is

$$\theta \leftarrow \theta + \alpha\, \delta_{TD}(t)\, \nabla_\theta \log \pi_\theta(a_t\mid s_t)$$

The "state evaluator" network parameters are updated from the difference between the realized reward value r_{t+1} + γV(s_{t+1}) and the estimated reward value, δ_TD(t) = r_{t+1} + γV(s_{t+1}) - V(s_t); the parameter update formula is

$$\omega \leftarrow \omega + \beta\, \delta_{TD}(t)\, \nabla_\omega V_\omega(s_t)$$

where θ and ω are the parameters of the two networks, and α and β are their respective learning rates.
TABLE 2 Neural network hyperparameters in the Actor-Critic (A-C) learning model [table reproduced only as an image in the original; values not recoverable]
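A minimal sketch of one Actor-Critic update implementing the two parameter-update rules above, written for a single transition; the assumption that the actor returns a torch probability distribution over ΔTAD is a simplification, not the patent's exact implementation:

    import torch

    def ac_update(actor, critic, actor_opt, critic_opt,
                  s_t, a_t, r_next, s_next, gamma=0.99):
        # "state evaluator" (critic): value estimates of the current and next state
        v_t = critic(s_t)
        v_next = critic(s_next).detach()
        td_error = r_next + gamma * v_next - v_t       # delta_TD(t)

        # critic update: regress V(s_t) toward r_{t+1} + gamma * V(s_{t+1})
        critic_opt.zero_grad()
        td_error.pow(2).mean().backward()
        critic_opt.step()

        # "action executor" (actor): raise the log-probability of actions with a
        # positive TD error (relative reward), lower it otherwise
        adv = td_error.detach().item()
        dist = actor(s_t)                              # e.g. a torch.distributions.Normal over Delta-TAD
        actor_opt.zero_grad()
        (-dist.log_prob(a_t).sum() * adv).backward()
        actor_opt.step()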
It should be noted that the "agent" is composed of neural networks, and its network parameters are continuously updated by incremental training: after the "agent" has been trained at a certain wind speed, the new "agent" for a new wind speed inherits the network parameters of the old "agent".
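A minimal sketch of this warm start; the checkpoint file name is hypothetical:

    # after training at the original wind speed:
    torch.save({"actor": actor.state_dict(), "critic": critic.state_dict()}, "agent_original_wind.pt")

    # for a new wind speed, the new agent inherits the old parameters and is then fine-tuned:
    ckpt = torch.load("agent_original_wind.pt")
    actor.load_state_dict(ckpt["actor"])
    critic.load_state_dict(ckpt["critic"])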
Step 2: based on the trained "agent", a TAD generator for online application is generated. For any random wind speed and a fixed wind turbine structure, the TAD generator searches for the optimal TAD in real time and outputs it quickly.
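A minimal sketch of that online use, under the conventions above (state = (V_w, TAD), action = ΔTAD); make_state is a hypothetical encoding helper and the step count is a placeholder:

    def search_optimal_tad(actor, v_w, tad_init, steps=50):
        # roll the trained policy forward from an initial TAD at the given wind speed
        tad = tad_init                     # torch tensor of initial twist angles
        for _ in range(steps):
            s_t = make_state(v_w, tad)     # hypothetical helper building the state tensor
            delta = actor(s_t).mean        # deterministic output: mean of the action distribution
            tad = tad + delta.detach()
        return tad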
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for rapidly designing and optimizing a wind turbine blade based on reinforcement learning, characterized by comprising the following steps:
step 1, constructing a TAD calculation model and an environment model;
the TAD calculation model computes the optimal TAD from the wind speed and the blade structure;
the environment model analyzes the aerodynamic performance of the blade according to the TAD generated by the TAD calculation model;
step 2, training the TAD calculation model by a reinforcement learning method to obtain a trained TAD calculation model; wherein a long-term reward mechanism is adopted to train the TAD calculation model in the reinforcement learning method, the reward functions in the long-term reward mechanism being

$$R(\tau_i)=\sum_{t=1}^{T} r_t$$

$$R_t=\sum_{t'=t}^{T}\gamma^{t'-t}\, r_{t'}$$

wherein R(τ_i) is the cumulative reward of the i-th learning trajectory τ_i; T is the total number of steps; γ is the discount rate; r_t' is the instant reward of step t'; R_t is the long-term reward of step t; and the instant reward r_t' is obtained by combining the aerodynamic performance derived from the environment model with expert experience;
and step 3, using the trained TAD calculation model to search for and output the optimal TAD in real time according to the current wind speed and the blade structure.
2. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1, wherein the environment model adopts the wind energy capture coefficient as the aerodynamic performance analysis result.
3. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 2, wherein the instant reward is obtained as follows: it is calculated whether the current TAD follows a monotonic decrease; if not, the instant reward is r_t' = -10u, where u denotes the number of violations of the monotonic-decrease constraint; if it does, the instant reward is r_t' = 10C_p - 10N, where C_p is the wind energy capture coefficient and N is the number of twist angles outside the allowed range.
4. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1 or 2, wherein the environment model is implemented using computational fluid dynamics, blade element momentum theory, or a surrogate model.
5. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 4, wherein the environment model adopts an artificial neural network surrogate model.
6. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 5, wherein the environment model is a 4-layer artificial neural network trained by backpropagation; the inputs of the artificial neural network are the TAD and the wind speed V_w, and the output is the network-estimated wind energy capture coefficient C_p.
7. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 1, wherein the TAD calculation model adopts an optimization model or an agent.
8. The reinforcement learning-based wind turbine blade rapid design optimization method of claim 7, wherein an Actor-Critic learning method is adopted to construct and train the agent; the agent comprises an action executor and a state evaluator; wherein the long-term reward value generated by the reward function is translated by the state evaluator into an internal reward value of the agent, which is then provided to the internal action executor to guide the generation of a new TAD.
9. The method as claimed in claim 8, wherein both the action executor and the state evaluator are parameterized by neural networks.
CN202010676474.3A 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning Active CN111914361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010676474.3A CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010676474.3A CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111914361A (en) 2020-11-10
CN111914361B (en) 2023-03-31

Family

ID=73280328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010676474.3A Active CN111914361B (en) 2020-07-14 2020-07-14 Wind turbine blade rapid design optimization method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111914361B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821903B (en) * 2021-07-09 2024-02-06 腾讯科技(深圳)有限公司 Temperature control method and equipment, modularized data center and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103649528A (en) * 2011-05-19 2014-03-19 米塔科技有限公司 Method of wind turbine yaw angle control and wind turbine
CN111241752A (en) * 2020-01-16 2020-06-05 北京航空航天大学 Centrifugal impeller comprehensive optimization method based on digital twinning and reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9310165B2 (en) * 2002-05-18 2016-04-12 John Curtis Bell Projectile sighting and launching control system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103649528A (en) * 2011-05-19 2014-03-19 米塔科技有限公司 Method of wind turbine yaw angle control and wind turbine
CN111241752A (en) * 2020-01-16 2020-06-05 北京航空航天大学 Centrifugal impeller comprehensive optimization method based on digital twinning and reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Evolutionary Level Set Method and Gaussian Mixture Model Based Target Shape Design Optimization Problem; Liangyue Jia et al.; vol. 7, pp. 104096-104107; 2019-08-13 *
基于空气动力学的风力机优化设计 (Optimization Design of Wind Turbines Based on Aerodynamics); 梁孟; 电子技术与软件工程 (Electronic Technology & Software Engineering); 2019-03-15, No. 6; p. 205 *

Also Published As

Publication number Publication date
CN111914361A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN112668235B (en) Robot control method based on off-line model pre-training learning DDPG algorithm
CN108133258B (en) Hybrid global optimization method
CN110806759A (en) Aircraft route tracking method based on deep reinforcement learning
Qu et al. An improved TLBO based memetic algorithm for aerodynamic shape optimization
CN104102769B (en) Artificial intelligence-based method for establishing real time part level model of turbo shaft engine
CN103729695A (en) Short-term power load forecasting method based on particle swarm and BP neural network
CN110674965A (en) Multi-time step wind power prediction method based on dynamic feature selection
Jia et al. A reinforcement learning based blade twist angle distribution searching method for optimizing wind turbine energy power
CN114362175B (en) Wind power prediction method and system based on depth certainty strategy gradient algorithm
Baheri et al. Altitude optimization of airborne wind energy systems: A Bayesian optimization approach
CN111914361B (en) Wind turbine blade rapid design optimization method based on reinforcement learning
CN115409645A (en) Comprehensive energy system energy management method based on improved deep reinforcement learning
Hein et al. Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming
CN113156900A (en) Machining deformation control method based on meta reinforcement learning
Khalil et al. A novel cascade-loop controller for load frequency control of isolated microgrid via dandelion optimizer
CN111832911A (en) Underwater combat effectiveness evaluation method based on neural network algorithm
CN114139778A (en) Wind turbine generator power prediction modeling method and device
Tong et al. Enhancing rolling horizon evolution with policy and value networks
CN113033012A (en) Hierarchical data-driven wind power plant generated power optimization scheme
CN110598911B (en) Wind speed prediction method for wind turbine of wind power plant
CN111563614A (en) Load prediction method based on adaptive neural network and TLBO algorithm
CN116663637A (en) Multi-level agent synchronous nesting training method
CN115864409A (en) Power grid section power adjustment strategy based on deep reinforcement learning
CN116484675A (en) Crack propagation life prediction method and system for ship engine blade
Zhang et al. Gliding control of underwater gliding snake-like robot based on reinforcement learning

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant