CN113885328A - Nuclear power tracking control method based on integral reinforcement learning - Google Patents

Nuclear power tracking control method based on integral reinforcement learning Download PDF

Info

Publication number
CN113885328A
CN113885328A (application CN202111212559.7A)
Authority
CN
China
Prior art keywords
nuclear power
evaluation network
iteration
strategy
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111212559.7A
Other languages
Chinese (zh)
Inventor
仲伟峰
王蒙轩
赵晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202111212559.7A priority Critical patent/CN113885328A/en
Publication of CN113885328A publication Critical patent/CN113885328A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04 — Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042 — Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Monitoring And Testing Of Nuclear Reactors (AREA)

Abstract

The invention discloses a nuclear power tracking control method based on integral reinforcement learning, which comprises the following steps: selecting an initial strategy, initializing the relevant parameters, and selecting an initial power point and a desired power point; starting the global iteration and, within it, the local iteration, in which an evaluation network is trained by a policy-iteration integral reinforcement learning algorithm and its weights are corrected, the evaluation network being used to approximate the tracking error performance index function, its weights being used to evaluate the performance of the current tracking error control system, and an optimal control strategy being selected through the execution process so as to minimize the total cost of one global iteration; judging whether the current local iteration is finished and, if not, returning to the local iteration, otherwise updating the iterative performance index function and the tracking control law to obtain the optimal tracking control strategy; and completing the global policy iteration to obtain the optimal tracking control strategy, tracking to the desired power point, and calculating the total cost. The invention can thus continuously learn and adjust the current strategy so as to track to the desired power point.

Description

Nuclear power tracking control method based on integral reinforcement learning
Technical Field
The embodiment of the invention relates to the technical field of power control of nuclear power units, in particular to a nuclear power tracking control method based on integral reinforcement learning.
Background
In recent years, the greenhouse effect and air pollution caused by coal-fired power generation have become increasingly serious, and coal reserves are declining year by year. Nuclear energy, as a clean energy source with the advantages of no pollution and low transportation cost, has attracted wide attention from many countries and has been applied and popularized in the power generation industry. The safety of nuclear power systems is likewise a constant concern in many fields, so the problem of power regulation has become a focus. A stable, safe and efficient power control method for nuclear power units is particularly important for the whole nuclear power industry.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a nuclear power tracking control method based on integral reinforcement learning, which at least partially solves the above problems.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a method of nuclear power tracking control based on reinforcement integral learning, the method comprising:
s1: selecting an initial strategy, initializing relevant parameters, and selecting an initial power point and an expected power point;
s2: performing global iteration, and updating an iterative tracking error performance index function according to an iterative control sequence to obtain an optimal tracking error performance index function;
s3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal error control strategy by using the optimal tracking error performance index function;
s4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the control law to obtain the optimal tracking error performance index function;
s5: and (4) completing the iteration of the global strategy to obtain an optimal tracking control strategy, tracking to an expected power point, and calculating the total cost.
Compared with the prior art, the technical scheme at least has the following beneficial effects:
the embodiment of the invention constructs the self-learning power tracking controller based on the self-adaptive dynamic programming algorithm through the neural network, can continuously learn, adjust and adapt to different nuclear power states through real-time operation, and can track the working points of different nuclear power units.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and together with the description serve to explain the invention without unduly limiting it. It is obvious that the drawings in the following description are only some embodiments, and that a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of a nuclear power system model shown in accordance with an exemplary embodiment;
fig. 2 is a flowchart illustrating a nuclear power generating unit power tracking control method based on integral reinforcement learning according to an exemplary embodiment.
Detailed Description
In order to more clearly illustrate the objects, technical solutions and advantages of the present invention, the present invention is further described in detail below with reference to the accompanying drawings in combination with specific examples.
Adaptive dynamic programming, proposed by Paul J. Werbos, has developed rapidly since the 1980s. It is mainly used to overcome the "curse of dimensionality" in dynamic programming, which it does by solving through repeated iterative optimization. In recent years, adaptive dynamic programming algorithms have shown great advantages in solving optimal control problems. An adaptive dynamic programming method generally uses an actor-critic structure, with neural networks approximating the tracking error performance index function and the control strategy; the analytic solution of the equation is approached gradually by an iterative method, finally converging to the optimal tracking error performance index function and the optimal tracking control strategy.
The adaptive dynamic programming method uses a function approximation structure (such as a neural network) to approximate the tracking error performance index function and the control strategy in the dynamic programming equation so as to satisfy the optimality principle, thereby obtaining the optimal error control and the optimal tracking error performance index function of the system. The adaptive dynamic programming structure mainly comprises a dynamic system, a control network and an evaluation network. The evaluation network approximates the optimal cost function and provides an evaluation that guides the execution network to generate optimal control. After the output of the execution network acts on the dynamic system, the rewards/penalties generated at different stages of the dynamic system influence the evaluation network, which in turn guides the execution network to update its control strategy, so that the total cost (i.e., the sum of rewards/penalties) reaches the optimal value.
The integral reinforcement learning adaptive dynamic programming method does not depend on a system model; the weights of the controller and evaluator neural networks are adjusted based on the system states generated in real time and the corresponding control actions. The method can be run online, and the controller and evaluator neural networks iteratively converge to the optimal control strategy and the optimal tracking error performance index function. It is particularly suitable for solving optimal control problems of linear or nonlinear continuous systems online.
FIG. 1 is a schematic diagram of a nuclear power system to which an embodiment of the present invention is applied, schematically illustrating the reaction heat-transfer model of the nuclear power system. The nuclear power system consists of one reactor and two cooling loops. Q merely represents heat transfer and has no further meaning for the nuclear power system model. The nuclear power system comprises five system states: the power percentage represents the generated power percentage of the system (the full-load generated power is 2500 MW); the delayed neutron concentration represents the relative concentration of delayed neutrons in the reaction vessel of the nuclear power system; the reactor core temperature (also denoted T_f) is the average temperature of the reactor core of the nuclear power system; the coolant output temperature represents the average temperature of the coolant inside the nuclear power system; and the reactivity represents the reactivity change of the nuclear power system caused by the up-and-down movement of the control rod. The system uses only the movement speed of the control rod as the control signal: when the control rod moves up or down at a certain speed, the reaction inside the system's reactor core changes accordingly. The faster the control rod moves upward, the more intense the reaction; moving the control rod downward has the opposite effect.
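For reference in the illustrative sketches that follow, the five states can be collected into a single vector. A minimal sketch in Python, in which the ordering and the numeric values are assumptions for illustration only:

    import numpy as np

    # Illustrative five-state vector of the nuclear power model:
    # [power percentage, delayed neutron concentration, core temperature T_f,
    #  coolant output temperature, control-rod reactivity]
    x = np.array([1.0, 1.0, 290.0, 285.0, 0.0])  # placeholder values, not plant data

    # The single control signal is the control-rod movement speed u(t).
    u = 0.0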
As shown in fig. 2, an embodiment of the present invention provides a power tracking control method for a nuclear power system based on integral reinforcement learning, which may include steps S1 to S5.
S1: the initialization parameters include: nuclear power system parameters, evaluation network parameters, global iteration duration, integral time constant, local iteration duration, convergence accuracy and target parameters; the nuclear power system parameters are nuclear power model system parameters, and the model comprises five system input and output states.
The nuclear power system model mainly comprises the neutron reaction equations inside the reactor core, two temperature feedback models of the reactor, and the reactivity equation of the control rod. In studies of reactor characteristics, control by means of the control rod is often used: the control rod has a very strong neutron-absorbing capacity, its movement rate is easy to control, it is convenient to operate, and it regulates reactivity with high accuracy. Its influence on reactivity can be expressed in two ways: a change in position and a change in velocity.
In addition, an initial power operating point and a desired power operating point must be selected, and an initial stabilizing control strategy determined. The following parameters are also initialized: the global training step count, the local iteration step count, the neural network structure (such as the numbers of input nodes, hidden nodes and output-layer nodes), and the neural network weights.
Illustratively, the structure of the evaluation network is set to 5-15-1, wherein 5 is the number of input nodes of the evaluation network, 15 is the number of hidden nodes, and 1 is the number of output nodes; the number of hidden nodes can be adjusted empirically to obtain the best approximation effect, and the convergence accuracy is defined as 1.0 × 10^-2.
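As an illustrative sketch only, such a 5-15-1 evaluation network can be written in Python/NumPy as follows; the tanh hidden activation and the initialization scale are assumptions, not prescribed by the method:

    import numpy as np

    class CriticNetwork:
        """Minimal 5-15-1 evaluation (critic) network mapping a 5-dimensional
        input to a scalar tracking error performance index estimate.

        The 5-15-1 layer sizes follow the text; the tanh hidden activation
        and the initialization scale are illustrative assumptions."""

        def __init__(self, n_in=5, n_hidden=15, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # input -> hidden
            self.W2 = rng.normal(0.0, 0.1, size=(1, n_hidden))     # hidden -> output

        def features(self, x):
            # Hidden-layer feature vector phi(x); the output is linear in it.
            return np.tanh(self.W1 @ np.asarray(x, dtype=float))

        def value(self, x):
            # Approximated tracking error performance index J_e for input x.
            return float(self.W2 @ self.features(x))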
In the execution stage, the embodiment of the invention uses a simplified finite-dimensional control variable, i.e., a finite, predetermined set of nuclear power operating points is used for tracking.
In practical applications, the initial operating point and the desired operating point can be set according to actual requirements; the power model and parameter settings of the nuclear power unit must likewise be physically meaningful.
S2: during global training, the iterative tracking error performance index function is updated according to the iterative control sequence so as to obtain the optimal tracking error performance index function;
specifically, according to the requirement of the integral reinforcement learning method of the controller, weight initialization training work needs to be performed on the evaluation network.
Training an evaluation network by using an integral reinforcement learning algorithm: evaluating the input values of the network includes: five states x (t) of nuclear power unit working point and five states x of nuclear power unit expected working pointd(t) nuclear power unit tracking error control strategy ue(t) the output value is a tracking error performance indicator function Je(t) of (d). Wherein, Je(t) the tracking error performance indicator function is referred to as the J function for short. Optimal tracking error control strategy ue(t) is approximated by a tracking error performance indicator function obtained from the evaluation network.
The weight initialization of the evaluation network is performed within the global iteration. Preferably, the weights can be re-initialized at the start of each global iteration; on the basis of preserving the stability and convergence speed of the evaluation network, this better ensures its convergence, so that the optimal power tracking control strategy of the nuclear power system can be found as soon as possible.
In the execution stage, the input data of the evaluation network are the difference x_e(t) between the five state outputs x(t) of the nuclear power unit and the desired power point x_d(t), together with the optimal tracking error control strategy u_e(t) obtained from the trained evaluation network. The output data of the evaluation network is the tracking error performance index function J_e(t).
According to the Bellman equation, the output J_e(t+T) of the evaluation network at the next moment and the utility function U(t) are used to calculate the output data J_e(t) at the current moment, the calculation formula being:

J_e(t) = ∫_t^{t+T} U(τ) dτ + J_e(t+T)

The global iterative error control law u_e^i is then used to update the global-iteration J_e function.
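A minimal sketch of this integral Bellman computation, assuming the utility U has been sampled on a uniform grid over [t, t+T] (the trapezoidal quadrature is an illustrative choice):

    import numpy as np

    def bellman_target(utility_samples, dt, j_next):
        """J_e(t) = integral of U(tau) over [t, t+T] plus J_e(t+T).

        utility_samples: U(tau) on a uniform grid covering [t, t+T]
        dt:              grid spacing, so T = dt * (len(utility_samples) - 1)
        j_next:          evaluation-network output J_e(t+T)
        """
        u = np.asarray(utility_samples, dtype=float)
        integral = dt * (0.5 * u[0] + u[1:-1].sum() + 0.5 * u[-1])  # trapezoid rule
        return integral + j_next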
The following example describes in detail the process of obtaining the optimal tracking error performance index function.
At time t, let x(t) denote the five input/output states of the nuclear power unit, x_d(t) the desired power point, x_e(t) the resulting tracking error of the system, and u_e(t) the tracking error control strategy; the error control system can then be defined as:

x_e(t+1) = f(x(t) − x_d(t), u_e(t), t)
wherein f can be derived from a nuclear power unit power model. The utility function is defined as follows:
U(t) = α[x_e(t)]² + β[u_e(t)]²
wherein α and β are constants; u_e(t) is the difference between the current control law of the nuclear power unit and the desired operating control law. The utility function U(t) represents, at time t, the sum of the utilities of the deviation between the current and desired operating points of the nuclear power unit and of the control rod control law.
We give a new form of utility function:

U(x_e(t), u_e(t)) = x_e^T(t) Q x_e(t) + u_e^T(t) R u_e(t)

wherein Q and R are positive definite matrices, and our global tracking error performance index function can be defined as:

J_e(x_e(t)) = ∫_t^∞ U(x_e(τ), u_e(τ)) dτ
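Written out as a sketch (with Q and R left as identity matrices by default, an illustrative assumption):

    import numpy as np

    def utility(x_e, u_e, Q=None, R=None):
        """Quadratic utility U = x_e^T Q x_e + u_e^T R u_e, Q and R positive definite."""
        x_e, u_e = np.atleast_1d(x_e), np.atleast_1d(u_e)
        Q = np.eye(x_e.size) if Q is None else Q
        R = np.eye(u_e.size) if R is None else R
        return float(x_e @ Q @ x_e + u_e @ R @ u_e)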
the Hamiltonian equation can be derived as follows:
Figure RE-GDA0003392010600000073
then we have one
Figure RE-GDA0003392010600000074
Such that the following equation is satisfied:
Figure RE-GDA0003392010600000075
the optimal tracking error control law can be expressed as:
Figure RE-GDA0003392010600000076
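The stationarity step behind this law can be written out explicitly; assuming, as above, the input-affine error dynamics ẋ_e = f(x_e) + g(x_e) u_e and a symmetric R:

    \frac{\partial H}{\partial u_e}
      = 2 R u_e + g^{T}(x_e)\,\frac{\partial J_e^{*}}{\partial x_e} = 0
    \quad\Longrightarrow\quad
    u_e^{*} = -\tfrac{1}{2} R^{-1} g^{T}(x_e)\,\frac{\partial J_e^{*}}{\partial x_e}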
Define the initial error control law u_e^0. For the iterative performance index function J_e^i and control law u_e^i, we have

J_e^i(x_e(t)) = ∫_t^{t+T} U(x_e(τ), u_e^i(τ)) dτ + J_e^i(x_e(t+T))

where i = 0, 1, 2, …; the error tracking control law can then be obtained from the following equation:

u_e^{i+1}(t) = −(1/2) R^{−1} g^T(x_e) ∂J_e^i/∂x_e
As i → ∞, J_e^i converges to the optimal value.
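Taken together, the global iteration can be sketched as a generic policy-iteration loop; the function names critic_eval and improve are hypothetical stand-ins for the evaluation-network training and the control-law update described above:

    import numpy as np

    def policy_iteration(critic_eval, improve, u0, max_iter=50, tol=1e-2):
        """Generic policy iteration: evaluate J_e^i for u_e^i, then improve.

        critic_eval(u): returns critic parameters approximating J_e^i for policy u
        improve(J):     returns the improved control law u_e^{i+1}
        tol:            convergence accuracy (1.0e-2 in the text)
        """
        u = u0                      # initial stabilizing error control law u_e^0
        J_prev = None
        for i in range(max_iter):
            J = critic_eval(u)      # policy evaluation via the integral Bellman equation
            u = improve(J)          # policy improvement step
            if J_prev is not None and np.linalg.norm(J - J_prev) < tol:
                break               # J_e^i has (approximately) converged to J_e*
            J_prev = J
        return u, J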
S3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal error control strategy by using the optimal tracking error performance index function;
the goal of the local training iteration is to obtain the optimum
Figure RE-GDA00033920106000000810
Under the condition of given initial stable control strategy, let us make the control law ue 0. Let the integration duration T equal to 1, and select the local training iteration duration as 30 steps.
The tracking error performance index function update rule is as follows:

J_e^i(x_e(t)) = ∫_t^{t+T} U(x_e(τ), u_e^i(τ)) dτ + J_e^i(x_e(t+T))

The optimal error control law update rule is as follows:

u_e^{i+1}(t) = −(1/2) R^{−1} g^T(x_e) ∂J_e^i/∂x_e
As i → ∞, J_e^i converges to the optimal value J_e*.
Then, the weight of the evaluation network is updated to approximate the optimal tracking error performance index function.
Wherein, the update rule is as follows. Approximating the performance index by the evaluation network as J_e(x_e) ≈ W_CL^T φ(x_e), where φ(·) is the hidden-layer feature vector, each sampling instant t_k yields one equation of the integral Bellman relation with residual

e(t_k) = W_CL^T [φ(x_e(t_k+T)) − φ(x_e(t_k))] + ∫_{t_k}^{t_k+T} U(x_e(τ), u_e(τ)) dτ

Stacking N samples,

X = [φ(x_e(t_1+T)) − φ(x_e(t_1)), …, φ(x_e(t_N+T)) − φ(x_e(t_N))]^T

Y = [∫_{t_1}^{t_1+T} U dτ, …, ∫_{t_N}^{t_N+T} U dτ]^T

and minimizing the residuals in the least-squares sense gives

W_CL = −(X^T X)^{−1}(X^T Y)

wherein e(t_k) is the weight-vector deviation (Bellman residual) of the evaluation network, X is the matrix of inner-product (feature) differences of the network, Y is the vector of utility function values approximated by the network, and W_CL is the weight of the evaluation network.
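A sketch of this batch least-squares weight update, assuming the evaluation network output is linear in its hidden-layer features φ(x_e) as above (the data layout is an assumption):

    import numpy as np

    def least_squares_critic_update(phi_t, phi_tT, integrated_utility):
        """Batch least-squares critic update W_CL = -(X^T X)^{-1} (X^T Y).

        phi_t:              (N, H) features phi(x_e(t_k)) at each window start
        phi_tT:             (N, H) features phi(x_e(t_k + T)) at each window end
        integrated_utility: length-N integrals of U over each [t_k, t_k + T]
        """
        X = phi_tT - phi_t                      # feature differences, one row per sample
        Y = np.asarray(integrated_utility, dtype=float)
        # W minimizing ||X W + Y||^2, i.e. W = -(X^T X)^{-1} X^T Y;
        # lstsq is used instead of an explicit inverse for numerical robustness.
        W, *_ = np.linalg.lstsq(X, -Y, rcond=None)
        return W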
Since the error control strategy and the tracking error performance index function change with the weights of the controller and evaluator neural networks, adjusting those weights amounts to updating the error control strategy and the tracking error performance index function. In the execution stage, the finite set of control variables is substituted into the optimal tracking error performance index function J_e* approximated by the evaluation network. The optimal error control strategy is obtained approximately from the tracking error performance index function given by the evaluation network, selecting the control variable that minimizes it as the optimal tracking error control strategy:

u_e*(t) = arg min_{u_e} J_e(x_e(t), u_e)

the minimum being taken over the finite candidate set of control variables.
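A sketch of this execution-stage selection over the finite candidate set; the candidate control-rod speeds and the critic interface are illustrative assumptions:

    import numpy as np

    def select_optimal_control(critic_value, x_e, candidate_controls):
        """Pick the candidate control minimizing the approximated J_e.

        critic_value(x_e, u_e): evaluation-network estimate of J_e
        candidate_controls:     finite set of control-rod speed candidates
        """
        costs = [critic_value(x_e, u) for u in candidate_controls]
        return candidate_controls[int(np.argmin(costs))]

    # Example: symmetric control-rod speed candidates (illustrative values only)
    candidates = np.linspace(-0.05, 0.05, 11)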
the evaluation network is used for approximating an optimal tracking error performance index function, evaluating the performance of the nuclear power control rod system by using the evaluation network weight, and selecting an optimal tracking control strategy through an execution flow to minimize the total tracking error cost of global training.
S4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the error control law to obtain the optimal tracking error performance index function;
specifically, after local iteration is completed, whether the current iteration number reaches an iteration threshold value is determined, and if yes, an iterative tracking error performance index function and an error control law are updated to obtain an optimal tracking error performance index function and an optimal error control strategy.
If not, go to step S3; otherwise, step S5 is executed.
S5: and (4) finishing the iteration of the global strategy to obtain an optimal tracking error control strategy, tracking to a desired power point, and calculating the total cost (tracking error and control rod control cost).
Calculating the total cost requires substituting the optimal tracking error control strategy u_e* into the actual model; since the utility function U(x_e, u_e) depends on the actual model, the total cost can be approximated by the resulting optimal tracking error performance index function J_e*.
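As a final illustrative sketch, the total cost can also be accumulated along the simulated closed-loop trajectory with the quadratic utility from above (α, β and the sampling interval are illustrative):

    import numpy as np

    def total_cost(x_e_traj, u_e_traj, dt, alpha=1.0, beta=1.0):
        """Accumulate U(t) = alpha*|x_e|^2 + beta*|u_e|^2 along the trajectory.

        x_e_traj: (N, 5) sampled tracking errors
        u_e_traj: (N, 1) sampled error controls
        dt:       sampling interval; alpha, beta are illustrative weights
        """
        U = alpha * np.sum(np.square(x_e_traj), axis=1) \
            + beta * np.sum(np.square(u_e_traj), axis=1)
        return float(np.sum(U) * dt)   # rectangle-rule approximation of the integral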
Although the steps in this embodiment are described in the foregoing sequence, those skilled in the art will understand that, in order to achieve the effect of this embodiment, the different steps need not be executed in such a sequence, and may be executed simultaneously (in parallel) or in an inverted sequence, and these simple changes are all within the protection scope of the present invention. The technical solutions provided by the embodiments of the present invention are described in detail above. Although specific examples have been employed herein to illustrate the principles and practice of the invention, the foregoing descriptions of embodiments are merely provided to assist in understanding the principles of embodiments of the invention; also, it will be apparent to those skilled in the art that variations may be made in the embodiments and applications of the invention without departing from the spirit and scope of the invention.
It should be noted that the flowcharts mentioned herein are not limited to the forms shown herein, and may be divided and/or combined.
It should be noted that: the numerals and text in the figures are only used to illustrate the invention more clearly and are not to be considered as an undue limitation of the scope of the invention.
The present invention is not limited to the above-described embodiments, and any variations, modifications, or alterations that may occur to one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.

Claims (8)

1. A nuclear power system power tracking control method based on integral reinforcement learning is characterized by comprising the following steps:
s1: selecting an initial strategy, initializing relevant parameters, and selecting an initial power point and an expected power point;
s2: performing global iteration, and updating an iterative tracking error performance index function according to an iterative control sequence to obtain an optimal tracking error performance index function;
s3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal tracking control strategy by using the optimal tracking performance index function;
s4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the tracking control law to obtain the optimal tracking error performance index function;
s5: and (4) completing the iteration of the global strategy to obtain an optimal tracking control strategy, tracking to an expected power point, and calculating the total cost.
2. The method according to claim 1, wherein in the step S1, the initialization parameters comprise: nuclear power system parameters, evaluation network parameters, global iteration duration, integral time constant, local iteration duration, convergence accuracy and target parameters; the nuclear power system parameters are nuclear power model system parameters, and the model comprises five system input and output states.
3. The method of claim 2, characterized in that the structure of the evaluation network is set to 5-15-1 and the convergence accuracy is defined as 1.0 × 10^-2, wherein 5 is the number of input nodes of the evaluation network, 15 is the number of hidden nodes of the evaluation network, and 1 is the number of output nodes of the evaluation network.
4. The method of claim 1, wherein the step S1 further comprises selecting an initial control strategy, wherein the error control strategy is obtained from a conventional PID or MPC strategy, so as to obtain an initial stabilizing control law.
5. The method of claim 1, wherein in step S3, the input data of the evaluation network include the 5 operating states x(t) of the nuclear power unit, the tracking error value x_e(t) relative to the desired power operating point x_d(t), and the tracking control strategy u_e(t) of the nuclear power control rods; the output data of the evaluation network comprise the tracking error performance index function J_e(t);
according to the Bellman equation, the output J_e(t+T) of the evaluation network at the next integration moment and the utility function U(t) are used to calculate the output data J_e(t) at the current moment by the following formula:

J_e(t) = ∫_t^{t+T} U(τ) dτ + J_e(t+T)

wherein x_e(t) is the tracking error value between the 5 operating states x(t) of the nuclear power unit and the desired power operating point x_d(t); the utility function U(t) represents the sum, at time t, of the utilities of the tracking error value x_e(t) and of the tracking control strategy u_e(t) of the nuclear power control rods.
6. The method of claim 5, wherein the utility function U (t) is calculated by:
U(t) = α[x_e(t)]² + β[u_e(t)]²

wherein α and β are constants, and u_e(t) is the difference between the current control law of the nuclear power unit and the desired operating control law.
7. The method of claim 1, wherein in the step S3, the input data of the execution phase of the evaluation network includes relative power coefficient of the nuclear power plant to be controlled, relative concentration of delayed neutrons, average temperature of the reactor core, average temperature of coolant, and reactivity of control rods; the output data of the execution stage of the evaluation network comprises an optimal tracking control strategy; and the optimal tracking control strategy is obtained approximately according to a tracking error performance index function obtained by the evaluation network.
8. The method according to claim 1, wherein in the step S3, the update rule of the evaluation network is as follows: approximating the performance index as J_e(x_e) ≈ W_CL^T φ(x_e), where φ(·) is the feature vector of the evaluation network, each sampling instant t_k yields one integral Bellman equation with residual

e(t_k) = W_CL^T [φ(x_e(t_k+T)) − φ(x_e(t_k))] + ∫_{t_k}^{t_k+T} U(x_e(τ), u_e(τ)) dτ

stacking N samples into

X = [φ(x_e(t_1+T)) − φ(x_e(t_1)), …, φ(x_e(t_N+T)) − φ(x_e(t_N))]^T

Y = [∫_{t_1}^{t_1+T} U dτ, …, ∫_{t_N}^{t_N+T} U dτ]^T

and minimizing the residuals in the least-squares sense gives

W_CL = −(X^T X)^{−1}(X^T Y)

wherein e(t_k) is the weight-vector deviation of the evaluation network, X is the matrix of inner-product (feature) differences of the network, Y is the vector of utility function values approximated by the network, and W_CL is the weight of the evaluation network.
CN202111212559.7A 2021-10-18 2021-10-18 Nuclear power tracking control method based on integral reinforcement learning Pending CN113885328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111212559.7A CN113885328A (en) 2021-10-18 2021-10-18 Nuclear power tracking control method based on integral reinforcement learning

Publications (1)

Publication Number Publication Date
CN113885328A 2022-01-04

Family

ID=79003527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111212559.7A Pending CN113885328A (en) 2021-10-18 2021-10-18 Nuclear power tracking control method based on integral reinforcement learning

Country Status (1)

Country Link
CN (1) CN113885328A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103217899A (en) * 2013-01-30 2013-07-24 中国科学院自动化研究所 Q-function self-adaptation dynamic planning method based on data
CN104022503A (en) * 2014-06-18 2014-09-03 中国科学院自动化研究所 Electric-energy optimal control method for intelligent micro-grid with energy storage device
CN105843037A (en) * 2016-04-11 2016-08-10 中国科学院自动化研究所 Q-learning based control method for temperatures of smart buildings
US20190384237A1 (en) * 2018-06-13 2019-12-19 Mitsubishi Electric Research Laboratories, Inc. System and Method for Data-Driven Output Feedback Control
CN111650830A (en) * 2020-05-20 2020-09-11 天津大学 Four-rotor aircraft robust tracking control method based on iterative learning
CN111679577A (en) * 2020-05-27 2020-09-18 北京交通大学 Speed tracking control method and automatic driving control system of high-speed train

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880942A (en) * 2022-05-23 2022-08-09 西安交通大学 Nuclear reactor power and axial power distribution reinforcement learning decoupling control method
CN114880942B (en) * 2022-05-23 2024-03-12 西安交通大学 Nuclear reactor power and axial power distribution reinforcement learning decoupling control method
CN117075588A (en) * 2023-10-18 2023-11-17 北京网藤科技有限公司 Safety prediction fitting method and system for industrial automation control behaviors
CN117075588B (en) * 2023-10-18 2024-01-23 北京网藤科技有限公司 Safety prediction fitting method and system for industrial automation control behaviors

Similar Documents

Publication Publication Date Title
CN113885328A (en) Nuclear power tracking control method based on integral reinforcement learning
CN109901403A (en) A kind of face autonomous underwater robot neural network S control method
CN104991444B (en) Non-linearity PID self-adaptation control method based on Nonlinear Tracking Differentiator
Taeib et al. Tuning optimal PID controller
CN111324167B (en) Photovoltaic power generation maximum power point tracking control method
Gouadria et al. Comparison between self-tuning fuzzy PID and classic PID controllers for greenhouse system
Chidrawar et al. Generalized predictive control and neural generalized predictive control
CN113868961A (en) Power tracking control method based on adaptive value iteration nuclear power system
CN114722693A (en) Optimization method of two-type fuzzy control parameter of water turbine regulating system
Kostadinov et al. Online weight-adaptive nonlinear model predictive control
Ramírez et al. Min-max predictive control of a heat exchanger using a neural network solver
CN116755409B (en) Coal-fired power generation system coordination control method based on value distribution DDPG algorithm
CN115327890B (en) Method for optimizing main steam pressure of PID control thermal power depth peak shaving unit by improved crowd searching algorithm
Yu et al. A Knowledge-based reinforcement learning control approach using deep Q network for cooling tower in HVAC systems
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
Feng et al. Nonlinear model predictive control for pumped storage plants based on online sequential extreme learning machine with forgetting factor
Berger et al. Neurodynamic programming approach for the PID controller adaptation
Wakitani et al. Design and application of a data-driven PID controller
Hajipour et al. Optimized neuro observer-based sliding mode control for a nonlinear system using fuzzy static sliding surface
Yao et al. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning
El Aoud et al. Intelligent control for a greenhouse climate
CN112615364A (en) Novel wide-area intelligent cooperative control method for power grid stability control device
Liu On a method of single neural PID feedback compensation control
CN117970782B (en) Fuzzy PID control method based on fish scale evolution GSOM improvement
CN111663032B (en) Active disturbance rejection temperature control method for amorphous iron core annealing furnace

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination