CN114626509A - Method for reconstructing explicit model prediction control based on deep learning - Google Patents

Method for reconstructing explicit model prediction control based on deep learning Download PDF

Info

Publication number
CN114626509A
CN114626509A (application CN202210313902.5A)
Authority
CN
China
Prior art keywords
neural network
deep neural
input
control
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210313902.5A
Other languages
Chinese (zh)
Other versions
CN114626509B (en
Inventor
张聚
施超
牛彦
潘伟栋
陈德臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Normal University filed Critical Hangzhou Normal University
Priority to CN202210313902.5A priority Critical patent/CN114626509B/en
Publication of CN114626509A publication Critical patent/CN114626509A/en
Application granted granted Critical
Publication of CN114626509B publication Critical patent/CN114626509B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention discloses a predictive control method that reconstructs explicit model predictive control based on deep learning, comprising the following steps: step 1) reformulate explicit model predictive control as a multi-parametric quadratic programming problem; step 2) collect data and build a deep neural network; step 3) train the built deep neural network; step 4) verify the feasibility of the deep neural network; step 5) reconstruct the explicit model predictive control; and step 6) optimize the reconstructed parameters. The invention integrates a deep learning model with explicit model predictive control, addresses the high computational resource demands and long computation times of traditional model predictive control, guarantees control precision and prediction accuracy, and improves computational efficiency.

Description

Method for reconstructing explicit model prediction control based on deep learning
Technical Field
The invention belongs to the technical field of deep learning, is used for developing an accurate deep-learning-based surrogate model and its offline explicit optimal solution, and particularly relates to a predictive control method that reconstructs explicit model predictive control based on deep learning.
Background
Deep learning models are a class of approximate models that have proven highly predictive for complex phenomena. Introducing deep learning models into formulations requiring optimization provides a way to reduce problem complexity while maintaining model accuracy. A deep learning model in the form of a neural network with rectified linear units can be exactly recast as a multi-parametric quadratic programming formulation. However, developing optimal solutions for online applications involving explicit model predictive control remains a challenge. Multi-parametric programming alleviates the burden of online computation for optimization problems involving bounded, uncertain parameters, yet there is still great room for improvement in the offline computation.
Deep learning approximates complex systems and tasks by building sophisticated mathematical models from large amounts of data. These approximate models are increasingly valuable as data-driven modeling techniques, making it important to incorporate deep learning into optimization formulations. Neural networks have been used successfully as surrogate models in various settings, such as modeling, optimization and control, and regression and classification. In all of these applications, artificial neural network models are developed to represent complex, nonlinear processes. However, due to their inherent nonconvexity, obtaining a global solution to an optimization problem involving a neural network imposes a significant computational burden.
Because of their highly connected structure, deep learning models are adept at expressing complex functional relationships. Their ability to approximate functions to arbitrary precision stems from the exponential number of piecewise-connected hyperplanes they represent, which grows with the size of the network. For optimization problems with highly complex, nonlinear components, neural networks with rectified linear unit (ReLU) activations have proven to perform well on regression problems and can be incorporated into the optimization formulation as surrogate models.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a method combining deep learning and multi-parametric programming, used to develop an accurate deep-learning-based surrogate model and its offline explicit optimal solution.
The method integrates a deep learning model, specifically a neural network with rectified linear units, with explicit model predictive control, and can be exactly recast as a multi-parametric quadratic programming formulation. Recasting the deep learning model as a set of piecewise-linear functions allows the predictive model to be incorporated into model-based control strategies such as explicit model predictive control. To reduce the computational burden of solving the piecewise-linear optimization problem online, a fully offline explicit solution of the optimal control problem is obtained using multi-parametric programming. In online applications where time is critical, determining the optimal solution is challenging because of the inherent nonconvexity of the resulting discrete optimization formulation; multi-parametric programming is an effective way to reduce the online computational burden by developing the optimal solution offline. Introducing more advanced surrogate models into the multi-parametric optimization formulation strengthens the advantages of the developed parametric solutions, namely (i) the ability to obtain the optimal solution without re-solving the optimization problem each time the uncertain parameter is realized, (ii) an a priori map of the solution, and (iii) an explicit functional relationship between the optimization variables and the uncertain parameters. One key drawback of these multi-parametric formulations is their reliance on linear or piecewise-linear constraints; incorporating more complex phenomena into the parametric formulation therefore requires approximation. Developing an accurate approximate model to represent a nonlinear functional relationship is not straightforward, and deep learning models based on the ReLU activation function fill this gap.
A neural network with ReLU activations can be represented exactly in a multi-parametric quadratic programming formulation. This exact recasting of ReLU networks offers a new way to incorporate deep learning models into optimization-based formulations, narrowing the gap between model accuracy and computational performance.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are further described below. The method for reconstructing the explicit model prediction control based on the deep learning comprises the following steps:
step (1), reformulating explicit model predictive control as a multi-parametric quadratic programming problem;
the explicit model predictive control problem is as follows:
$$
\begin{aligned}
\min_{U}\ \ & x_N^\top P x_N + \sum_{k=0}^{N-1}\left(x_k^\top Q x_k + u_k^\top R u_k\right) \\
\text{s.t.}\ \ & x_{k+1} = A x_k + B u_k,\quad k = 0,\dots,N-1 \\
& C_x x_k \le c_x,\quad C_u u_k \le c_u,\quad C_f x_N \le c_f \\
& x_0 = x_{init}
\end{aligned} \tag{1}
$$

wherein $U = [u_0^\top, \dots, u_{N-1}^\top]^\top \in \mathbb{R}^{N n_u}$ is the vector containing the control input sequence; $P, Q \in \mathbb{R}^{n_x \times n_x}$ and $R \in \mathbb{R}^{n_u \times n_u}$ are weighting matrices, chosen such that $P \succeq 0$ and $Q \succeq 0$ are positive semidefinite and $R \succ 0$ is positive definite; $x_k \in \mathbb{R}^{n_x}$ is the state vector, $u_k \in \mathbb{R}^{n_u}$ is the control input, $A \in \mathbb{R}^{n_x \times n_x}$ is the system matrix, and $B \in \mathbb{R}^{n_x \times n_u}$ is the input matrix, with $(A, B)$ controllable; $k$ denotes the $k$-th sample point and $x_N$ the terminal state; $n_x$ is the dimension of the system state; $n_u$ is the dimension of the system control input; $n_{cx}$ is the number of hyperplanes defining the bounded polyhedral state set; $n_{cf}$ is the number of hyperplanes defining the bounded polyhedral terminal set; $n_{cu}$ is the number of hyperplanes defining the bounded polyhedral input set;

the state, terminal and input constraints are bounded polyhedral sets defined by the matrices $C_x \in \mathbb{R}^{n_{cx} \times n_x}$, $C_f \in \mathbb{R}^{n_{cf} \times n_x}$, $C_u \in \mathbb{R}^{n_{cu} \times n_u}$ and the vectors $c_x \in \mathbb{R}^{n_{cx}}$, $c_f \in \mathbb{R}^{n_{cf}}$, $c_u \in \mathbb{R}^{n_{cu}}$; the terminal cost defined by $P$ and the terminal set are selected so as to guarantee closed-loop stability and recursive feasibility of the optimization problem; for a given prediction horizon $N$, the set of initial states $x_{init}$ for which a solution exists is called the feasible region, and the optimization problem (1) can be reformulated as a multi-parametric quadratic programming problem that depends only on the current system state $x_{init}$:
$$
\min_{u}\ \tfrac{1}{2}\, u^\top H u + x_{init}^\top F u
\quad \text{subject to} \quad C_c u \le T x_{init} + c_c \tag{2}
$$

wherein $H \in \mathbb{R}^{s \times s}$ and $F \in \mathbb{R}^{n_x \times s}$ define the objective, and $C_c \in \mathbb{R}^{n_{ineq} \times s}$, $T \in \mathbb{R}^{n_{ineq} \times n_x}$ and $c_c \in \mathbb{R}^{n_{ineq}}$ define the constraints; $n_{ineq}$ is the total number of inequality constraints in the multi-parametric problem (2), and $s$ is the number of control variables in equation (2);
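As a concrete illustration of the condensing step behind equation (2), the sketch below eliminates the states of problem (1) and assembles $H$, $F$, $C_c$, $T$ and $c_c$ for a hypothetical double-integrator system; the matrices $A$, $B$, $Q$, $R$, $P$, the horizon $N$ and the box bounds are invented for illustration and are not the system of the invention:

```python
import numpy as np

# Hypothetical double integrator (illustrative values only)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2); P = np.eye(2); R = np.array([[0.1]])
N = 3
nx, nu = B.shape

# Prediction matrices: X = [x_1; ...; x_N] = Sx x_init + Su U
Sx = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
Su = np.zeros((N * nx, N * nu))
for k in range(1, N + 1):
    for j in range(k):
        Su[(k - 1) * nx:k * nx, j * nu:(j + 1) * nu] = (
            np.linalg.matrix_power(A, k - 1 - j) @ B)

Qbar = np.kron(np.eye(N), Q)
Qbar[-nx:, -nx:] = P                      # terminal weight P on x_N
Rbar = np.kron(np.eye(N), R)

# Substituting X into the cost of (1) and dropping the constant term
# yields min_U 0.5 U' H U + x_init' F U with:
H = 2.0 * (Su.T @ Qbar @ Su + Rbar)
F = 2.0 * (Sx.T @ Qbar @ Su)

# Input box constraints |u_k| <= 1 written as Cc U <= T x_init + cc
Cc = np.vstack([np.eye(N * nu), -np.eye(N * nu)])
T = np.zeros((2 * N * nu, nx))
cc = np.ones(2 * N * nu)
```

Because $R \succ 0$, the condensed $H$ is symmetric positive definite, which is what makes (2) a well-posed mp-QP.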
the multi-parameter quadratic programming problem solution is in the form of a piecewise affine function:
Figure BDA00035682085300000317
for nrThe area of the image to be displayed is,
Figure BDA0003568208530000041
and
Figure BDA0003568208530000042
each region ΘiAre all described by polyhedrons;
Figure BDA0003568208530000043
wherein
Figure BDA0003568208530000044
ciRepresentation description region ΘiNumber of inequalities of polyhedrons, ai,jxinit≤bi,j,j=1,...,cj
Figure BDA0003568208530000045
And is
Figure BDA0003568208530000046
step (2), constructing a data set:
the input space of the function is sampled and normalized to construct an input/output data set, in which initial states sampled from the feasible set of $x_{init}$ are the inputs and the corresponding optimal control input $u$ is the output;
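Step (2) can be sketched as follows; the box bounds, the sample count and the stand-in controller (a linear map in place of the true piecewise affine law) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(controller, lo, hi, n_samples=1000):
    """Sample initial states uniformly from a box and query the controller."""
    X = rng.uniform(lo, hi, size=(n_samples, len(lo)))  # sampled x_init
    U = np.array([controller(x) for x in X])            # optimal inputs
    return X, U

def minmax_normalize(Z):
    """Min-max normalization to [0, 1], returning the scaling for later reuse."""
    zmin, zmax = Z.min(axis=0), Z.max(axis=0)
    return (Z - zmin) / (zmax - zmin), (zmin, zmax)

# Stand-in controller; a real one would evaluate the explicit law (3)
X, U = sample_dataset(lambda x: -0.5 * x, lo=[-1.0, -1.0], hi=[1.0, 1.0])
Xn, x_scale = minmax_normalize(X)
```

Storing the normalization constants alongside the data matters: the same scaling must be applied to every state fed to the trained network online.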
step (3), building a deep neural network, and training and verifying the deep neural network by using the data set:
the deep neural network comprises an input layer, three hidden layers and an output layer;
the neural network is added with a projection algorithm on the basis of full connection, so that the output of the neural network can be ensured to meet the constraint condition and feasibility of a control system; the fourth layer of the neural network is a projection algorithm layer, because some data are infeasible during training, the feasibility of the output of the neural network can be ensured through the projection algorithm;
each node in the hidden layers of the deep neural network is associated with an activation function; the ReLU activation function is used, defined as:

$$ y = \max\{0, x\} \tag{5} $$

since the ReLU activation function is piecewise linear, it can equivalently be written as:

$$
f(x) = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases} \tag{6}
$$

the ReLU activation function approximates a function by decomposing it into a set of piecewise hyperplanes; the number of piecewise hyperplanes grows exponentially with the number of hidden layers of the deep neural network:

$$
N_{hyperplanes} = \mathcal{O}\!\left(\left(\tfrac{M}{n_x}\right)^{(L-1) n_x} M^{n_x}\right) \tag{7}
$$

wherein $L$ is the number of hidden layers, $n_x$ is the number of network input variables (the number of system state variables), and $M$ is the number of nodes per hidden layer;

a deep neural network with ReLU activations is therefore capable of exact function approximation through an exponential number of piecewise affine hyperplanes;
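A minimal numpy sketch of the network of step (3): an input layer, three fully connected ReLU hidden layers, and a linear output layer. The layer widths and random weights are placeholders (trained values would come from the sampled data set), and the projection step described above is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)  # equation (5), applied elementwise

def make_dnn(sizes):
    """sizes = [n_x, M, M, M, n_u]; returns a list of (W, b) pairs per layer."""
    return [(rng.standard_normal((m, n)) * 0.5, np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for W, b in layers[:-1]:
        x = relu(W @ x + b)      # three hidden layers use ReLU
    W, b = layers[-1]
    return W @ x + b             # linear output layer

layers = make_dnn([2, 8, 8, 8, 1])   # n_x = 2 states in, n_u = 1 input out
u_pred = forward(layers, np.array([0.3, -0.7]))
```

Each (W, b) pair corresponds directly to the $W_k$, $b_k$ used in the recasting of equations (8) and (9) below.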
step (4), reconstructing explicit model predictive control with the deep neural network trained and verified in step (3):
a neural network with ReLU activations can be recast exactly within the multi-parametric quadratic programming problem; the multi-parametric quadratic programming formulation allows the neural network to be embedded directly into the optimization problem; the complexity this recasting adds to the overall optimization problem lies in the management of the binary variables;
for a network layer with $n$ nodes, the output takes the form:

$$ x_k = \max\{0,\; W_k x_{k-1} + b_k\} \tag{8} $$

wherein $k$ is the layer index, $W_k$ is the weight matrix of the $k$-th layer, $b_k$ is the bias vector of the $k$-th layer, $x_{k-1} \in \mathbb{R}^{n_{k-1}}$ is the output of the previous layer, and $x_k \in \mathbb{R}^{n_k}$ is the output of the current layer;
equation (8) can be recast exactly in the optimization formulation by introducing binary variables; the reconstruction of the $k$-th hidden layer of the multi-parametric problem is:

$$
\begin{aligned}
& W_k x_{k-1} + b_k = x_k - s_k \\
& x_k \le M y, \qquad s_k \le M (1 - y) \\
& x_k \ge 0, \qquad s_k \ge 0, \qquad y \in \{0,1\}^n
\end{aligned} \tag{9}
$$

wherein $y$ is a vector of binary variables, $s_k \in \mathbb{R}^n$ is a vector of auxiliary (slack) variables, and $M$ is a sufficiently large scalar;

after equation (8) is reconstructed by equation (9), the total number of binary variables $y$ equals the total number of nodes in the hidden layers;

the recast neural network with the ReLU activation function is an exact reconstruction; through the inter-layer constraints, the binary variables force the activation function to output either 0 or $x$; moreover, the numbers of equality and inequality constraints each grow linearly with the total number of nodes $n$; incorporating the recast neural network into the optimization formulation provides an effective strategy for maintaining the high accuracy of the surrogate model;
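The equivalence between (8) and its big-M reconstruction (9) can be checked numerically on a single layer; the weights, the input and the value of M below are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x_prev = rng.standard_normal(3)
M = 1e3                                  # big-M bound on the activations

z = W @ x_prev + b                       # pre-activation of the layer
y = (z > 0).astype(float)                # binary variables of (9)
x = np.maximum(0.0, z)                   # node outputs
s = np.maximum(0.0, -z)                  # slack variables

# With this choice of y, all constraints of (9) hold and x equals ReLU(z):
assert np.allclose(W @ x_prev + b, x - s)            # equality constraint
assert np.all(x <= M * y) and np.all(s <= M * (1 - y))
assert np.allclose(x, np.maximum(0.0, z))            # matches equation (8)
```

In a mixed-integer solver the roles reverse: $y$ is free and the constraints of (9) force $x$ to the ReLU value; the check above verifies that the feasible assignment exists and is the intended one.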
step (5), optimizing the parameters of the deep neural network model reconstructed in step (4):
after reconstruction, regularization techniques can be used to reduce the number of active nodes (nodes whose value differs from 0); reducing the number of active nodes directly reduces the number of binary variables needed in the reconstruction; a trained and post-processed neural network minimizes unnecessary binary variables; for example, nodes in a hidden layer that are always positive can be represented by linear activation functions, so such nodes require neither a slack variable nor a binary variable.
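The pruning idea of step (5), that always-positive nodes need no binary variable, can be sketched by probing pre-activations over sampled inputs; the weights, biases and sampling box below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((5, 2))
b = np.array([5.0, -5.0, 0.1, 5.0, -0.1])   # nodes 0 and 3 biased strongly positive
X = rng.uniform(-1, 1, size=(500, 2))        # sampled network inputs

pre = X @ W.T + b                            # pre-activations, shape (500, 5)
always_positive = np.all(pre > 0, axis=0)    # these nodes behave linearly
n_binaries = int(np.sum(~always_positive))   # binary variables actually needed
```

Sampling can only suggest that a node is always positive; a rigorous version bounds the pre-activation over the whole input polytope (e.g. by interval arithmetic or an LP per node) before dropping its binary variable.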
step (6), realizing helicopter attitude control with the optimized deep neural network.
Preferably, during training of the deep neural network, dropout is used to train the model and update the network parameters, preventing overfitting.
Preferably, k-fold cross-validation is used when validating the deep neural network.
The invention has the following advantages:
1. The invention integrates a deep learning model with explicit model predictive control, addresses the high computational resource demands and long computation times of traditional model predictive control, guarantees control precision and prediction accuracy, and improves computational efficiency.
2. The steps of the method rest on a firm theoretical basis and are simple and clear, with complete theoretical support.
3. The deep neural network adds a projection step on top of its fully connected layers, so that the control output is restricted to its feasible set. The fourth layer of the network is the projection layer; because some training data are infeasible, the projection step improves the feasibility of the control output.
4. The invention introduces binary variables, which let the activation function output either 0 or x through the constraints; embedding the recast network in the optimization formulation therefore improves the accuracy of the model.
Drawings
Fig. 1 is a flow framework diagram of deep learning model based integration with ReLU activation functions and explicit model predictive control in accordance with the method of the present invention.
FIG. 2 is a schematic diagram of a feedforward neural network structure involved in the method of the present invention.
FIG. 3 is a comparison of the approximate control law generated by deep learning of the method of the present invention with the conventional EMPC control law.
FIG. 4 is a comparison graph of deep learning of the method of the present invention and tracking simulation of altitude angle by conventional EMPC.
Fig. 5 is a comparison graph of deep learning of the method of the present invention and tracking simulation of the state trace of the pitch angle by the conventional EMPC.
FIG. 6 is a comparison graph of deep learning of the method of the present invention and tracking simulation of the state trajectory of the rotation angle by the conventional EMPC.
FIG. 7 is a comparison graph of experimental data of deep learning and the traditional EMPC control law.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
the invention discloses a prediction control method for reconstructing an explicit model based on deep learning, which is applied to the field of helicopter attitude control.
A helicopter attitude control method based on deep learning reconstruction explicit model predictive control specifically comprises the following steps as shown in figure 1:
step (1), first, analyze and dynamically model the helicopter system: the forces acting on each axis and component of the helicopter system during operation are analyzed, specifically as follows:
the three-degree-of-freedom helicopter system space state equation is as follows:
the space-state equation of the three-degree-of-freedom helicopter system is:

$$ \dot{x} = A x + B u, \qquad y = C x $$

according to the modeling analysis of the helicopter, the altitude angle $\varepsilon$, the pitch angle $p$ and the rotation angle $r$, together with their derivatives, the altitude angular velocity $\dot{\varepsilon}$, the pitch angular velocity $\dot{p}$ and the rotational angular velocity $\dot{r}$, are selected as the state vector, i.e. $x = [\varepsilon, p, r, \dot{\varepsilon}, \dot{p}, \dot{r}]^\top$; the voltages of the front and rear motors of the three-degree-of-freedom helicopter are selected as the input vector, i.e. $u = [V_f, V_b]^\top$, and the altitude angle $\varepsilon$, pitch angle $p$ and rotation angle $r$ as the output vector $y = [\varepsilon, p, r]^\top$; substituting the values from the relevant parameter table yields the coefficients of the $A$, $B$ and $C$ matrices of the state equation. [The numerical $A$, $B$ and $C$ matrices appear as an image in the original filing.]
step (2), acquire and process data via the explicit model predictive control algorithm: reformulate explicit model predictive control as a multi-parametric quadratic programming problem, as follows:
the explicit model predictive control problem is as follows:
$$
\begin{aligned}
\min_{U}\ \ & x_N^\top P x_N + \sum_{k=0}^{N-1}\left(x_k^\top Q x_k + u_k^\top R u_k\right) \\
\text{s.t.}\ \ & x_{k+1} = A x_k + B u_k,\quad k = 0,\dots,N-1 \\
& C_x x_k \le c_x,\quad C_u u_k \le c_u,\quad C_f x_N \le c_f \\
& x_0 = x_{init}
\end{aligned} \tag{1}
$$

wherein $U = [u_0^\top, \dots, u_{N-1}^\top]^\top \in \mathbb{R}^{N n_u}$ is the vector containing the control input sequence; $P, Q \in \mathbb{R}^{n_x \times n_x}$ and $R \in \mathbb{R}^{n_u \times n_u}$ are weighting matrices, selected such that $P \succeq 0$ and $Q \succeq 0$ are positive semidefinite and $R \succ 0$ is positive definite. $x_k \in \mathbb{R}^{n_x}$ is the state vector, $u_k \in \mathbb{R}^{n_u}$ is the control input, $A \in \mathbb{R}^{n_x \times n_x}$ is the system matrix, and $B \in \mathbb{R}^{n_x \times n_u}$ is the input matrix, with $(A, B)$ controllable. $k$ denotes the $k$-th sample point and $x_N$ the terminal state; $n_x$ is the dimension of the system state; $n_u$ is the dimension of the system control input; $n_{cx}$ is the number of hyperplanes defining the bounded polyhedral state set; $n_{cf}$ is the number of hyperplanes defining the bounded polyhedral terminal set; $n_{cu}$ is the number of hyperplanes defining the bounded polyhedral input set.

The state, terminal and input constraints are bounded polyhedral sets defined by the matrices $C_x \in \mathbb{R}^{n_{cx} \times n_x}$, $C_f \in \mathbb{R}^{n_{cf} \times n_x}$, $C_u \in \mathbb{R}^{n_{cu} \times n_u}$ and the vectors $c_x$, $c_f$, $c_u$.

The terminal cost defined by $P$ and the terminal set are selected so as to guarantee closed-loop stability and recursive feasibility of the optimization problem. For a given prediction horizon $N$, the set of initial states $x_{init}$ for which a solution exists is called the feasible region, and the optimization problem (1) can be reformulated as a multi-parametric quadratic programming problem that depends only on the current system state $x_{init}$:
$$
\min_{u}\ \tfrac{1}{2}\, u^\top H u + x_{init}^\top F u
\quad \text{subject to} \quad C_c u \le T x_{init} + c_c \tag{2}
$$

wherein $H \in \mathbb{R}^{s \times s}$ and $F \in \mathbb{R}^{n_x \times s}$ define the objective, and $C_c \in \mathbb{R}^{n_{ineq} \times s}$, $T \in \mathbb{R}^{n_{ineq} \times n_x}$ and $c_c \in \mathbb{R}^{n_{ineq}}$ define the constraints; $n_{ineq}$ is the total number of inequality constraints in the multi-parametric problem (2), and $s$ is the number of control variables in equation (2).
the multi-parameter quadratic programming problem solution is in the form of a piecewise affine function:
Figure BDA0003568208530000091
for nrThe area of the image to be displayed is,
Figure BDA0003568208530000092
and
Figure BDA0003568208530000093
each region ΘiAre described by polyhedrons.
Figure BDA0003568208530000094
Wherein
Figure BDA0003568208530000095
ciRepresentation description region ΘiNumber of inequalities of polyhedrons, ai,jxinit≤bi,j,j=1,...,cj
Figure BDA0003568208530000096
And is
Figure BDA0003568208530000097
step (3), constructing a data set:
the input space of the function is sampled and the samples are normalized to construct an input/output data set, in which initial states sampled from the feasible set of $x_{init}$ are the inputs and the corresponding optimal control input $u$ is the output;
step (4), building a deep neural network as shown in fig. 2, and training and verifying the deep neural network by using the data set:
the deep neural network comprises an input layer, three hidden layers and an output layer;
on top of its fully connected layers, the neural network adds a projection step so that its output is guaranteed to satisfy the constraints and feasibility requirements of the control system; the fourth layer of the network is this projection layer, which ensures feasibility of the network output even though some training data are infeasible;
each node in the hidden layers is associated with an activation function. Common activation functions include the hyperbolic tangent and the rectified linear unit. The choice of a suitable activation function depends on the problem; neural networks with rectified linear units have consistently shown good performance across many applications.
The activation function adopted by the hidden layers of the deep neural network (DNN) is the ReLU activation function, defined as:

$$ y = \max\{0, x\} \tag{5} $$

It is piecewise linear and can equivalently be written as:

$$
f(x) = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases} \tag{6}
$$

The ReLU activation function approximates a function by decomposing it into a set of piecewise hyperplanes. The number of piecewise hyperplanes of a ReLU neural network grows exponentially with the number of hidden layers:

$$
N_{hyperplanes} = \mathcal{O}\!\left(\left(\tfrac{M}{n_x}\right)^{(L-1) n_x} M^{n_x}\right) \tag{7}
$$

wherein $L$ is the number of hidden layers, $n_x$ is the number of network input variables, and $M$ is the number of nodes per hidden layer. A deep neural network with ReLU activations is therefore capable of exact function approximation through an exponential number of piecewise affine hyperplanes.
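The exponential-pieces claim behind equation (7) can be probed empirically by counting distinct ReLU activation patterns over sampled inputs, since each pattern corresponds to one affine piece of the network; the network sizes and random weights here are arbitrary:

```python
import numpy as np

# Random 2-hidden-layer ReLU network on a 2-D input box; each distinct joint
# activation pattern over both layers corresponds to one affine piece.
rng = np.random.default_rng(4)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((8, 8)), rng.standard_normal(8)

X = rng.uniform(-1, 1, size=(5000, 2))
z1 = X @ W1.T + b1
h1 = np.maximum(0.0, z1)
z2 = h1 @ W2.T + b2

patterns = np.hstack([z1 > 0, z2 > 0])               # one row per sample
n_pieces = len({tuple(row) for row in patterns.astype(int)})
```

The count is a lower bound on the true number of pieces (sampling misses small regions), but it grows visibly with depth and width, consistent with equation (7).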
During training, each node in the neural network has an associated weight and bias term, and these values are determined so as to minimize a defined performance criterion. Many strategies exist for finding the optimum, such as gradient descent, stochastic gradient descent, and Levenberg–Marquardt. These techniques determine a set of locally optimal weights and biases that minimize the selected performance criterion. Determining a global optimum is desirable but challenging; in many cases a local solution is sufficient. Before these training algorithms are applied, the input and output data are typically normalized to avoid scaling problems. In the training step it is important to ensure that the developed neural network does not overfit the data. One simple strategy to avoid overfitting is to use a large amount of data during training. More advanced techniques include cost-function regularization and batch normalization. Dropout encourages sparsity in the neural network by randomly removing nodes and their connections during training; dropping random nodes forces the network to be resilient and to identify the most salient features of the data set. Cost-function regularization adds a penalty term to the minimized objective; these additional terms yield a trained neural network with fewer active nodes.
When verifying the feasibility of the deep neural network, several techniques exist to ensure that the network fits the data properly and to provide a realistic measure of the model's effectiveness. Validation is typically performed by comparing the expected output of the real model with the predicted output of the neural network. One common technique is k-fold cross-validation, which helps ensure that the trained neural network provides a good fit. Various test metrics quantify the fit between the predicted and measured output data sets, including the mean squared error (MSE) and the root mean squared error (RMSE).
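The validation metrics and the k-fold split mentioned above can be sketched as follows (the toy arrays are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted outputs."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the output."""
    return float(np.sqrt(mse(y_true, y_pred)))

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; every sample validates exactly once."""
    idx = np.arange(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

y_true = np.array([0.0, 1.0, 2.0])
y_pred = np.array([0.0, 1.0, 2.5])
```

Averaging the validation MSE across the k folds gives a less optimistic estimate of generalization than a single train/validation split.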
And (5) reconstructing explicit model prediction control by using the trained and verified deep neural network:
neural networks involving the ReLU activation function can accurately recreate a multi-parameter quadratic programming problem. The use of the mp-Qp formula allows the neural network to be directly embedded into the optimization problem. This reconstruction adds complexity to the overall optimization problem by managing binary variables.
For any network layer with n nodes, the output takes the form:
$$ x_k = \max\{0,\; W_k x_{k-1} + b_k\} \tag{8} $$

wherein $k$ is the layer index, $W_k$ is the weight matrix of the $k$-th layer, $b_k$ is the bias vector of the $k$-th layer, $x_{k-1} \in \mathbb{R}^{n_{k-1}}$ is the output of the previous layer, and $x_k \in \mathbb{R}^{n_k}$ is the output of the current layer.
The importance of the ReLU activation function lies in its piecewise-linear nature: equation (8) can be recast exactly in the optimization formulation by introducing binary variables. The reconstruction of the $k$-th hidden layer of the multi-parametric problem is:

$$
\begin{aligned}
& W_k x_{k-1} + b_k = x_k - s_k \\
& x_k \le M y, \qquad s_k \le M (1 - y) \\
& x_k \ge 0, \qquad s_k \ge 0, \qquad y \in \{0,1\}^n
\end{aligned} \tag{9}
$$

wherein $y$ is a vector of binary variables, $s_k \in \mathbb{R}^n$ is a vector of auxiliary (slack) variables, and $M$ is a sufficiently large scalar.

After equation (8) is reconstructed by equation (9), the total number of binary variables $y$ equals the total number of nodes in the hidden layers.

The recast neural network with the ReLU activation function is an exact reconstruction. Through the inter-layer constraints, the binary variables force the activation function to output either 0 or $x$. Moreover, the numbers of equality and inequality constraints each grow linearly with the total number of nodes $n$. Incorporating the recast neural network into the optimization formulation provides an effective strategy for maintaining the high accuracy of the surrogate model.
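Once recast and trained, the network replaces the online optimizer in the closed loop $x_{k+1} = A x_k + B u_k$. The sketch below closes the loop with a stand-in linear gain where the DNN forward pass would go; $A$, $B$ and $K$ are invented toy values chosen to be stabilizing, not the helicopter model:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy discrete-time plant
B = np.array([[0.005], [0.1]])
K = np.array([[-10.0, -3.0]])            # placeholder for the trained DNN policy

def simulate(x0, steps=200):
    """Closed-loop rollout; the DNN controller would be evaluated in place of K."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = K @ x
        x = A @ x + B @ u
    return x

x_final = simulate([1.0, 0.0])
```

Because the closed-loop matrix $A + BK$ here has spectral radius below one, the state contracts toward the origin; the same rollout structure is used to test the learned controller against the explicit MPC trajectories.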
step (6), optimizing the parameters of the deep neural network model reconstructed in step (5):
Training is performed after the reconstruction, and regularization can be used to reduce the number of active nodes (nodes whose value differs from 0). Reducing the number of active nodes directly reduces the number of binary variables required in the recasting. A trained and post-processed neural network also minimizes unnecessary binary variables; for example, nodes in a hidden layer that are always positive can be represented by linear activation functions, so such nodes require neither a slack variable nor a binary variable.
Case analysis
The invention is applied to a three-degree-of-freedom helicopter, which, owing to its MIMO, high-order and nonlinear characteristics, is first controlled with explicit model predictive control to obtain output data. These data are used as training data, and the trained deep learning network is then tested separately on the altitude axis, the rotation axis and the pitch axis, showing the performance of the method combining deep learning with explicit model predictive control in this specific application; comparison of the experimental results demonstrates the superior performance of the invention. Under the control of the deep learning network, the control signal is fed back faster than under explicit model predictive control, the response time is shortened, and the autonomous learning capability of the control system improves the stability of the system during control.
The output data of the system are obtained by the explicit model predictive control method, and the values of the input quantities and the corresponding selected output quantities are combined into a data table. The table is then analyzed to delete unnecessary data, correct abnormal data and repair missing data. Finally, a data set that meets the training requirements is constructed, converted to csv format, and divided into a training set, a validation set and a test set. The overall steps can be seen in the flow chart of Fig. 2: a neural network is built in TensorFlow, the data are imported and then normalized. Training is then run for 500 epochs with a learning rate of 0.01, defining a mean-square-error loss function and creating an optimizer. As shown in Fig. 5 of the specification, after 500 epochs an error of 0.12 still remains between the predicted and actual values; although the final result thus differs from the explicit model predictive control computation, this does not affect the flight stability of the helicopter, which remains stable within this error range. Comparing the solution times, the deep learning network clearly has higher control efficiency than ordinary model predictive control under the same parameters, and the storage and computation burden on the computer is greatly reduced.
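A minimal sketch of this training pipeline (normalization, mean-square-error loss, learning rate 0.01, 500 training epochs) is shown below using plain NumPy gradient descent; the affine law and data are hypothetical stand-ins for the helicopter data, and the patent's actual implementation uses TensorFlow:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: states x mapped to controls u by a hypothetical affine law.
K_true, r_true = np.array([[1.5, -0.5]]), np.array([0.2])
X = rng.uniform(-5.0, 5.0, size=(200, 2))
U = X @ K_true.T + r_true              # shape (200, 1)

# Normalization before training, as in the pipeline above.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma

# One linear layer trained by gradient descent on the MSE loss,
# learning rate 0.01, 500 training epochs (mirroring the reported settings).
W = np.zeros((2, 1))
b = np.zeros(1)
lr = 0.01
for _ in range(500):
    pred = Xn @ W + b                  # forward pass
    err = pred - U
    loss = (err ** 2).mean()           # mean-square-error loss
    W -= lr * 2.0 * Xn.T @ err / len(Xn)
    b -= lr * 2.0 * err.mean(axis=0)

print(f"final MSE: {loss:.6f}")
```

On this toy affine target the loss converges to nearly zero; on the real piecewise affine helicopter law a residual error (such as the 0.12 reported above) remains because a finite network only approximates the exact EMPC solution.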
According to the experimental results and program runs, under the same conditions and on the premise of stable operation, the deep-learning-based model control achieves a faster solution speed and better explicit computation performance than ordinary explicit model predictive control. In terms of control effect, it can effectively regulate the altitude angle, rotation angle and pitch angle of the three-degree-of-freedom helicopter, quickly reach an ideal steady state, and exhibits good control performance.
FIG. 3 is a comparison of the approximate control law generated by deep learning of the method of the present invention with the conventional EMPC control law.
FIG. 4 is a comparison graph of deep learning of the method of the present invention and tracking simulation of altitude angle by conventional EMPC.
Fig. 5 is a comparison diagram of deep learning of the method of the present invention and tracking simulation of a state trace of a pitch angle by a traditional EMPC.
FIG. 6 is a comparison graph of deep learning of the method of the present invention and tracking simulation of the state trajectory of the rotation angle by the conventional EMPC.
FIG. 7 is a comparison graph of experimental data of deep learning and the traditional EMPC control law.

Claims (3)

1. A method for reconstructing explicit model prediction control based on deep learning is characterized in that: the method comprises the following steps:
step (1), reformulating the explicit model predictive control problem as a multi-parameter quadratic programming problem;
the explicit model predictive control problem is as follows:
$$\min_{U}\; x_N^{\top} P x_N + \sum_{k=0}^{N-1}\left(x_k^{\top} Q x_k + u_k^{\top} R u_k\right) \tag{1}$$
$$\text{s.t.}\quad x_{k+1} = A x_k + B u_k,\quad C_x x_k \le c_x,\quad C_u u_k \le c_u,\quad C_f x_N \le c_f,\quad x_0 = x_{init}$$

wherein U = [u_0^T, ..., u_{N-1}^T]^T ∈ R^{N·n_u} is the vector containing the control input sequence; P, Q ∈ R^{n_x×n_x} and R ∈ R^{n_u×n_u} are weighting matrices, selected such that P ≥ 0 and Q ≥ 0 are positive semidefinite and R > 0 is positive definite; x_k ∈ R^{n_x} is the state vector; u_k ∈ R^{n_u} is the control input; A ∈ R^{n_x×n_x} is the system matrix; B ∈ R^{n_x×n_u} is the input matrix, with (A, B) controllable; k denotes the k-th sample point and x_N the terminal state; n_x is the dimension of the system state; n_u is the dimension of the system control input; n_cx is the number of hyperplanes of the polyhedral set bounding the state; n_cf is the number of hyperplanes of the polyhedral set bounding the terminal state; n_cu is the number of hyperplanes of the polyhedral set bounding the input.
The state, terminal and input constraints are bounded polyhedral sets, defined by the matrices C_x ∈ R^{n_cx×n_x}, C_f ∈ R^{n_cf×n_x}, C_u ∈ R^{n_cu×n_u} and the vectors c_x ∈ R^{n_cx}, c_f ∈ R^{n_cf}, c_u ∈ R^{n_cu}. The terminal cost defined by P and the terminal set should be selected so as to guarantee stability of the closed-loop system and recursive feasibility of the optimization problem. For a given prediction horizon N there is a set of initial states x_init for which a solution is feasible, called the feasible region. The optimization problem (1) is reformulated as a multi-parameter quadratic programming problem that depends only on the current system state x_init:
$$\min_{U}\;\tfrac{1}{2} U^{\top} H U + x_{init}^{\top} F U \qquad \text{subject to}\quad C_c U \le T x_{init} + c_c \tag{2}$$

wherein C_c ∈ R^{n_ineq×N·n_u}, T ∈ R^{n_ineq×n_x} and c_c ∈ R^{n_ineq}; n_ineq is the total number of inequalities in the multi-parameter problem, i.e. the number of inequality constraints in equation (2), and N·n_u is the number of control variables in equation (2);
the multi-parameter quadratic programming problem solution is in the form of a piecewise affine function:
Figure FDA0003568208520000025
for nrThe area of the image to be displayed is,
Figure FDA0003568208520000026
and
Figure FDA0003568208520000027
each region ΘiAre all described by polyhedrons;
Figure FDA0003568208520000028
wherein
Figure FDA0003568208520000029
ciRepresentation description region ΘiNumber of inequalities of polyhedrons, ai,jxinit≤bi,j,j=1,...,cj
Figure FDA00035682085200000210
And is
Figure FDA00035682085200000211
Step (2), constructing a data set:
sampling the input space of the function and normalizing it to construct an input/output data set, wherein the initial states sampled from the initial state set x_init are the inputs and the control input u is the output;
step (3), building a deep neural network, and training and verifying the deep neural network by using the data set:
the deep neural network comprises an input layer, three hidden layers and an output layer;
a projection algorithm is added to the fully connected neural network to ensure that the output of the neural network satisfies the constraint conditions and feasibility of the control system;
each node in the deep neural network hidden layer is associated with an activation function; the activation function employs a ReLU activation function, which is defined in the equation:
y = max{0, x} (5)
since the ReLU activation function is also a piecewise linear function, it can equivalently be expressed as:

$$f(x)=\begin{cases}0, & x \le 0\\ x, & x > 0\end{cases} \tag{6}$$
the ReLU activation function approximates a function by decomposing the original function (5) into a set of piecewise hyperplanes; the number of piecewise hyperplanes grows exponentially with the number of hidden layers of the deep neural network, with a lower bound of the form:

$$\left(\left\lfloor \frac{M}{n_x}\right\rfloor\right)^{(L-1)\,n_x}\sum_{j=0}^{n_x}\binom{M}{j} \tag{7}$$

wherein L is the number of hidden layers, n_x is the number of input variables of the neural network, i.e. the number of system state variables, and M is the number of nodes per hidden layer;
the deep neural network with the ReLU activation function therefore has the capability of exact function approximation, representing any piecewise affine function by means of an exponential number of piecewise affine hyperplanes;
and (4) reconstructing explicit model prediction control by using the deep neural network trained and verified in the step (3):
the neural network with the ReLU activation function can exactly reconstruct the multi-parameter quadratic programming problem; using the multi-parameter quadratic programming formulation allows the neural network to be embedded directly into the optimization problem; the complexity this reconstruction adds to the overall optimization problem lies in the management of the binary variables;
for a deep neural network with n nodes, the output takes the form:
x_k = max{0, W_k x_{k-1} + b_k} (8)
where k is the network layer index, W_k is the weight matrix of the k-th layer, b_k is the bias vector of the k-th layer, x_{k-1} is the output of the previous layer, and x_k is the output of the current layer;
the formula (8) is reconstructed exactly in the optimization formulation by introducing binary variables; the reconstruction of the k-th hidden layer for the multi-parameter problem is as follows:

$$x_k - s_k = W_k x_{k-1} + b_k,\qquad x_k \le M y,\quad s_k \le M(1-y),\qquad x_k \ge 0,\; s_k \ge 0,\; y \in \{0,1\}^n \tag{9}$$

wherein y is a vector of binary variables, s_k is an auxiliary variable vector, and M is a large scalar value;
after the formula (8) is reconstructed by the formula (9), the total number of the binary variables y is equal to the total number of the nodes forming the hidden layer; the binary variable enables the activation function to output a value of 0 or x through the interlayer constraint;
and (5) optimizing the parameters of the deep neural network model reconstructed in the step (4):
reducing the number of active nodes of the deep neural network model by adopting a regularization technology;
and (6) realizing the attitude control of the helicopter by using the optimized deep neural network.
2. The method of claim 1, wherein during deep neural network training, the model is trained using a dropout technique to update network parameters and prevent overfitting.
3. The method of claim 1, wherein during deep neural network validation, a k-fold cross-validation model is employed.
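The piecewise affine law of equations (3) and (4) amounts to a point-location lookup at run time: find the polyhedral region Θ_i containing the current state, then apply its affine gain. A minimal sketch with hypothetical regions and gains (not the helicopter law):

```python
import numpy as np

# Hypothetical PWA control law u*(x) = K_i x + r_i over two polyhedral regions
# Theta_i = {x : A_i x <= b_i} (illustrative values only).
regions = [
    {"A": np.array([[1.0, 0.0]]), "b": np.array([0.0]),   # region 1: x1 <= 0
     "K": np.array([[2.0, 1.0]]), "r": np.array([0.5])},
    {"A": np.array([[-1.0, 0.0]]), "b": np.array([0.0]),  # region 2: x1 >= 0
     "K": np.array([[-1.0, 0.5]]), "r": np.array([0.0])},
]

def pwa_control(x):
    """Point location: find the region containing x, then apply its affine law."""
    for reg in regions:
        if np.all(reg["A"] @ x <= reg["b"] + 1e-12):
            return reg["K"] @ x + reg["r"]
    raise ValueError("x outside the feasible region")

u = pwa_control(np.array([-1.0, 2.0]))   # region 1: u = 2*(-1) + 1*2 + 0.5
```

This sequential region search is exactly the lookup cost that the deep neural network surrogate avoids: a forward pass replaces the search over n_r polyhedra.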
CN202210313902.5A 2022-03-28 2022-03-28 Depth learning-based reconstruction explicit model prediction control method Active CN114626509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210313902.5A CN114626509B (en) 2022-03-28 2022-03-28 Depth learning-based reconstruction explicit model prediction control method


Publications (2)

Publication Number Publication Date
CN114626509A true CN114626509A (en) 2022-06-14
CN114626509B CN114626509B (en) 2024-06-14


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6064997A (en) * 1997-03-19 2000-05-16 University Of Texas System, The Board Of Regents Discrete-time tuning of neural network controllers for nonlinear dynamical systems
CN109615146A (en) * 2018-12-27 2019-04-12 东北大学 A kind of wind power prediction method when ultrashort based on deep learning
CN111580389A (en) * 2020-05-21 2020-08-25 浙江工业大学 Three-degree-of-freedom helicopter explicit model prediction control method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张宝录; 罗丹婷; 胡鹏; 樊举; 景超: "A logging curve generation method based on a deep neural network model", Electronic Measurement Technology, no. 11, 8 June 2020 (2020-06-08) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant