CN112632860A - Power transmission system model parameter identification method based on reinforcement learning - Google Patents

Power transmission system model parameter identification method based on reinforcement learning Download PDF

Info

Publication number
CN112632860A
CN112632860A (Application CN202110002104.6A)
Authority
CN
China
Prior art keywords
value
parameter
parameters
reinforcement learning
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110002104.6A
Other languages
Chinese (zh)
Other versions
CN112632860B (en)
Inventor
丁建完
陈立平
郭超
彭奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202110002104.6A priority Critical patent/CN112632860B/en
Publication of CN112632860A publication Critical patent/CN112632860A/en
Application granted granted Critical
Publication of CN112632860B publication Critical patent/CN112632860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/25 - Design optimisation, verification or simulation using particle-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/08 - Probabilistic or stochastic CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a reinforcement-learning-based method for identifying the model parameters of a power transmission system, and belongs to the field of system modeling and simulation. Addressing the inconsistent sensitivity of power transmission system model parameters and the slow convergence and strict search range requirements of existing identification algorithms, the invention constructs a reinforcement learning framework for identifying the model parameters of the power transmission system that avoids local optima, converges quickly, and tolerates a large search range. The method uses a staged identification process: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly locate the optimal subinterval of each parameter, and the fine-tuning stage exploits the high precision and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval.

Description

Power transmission system model parameter identification method based on reinforcement learning
Technical Field
The invention belongs to the field of system modeling simulation, and particularly relates to a method for identifying parameters of a power transmission system model based on reinforcement learning.
Background
With the continuous development of multi-domain system modeling and simulation technology, the Modelica language has gradually become an industry standard for multi-domain, multi-disciplinary system simulation; its aim is to define a universal language for modeling complex systems. A Modelica model is a program written in the Modelica language that supports object-oriented modeling, multi-domain unified modeling, acausal declarative modeling, and continuous-discrete hybrid modeling, and engineers in different industries can use it to build their own simulation models and run the corresponding dynamic simulations. A Modelica model should agree with its physical prototype as closely as possible, and the key to this is the setting of the model parameters: building the model only determines its basic form, and accurate parameter values must be set to obtain the best simulation performance.
The power transmission system is a device that, under the control of a controller, draws energy from a power source in the form of speed/angular velocity and force/torque and transmits it to the next link of the system. It mainly consists of components such as a motor and a speed reducer, and effects such as energy loss and speed change occur during power transmission.
Existing model parameter identification methods are of many kinds; the most common is the least squares method, which is the default parameter optimization method in modeling software. However, least squares estimation depends too strongly on the data, its identification results are easily affected by noise, and it places strict requirements on the initialization range of the parameters. Evolutionary algorithms such as Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) are also commonly used; they have a strong ability to approach the global optimum, but their convergence is slow and their precision needs improvement. An identification method with fast convergence and loose requirements on the initial parameter range is therefore urgently needed to identify the parameters of the Modelica-based power transmission system model.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a reinforcement-learning-based method for identifying the parameters of a power transmission system model, and aims to provide an identification method with fast convergence and loose requirements on the initial parameter range for identifying the parameters of a Modelica-based power transmission system model.
In order to achieve the above object, the present invention provides a method for identifying parameters of a powertrain system model based on reinforcement learning, comprising:
s1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
s2, carrying out sensitivity analysis on the parameters to be identified of the model;
s3, roughly adjusting the parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for identifying the Modelica power transmission system model parameters;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
s4, parameter fine adjustment:
taking the mean square error between the measured data and the model estimate as the objective function, and iteratively searching for the optimum in the solution space formed by the model parameters to be identified; the parameter values at which the objective function value is minimal are taken as the final identification result.
Further, step S2 specifically performs parameter sensitivity analysis on the power transmission system model with undetermined parameters using the Sobol method, with the following steps (a code sketch of this computation is given below):
01. Monte Carlo sampling is carried out within the possible value range of the N parameters to be identified to generate an initial sample matrix A, a second initial sample matrix B and cross sample matrices A_B^(i), where i = 1, 2, …, N;
02. the sample matrices A, B and A_B^(i) are taken as input and the power transmission system model is simulated, giving the simulation result vectors f(A), f(B) and f(A_B^(i)) for the matrices A, B and A_B^(i) respectively;
03. the global influence index S_Ti of each parameter is calculated from the simulation results; with M Monte Carlo samples, the total-effect estimator consistent with the matrices above is
S_Ti = ( (1/(2M)) * Σ_{j=1}^{M} ( f(A)_j − f(A_B^(i))_j )^2 ) / Var(Y)
where Y denotes the set of outputs formed by f(A), f(B) and f(A_B^(i)), and Var(Y) is the variance of the power transmission system model output;
04. the parameters to be identified are ranked by the size of their global influence index; a larger influence index indicates a more sensitive parameter;
05. the parameters to be identified whose sensitivity is below a set threshold are combined.
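As a concrete illustration, the following Python sketch computes the total-effect indices with the estimator given above. It is a minimal sketch, not the patent's implementation: the simulate callable stands in for the FMU-based simulation that maps one parameter vector to the scalar model output (or output error), and the sample count is an arbitrary choice.

    import numpy as np

    def sobol_total_indices(bounds, simulate, n_samples=1024, seed=0):
        """bounds: (N, 2) array of [min, max] per parameter; simulate: vector -> scalar output."""
        rng = np.random.default_rng(seed)
        bounds = np.asarray(bounds, dtype=float)
        N = bounds.shape[0]
        lo, hi = bounds[:, 0], bounds[:, 1]
        # Monte Carlo sample matrices A and B (one parameter vector per row)
        A = lo + rng.random((n_samples, N)) * (hi - lo)
        B = lo + rng.random((n_samples, N)) * (hi - lo)
        fA = np.array([simulate(x) for x in A])
        fB = np.array([simulate(x) for x in B])
        var_Y = np.var(np.concatenate([fA, fB]))
        S_T = np.empty(N)
        for i in range(N):
            AB_i = A.copy()
            AB_i[:, i] = B[:, i]                                  # cross matrix: column i taken from B
            fAB = np.array([simulate(x) for x in AB_i])
            S_T[i] = np.mean((fA - fAB) ** 2) / (2.0 * var_Y)     # total-effect estimator
        return S_T

The ranking of the returned indices then gives the training order used in the coarse-tuning stage, and parameters with indices below the chosen threshold can be merged.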
Further, the reinforcement learning framework construction process specifically includes:
(1) the mean square error between the model estimate Y_est and the measured value Y_mea is taken as the reinforcement learning objective function F(X);
(2) constructing a single step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the single step reward value, F(X_cur) denotes the objective function value at the current parameters, F(X_best) the value at the optimal parameters, and F(X_mean) the value at the parameter mean;
(3) actions are defined according to the minimum variation G_i (i = 1, 2, …, N) and the range of each parameter:
the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the maximum and x_i^min the minimum of the i-th parameter; one subinterval is selected and a value is drawn at random within it as the action; the minimum variation G_i is the amount by which the i-th parameter is increased or decreased in each step of the identification process;
(4) constructing the action selection strategy:
01. selecting a search path:
whether the action selected in the next round lies to the left or the right of the current action is decided by the selection index L_p(i, j), whose calculation formula is shown as an image in the original; in it, k denotes the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient;
a random number ε_1 in [0, 1] is drawn and the search path l is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2, i.e. a random choice between the two paths;
02. determining the action:
a random number ε_2 in [0, 1] is drawn and the action a is determined either greedily from the Q values or at random (formula shown as an image in the original); Q(i, m) denotes the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning;
(5) constructing the update strategy of the Q value function (a code sketch follows below):
the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is a hyperparameter controlling the update amplitude.
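For concreteness, a minimal Python sketch of the single step reward and the Q value update defined above follows. It assumes a simple table layout (one row of Q values per discretised subinterval of each parameter) and takes the neighbouring path scores L_p as an array; both are assumptions, since the full path score formula is shown only as an image in the original.

    import numpy as np

    def single_step_reward(F_cur, F_best, F_mean, eps=1e-12):
        # r = min(1, max(0, (F_mean - F_best) / (F_mean - F_cur)))
        return min(1.0, max(0.0, (F_mean - F_best) / (F_mean - F_cur + eps)))

    def update_q(Q, i, j, r, L_p, alpha=0.7, lam2=0.25):
        """Q: (N_params, n_subintervals) table; i: parameter index; j: chosen subinterval.
        L_p: Q values of the k actions adjacent to the chosen action on the selected path."""
        target = r + (1.0 - lam2) * np.max(L_p) + lam2 * np.min(L_p)
        Q[i, j] += alpha * (target - Q[i, j])
        return Q

A smaller λ_2 puts more weight on the best neighbouring Q value, consistent with the later remark that a smaller λ_2 gives a larger update amplitude of the Q table.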
Further, in the parameter coarse tuning stage, a reinforcement Learning framework based on a Q-Learning algorithm is used, and the iterative training specifically comprises the following steps:
(1) all parameters to be identified are initialized at random and substituted into the power transmission system model for calculation; the result is compared with the measured data to obtain a mean square error that serves as the initial value of the optimal objective function F(X);
(2) the agents perform serial learning in order of decreasing sensitivity of the parameters to be identified; the part of the reinforcement learning framework that adjusts a parameter to be identified according to its Q value table is called an agent, and each parameter to be identified corresponds to exactly one agent; the process by which an agent adjusts its parameter is called the learning behaviour of the agent;
the learning process is as follows: an action a_rand(i) is selected at random within the possible value interval of the current agent's parameter while the other parameters are held fixed; the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single step reward value r, and the Q value table of the current agent is updated according to the Q value update strategy; if F(X_cur) ≤ F(X_best), the current parameter value is worth searching and step (3) is entered, otherwise the learning process is repeated;
(3) an action a_iter(i) is selected according to the action selection strategy while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward/penalty value r, and the Q value table is updated;
(4) steps (2) and (3) are executed iteratively for the i-th parameter a preset number of times (the count is shown as a formula image in the original), which completes one identification round of the i-th parameter; identification then switches to the (i+1)-th parameter;
(5) executing steps (2) to (4) serially for all parameters to be identified completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to step (2) for the next training period, otherwise training ends (a compressed code sketch of this loop follows).
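The following Python sketch compresses the serial coarse-tuning loop above under stated assumptions: the FMU call is represented by an objective callable returning the mean square error, the path-based action selection is replaced by a plain epsilon-greedy choice over each parameter's subintervals, the max/min over neighbouring path scores in the Q update is replaced by the max/min of the whole Q row, and the epoch and iteration counts are arbitrary. It illustrates the staged scheme rather than reproducing the patent's exact algorithm.

    import numpy as np

    def coarse_tune(bounds, n_cells, objective, n_epochs=20, iters_per_param=30,
                    alpha=0.7, lam2=0.25, eps=0.2, seed=0):
        """bounds: list of (lo, hi) per parameter, ordered by decreasing sensitivity.
        n_cells: number of subintervals per parameter. Returns best point and best cells."""
        rng = np.random.default_rng(seed)
        N = len(bounds)
        Q = [np.zeros(n_cells[i]) for i in range(N)]             # one Q row (agent) per parameter
        x_best = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        F_best = objective(x_best)
        F_hist = [F_best]
        for _ in range(n_epochs):                                # training periods
            for i in range(N):                                   # agents learn serially
                for _ in range(iters_per_param):
                    # epsilon-greedy stand-in for the path-based action selection
                    j = rng.integers(n_cells[i]) if rng.random() < eps else int(np.argmax(Q[i]))
                    lo, hi = bounds[i]
                    width = (hi - lo) / n_cells[i]
                    x_try = x_best.copy()
                    x_try[i] = lo + (j + rng.random()) * width   # random value inside subinterval j
                    F_cur = objective(x_try)
                    F_hist.append(F_cur)
                    F_mean = float(np.mean(F_hist))
                    r = min(1.0, max(0.0, (F_mean - F_best) / (F_mean - F_cur + 1e-12)))
                    Q[i][j] += alpha * (r + (1 - lam2) * Q[i].max()
                                        + lam2 * Q[i].min() - Q[i][j])
                    if F_cur <= F_best:
                        F_best, x_best = F_cur, x_try
        best_cells = [int(np.argmax(q)) for q in Q]              # optimal subinterval per parameter
        return x_best, best_cells

The returned best_cells play the role of the optimal subintervals handed to the fine-tuning stage.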
Further, step S4 specifically fine-tunes the parameters with the PSO optimization algorithm to obtain the final identification result, with the following steps:
to account for the identification error of reinforcement learning, the fine-tuning range of each parameter is derived from the optimal subinterval [x_i^low, x_i^up] obtained by reinforcement learning, widened by a margin controlled by μ (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, [x_i^low, x_i^up] is the optimal subinterval obtained by reinforcement learning, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
a fitness function G(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize G(X);
starting iterative search:
in each iteration, the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final identification result (a code sketch of this search is given below).
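A minimal PSO sketch of this fine-tuning stage follows, assuming the widened subintervals are already available as per-parameter bounds and using the standard velocity and position updates shown above; the particle count and iteration count are arbitrary choices.

    import numpy as np

    def pso_fine_tune(bounds, fitness, n_particles=30, n_iters=100,
                      w=0.6, c1=2.0, c2=2.0, seed=0):
        """bounds: (N, 2) array of fine-tuning ranges; fitness: vector -> mean square error."""
        rng = np.random.default_rng(seed)
        bounds = np.asarray(bounds, dtype=float)
        lo, hi = bounds[:, 0], bounds[:, 1]
        N = len(lo)
        x = lo + rng.random((n_particles, N)) * (hi - lo)      # particle positions
        v = np.zeros_like(x)                                   # particle velocities
        p = x.copy()                                           # per-particle best positions
        p_fit = np.array([fitness(xi) for xi in x])
        g = p[np.argmin(p_fit)].copy()                         # swarm best position
        g_fit = float(p_fit.min())
        for _ in range(n_iters):
            r1 = rng.random((n_particles, N))
            r2 = rng.random((n_particles, N))
            v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)  # velocity update
            x = np.clip(x + v, lo, hi)                         # position update, kept inside bounds
            fit = np.array([fitness(xi) for xi in x])
            improved = fit < p_fit
            p[improved], p_fit[improved] = x[improved], fit[improved]
            if p_fit.min() < g_fit:
                g_fit = float(p_fit.min())
                g = p[np.argmin(p_fit)].copy()
        return g, g_fit

The returned g corresponds to the historical optimal position vector p_g, i.e. the final parameter identification result.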
In general, the above technical solutions contemplated by the present invention can achieve the following advantageous effects compared to the prior art.
(1) Addressing the inconsistent sensitivity of Modelica model parameters and the slow convergence and strict search range requirements of existing identification algorithms, a reinforcement learning framework for identifying the Modelica power transmission system model parameters is constructed; it avoids local optima, converges quickly, and tolerates a large search range. The invention adopts a staged identification process: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly locate the optimal subinterval of each parameter, and the fine-tuning stage exploits the high precision and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval; compared with using a single algorithm alone, the staged identification performs better.
(2) To address the huge search space caused by an excessive number of parameters, the invention also performs sensitivity analysis on the parameters to be identified and merges those with low sensitivity, which reduces the number of parameters to be identified and saves computing resources during reinforcement-learning-based identification; at the same time, the sensitivity is used to set the parameter identification priority, which provides a basis for setting the learning rate in the reinforcement learning identification process and improves the identification precision.
(3) The method designs a sensitivity analysis scheme based on the Sobol method; it explores all model parameters over their possible value space simultaneously on the basis of variance, imposes no restriction on the nonlinearity or monotonicity of the model, and is therefore suitable for global parameter sensitivity analysis of any model.
Drawings
FIG. 1 is a flow chart of a method for identifying parameters of a powertrain model based on reinforcement learning according to the present invention;
FIG. 2 is a model of the powertrain created based on Modelica;
FIG. 3 shows the global influence indices of the parameters inertia2.J, spring.c, damping and Jmotor;
FIG. 4 shows, for a Modelica-based power transmission model of a numerically-controlled machine tool feed system whose parameters were identified with the method of the invention, the output of the physical prototype and the simulation result of the model under the same input conditions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a Modelica model parameter identification method based on reinforcement learning, which can effectively identify key parameters of a Modelica model so as to improve the simulation performance of the model. In order to meet the requirements of high convergence speed and high identification precision, the invention takes the actually measured data of a physical machine corresponding to a model as a training sample, adopts a reinforcement Learning framework improved based on a Q-Learning algorithm in the parameter coarse adjustment stage, and carries out serial iterative training on an intelligent agent represented by each parameter to be identified; and in the fine adjustment stage, a PSO/GA algorithm is adopted to determine the final value of the parameter to be identified. The method has the characteristics of high convergence rate and high identification precision.
Reinforcement learning is a machine learning method based on a trial-and-error mechanism; it originated from research in the field of animal intelligence and subsequently developed into an important branch of machine learning. Reinforcement learning considers the goal-directed interaction between an agent and an unknown environment; it accumulates experience through continual trial and error and keeps updating its action selection strategy, and it has the advantages of fast convergence, easy avoidance of local optima, and no need for prior knowledge. At present, reinforcement learning is mostly used, with success, in combinatorial optimization (sequential decision making), but it has not yet been applied to the function optimization field to which parameter identification belongs. Considering the requirements of fast convergence and global optimality posed by Modelica model parameter identification, the invention proposes a reinforcement-learning-based parameter identification method to solve this problem.
The overall technical scheme of the invention is as follows: the parameter sensitivity of the Modelica model to be identified is analysed with a Sobol-based sensitivity analysis method, the parameters are ranked by the size of their global influence factor, and the training order and discretization degree (i.e. the number of subintervals into which the initialization range is split) of each parameter in the coarse-tuning stage are determined; an improved reinforcement learning framework based on Q-Learning is then used, with the mean square error between the data measured on the physical machine and the model simulation data as the objective function and environment state, the adjustment of a parameter within its feasible range as the action, and the relation between the current objective function value and the historical optimum/average as the single step reward, to coarse-tune the parameters to be identified of the Modelica model and obtain the optimal subinterval of each parameter; finally, based on the optimal subintervals of all parameters, iterative training with a PSO/GA algorithm yields the optimal identification value of each parameter, i.e. the final identification result of the whole technical scheme.
S1, Modelica model parameter sensitivity analysis based on a Sobol method comprises the following steps:
(1) The FMU (Functional Mock-up Unit) of the Modelica model to be identified is exported; the functional mock-up unit contains the C code, solver and related artefacts of the model, can be called directly and completes the simulation solution of the model on its own, and is set to the 'Co-Simulation' mode (a code sketch of driving such an FMU from a script is given below);
(2) Monte Carlo sampling is carried out within the range of the N parameters to be identified to generate the sample matrices A, B and A_B^(i) (i = 1, 2, …, N); the FMU is simulated with these sample matrices as input to obtain the Y-value matrices, the global influence factor of each parameter to be identified is obtained from the Y-value matrices and the influence index formula, and the parameters are ranked. The training order is determined by the size of the influence factors; parameters with larger influence factors (i.e. the more sensitive ones) are optimized earlier within each training round;
(3) a random value is drawn within the search range of each parameter and substituted into the FMU, and the computed result is taken as a reference value; the minimum subinterval size G_i (i = 1, 2, …, N) of each parameter is then determined in turn so that increasing or decreasing the i-th parameter by G_i has a noticeable influence on the simulation result.
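As an illustration of step (1), the sketch below drives an exported Co-Simulation FMU from Python using the FMPy package; FMPy is an assumed tooling choice (the text only requires that the FMU be solvable with chosen start values), and the parameter and output names are illustrative placeholders.

    from fmpy import simulate_fmu
    import numpy as np

    def simulate_powertrain(fmu_path, start_values, stop_time=10.0, output="inertia3.w"):
        """Run one Co-Simulation of the exported FMU with the given parameter start values."""
        result = simulate_fmu(
            fmu_path,
            start_values=start_values,   # e.g. {"Jmotor": 0.02, "spring.c": 1.0e4}  (illustrative)
            stop_time=stop_time,
            output=[output],
        )
        # simulate_fmu returns a structured array with a 'time' column and one column per output
        return np.asarray(result["time"]), np.asarray(result[output])

Wrapping this call so that it returns the mean square error against the measured curve gives the objective/simulate callables assumed in the other sketches.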
S2. A reinforcement learning framework based on the Q-Learning algorithm is used for the parameter coarse-tuning stage. It consists of the following components:
(1) the environment state is set to the mean square error between the model estimate Y_est of the current round and the value Y_mea measured on the physical machine (i.e. the reinforcement learning objective function F(X)); it represents the gap between the current model parameter combination and the actual parameter combination;
(2) the single step reward measures the quality of the action selected in the current state and is an expression of the objective function value; it takes one of the following two forms:
i. a form shown only as a formula image in the original;
ii. r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the reward/penalty value, F(X_mean) the objective function value at the parameter mean, F(X_best) the value at the optimal parameters, and F(X_cur) the value at the current parameters;
(3) actions: according to the parameter minimum subinterval G_i (i = 1, 2, …, N) and the range of each parameter, the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the parameter maximum and x_i^min the parameter minimum; after a subinterval is selected, a value is drawn at random within it as the action;
(4) the action selection strategy has two stages: selecting a search path and determining the action. The first stage decides whether the action selected in the next round lies to the left or the right of the current action; the selection index is L_p(i, j), whose calculation formula is shown as an image in the original. In it, k is the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient.
A random number ε_1 in [0, 1] is drawn and the search path is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2, i.e. a random choice between the two paths.
The second stage selects one of the k actions adjacent to the current action on the chosen search path: a random number ε_2 in [0, 1] is drawn and the action is determined either greedily from the Q values or at random (formula shown as an image in the original). ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning.
(5) The update strategy of the Q value function is designed on the basis of the Q-Learning algorithm; the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is the hyperparameter controlling the update amplitude; the smaller λ_2 is, the larger the update amplitude of the Q value table. A code sketch of the two-stage action selection, with explicit stand-ins for the parts shown only as images, is given below.
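The following Python sketch illustrates the two-stage selection under explicit assumptions: because the path score L_p and the two threshold formulas appear only as images in the original, the sketch scores each path by the mean Q value of its k neighbouring actions and uses plain epsilon thresholds for ε_1 and ε_2. It is therefore a stand-in for the described strategy, not a reproduction of it.

    import numpy as np

    def select_action(Q_row, j_cur, k=4, eps1=0.1, eps2=0.1, rng=None):
        """Q_row: Q values over one parameter's subintervals; j_cur: index of the current action.
        Returns the index of the next action (a subinterval of the same parameter)."""
        rng = rng or np.random.default_rng()
        Q_row = np.asarray(Q_row, dtype=float)
        n = len(Q_row)
        if n == 1:
            return 0
        left = Q_row[max(0, j_cur - k):j_cur]                   # up to k neighbours to the left
        right = Q_row[j_cur + 1:min(n, j_cur + 1 + k)]          # up to k neighbours to the right
        # stage 1: choose the search path (left or right of the current action)
        if len(left) == 0 or len(right) == 0:
            go_right = len(right) > 0
        elif rng.random() < eps1:
            go_right = bool(rng.integers(2))                    # exploratory random path choice
        else:
            go_right = right.mean() >= left.mean()              # stand-in for the L_p path score
        neighbours = right if go_right else left
        base = j_cur + 1 if go_right else max(0, j_cur - k)
        # stage 2: pick one of the neighbouring actions on the chosen path
        if rng.random() < eps2:
            return base + int(rng.integers(len(neighbours)))    # exploratory random action
        return base + int(np.argmax(neighbours))                # greedy action by Q value

It can be dropped into the coarse-tuning loop sketched earlier in place of the plain epsilon-greedy choice.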
S3, a Modelica model parameter identification algorithm based on a reinforcement learning framework is used for a parameter coarse adjustment stage, and the method specifically comprises the following steps:
(1) all the parameters to be identified are initialized randomly and substituted into FMU for calculation, and compared with measured data, the mean square error is obtained and used as the initial value of the optimal objective function, and the initial value is recorded;
(2) entering a training period, wherein a plurality of agents perform serial learning, namely, each parameter is optimized in turn, and the learning process of a single agent is called as a 'one-turn';
(3) a round is entered: an action a_rand(i) is selected at random within the search range of the parameter while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round and the current reward/penalty value r, and the Q value table of the current agent is updated according to the Q value update strategy of S2. If F(X_cur) ≤ F(X_best), the current parameter value is worth searching and the process goes to (4), otherwise the next round is entered;
(4) a search cycle is entered in which a preset number of iterations is executed (the count is shown as a formula image in the original); each iteration selects an action a_iter(i) according to the action selection strategy while the other parameters are held fixed, applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round, and updates the Q value table of the current agent with the current reward/penalty value r;
(5) when the search cycle of one agent ends, the 'round' of the next agent begins, until all agents have gone through one round, which completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to (2) for the next training period, otherwise training ends.
S4. The parameter fine-tuning scheme based on the heuristic optimization algorithm PSO is as follows:
(1) the fine-tuning range of each parameter to be identified is determined from the result of the reinforcement learning stage. If the optimal subinterval obtained by reinforcement learning is [x_i^low, x_i^up], the fine-tuning range is set by widening this subinterval with a margin controlled by μ to account for the identification error of reinforcement learning (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
(2) an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
(3) a fitness function F(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize F(X);
(4) the iterative search starts; in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors (typically c_1 = c_2 = 2), and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and also the final result of the whole parameter identification process.
The embodiment of the invention takes the Modelica-based power transmission system model shown in FIG. 2 (a single-axis servo feed system of a numerically-controlled machine tool) as an example to describe the method in detail. The system uses a sinusoidal signal to drive a gear reducer, which in turn drives a load; in between are two components modelling inertia, together with a spring and a damper that affect the whole system. The input of the whole system model is a sinusoidal signal and the output is the absolute angular velocity of the component inertia3; the parameters to be identified and their associated information are shown in Table 1.
TABLE 1 (parameters to be identified and their associated information; reproduced only as an image in the original)
It should be explained that this embodiment serves to explain the technical solution of the invention in detail and to demonstrate its effectiveness and feasibility; it is not a real engineering case. Therefore, instead of measured data from a real physical machine, the output of a standard model is used as the sample set; all parameters to be identified are then initialized randomly, and the technical solution of the invention is applied so that the identified parameters come as close as possible to the standard parameter values, at which point identification is complete. The complete identification process comprises the following steps:
1. The FMU (functional mock-up unit) of the Modelica model of the transmission system is exported, and the simulation parameters such as step size, start time, end time and solver are set;
2. The parameters to be identified in the FMU are set to their standard values and a simulation is run; its output serves as the sample set. 1000 points are sampled on the output curve over a fixed period as the data sample set for this parameter identification; the subsequent reinforcement learning objective function and the PSO fitness function are the mean square error between the ordinates of these 1000 points and the model prediction (a sketch of building this objective is given below).
3. and carrying out parameter sensitivity analysis on the Modelica model. As described above, the 4 parameters to be identified are respectively reducer load moment of inertia2. j/torsion spring stiffness coefficient spring. c/damper damping coefficient damping/and motor output shaft moment of inertia Jmotor, and the sensitivity analysis steps are as follows:
(1) setting a sample: dimension 4 (i.e. 4 input variables), number 10000 (i.e. 10000 input samples), generating a 10000 × 8 Sobol matrix based on the range of each parameter, and processing the Sobol matrix according to the Sobol method to obtain a matrix A, B and AB (i)(i=1,2,…,8);
(2) Mixing the aboveInputting the sample matrix into FMU for simulation solution to obtain output, and sorting the output into Y value matrix Y according to Sobol methodA、YBAnd YABWith f (A), f (B) and f (A) respectivelyB i) Represents;
(3) the global influence index S of each parameter is obtained according to the following formulaTi
Figure BDA0002881854870000131
The results are shown in fig. 3, in terms of sensitivity, Jmotor > spring.c > damming > inertia 2.j;
(4) determining the learning sequence to be Jmotor, spring.c, damming and inertia2.J according to the sensitivity sequence obtained in the step (3), and then determining G corresponding to each parameteriAnd determining that the number of the split lattices corresponding to each parameter is 200/1000/100/30 respectively.
4. Coarse parameter tuning is carried out on the Modelica model. Q-Learning-based reinforcement learning requires a search range for each parameter; in this case there are 4 parameters to be identified, and their search ranges, obtained from handbooks or experience, are also shown in Table 1. The algorithm parameters are set at the same time: learning rate α = 0.7, path weight coefficient λ_1 = 0.5, update-amplitude hyperparameter λ_2 = 0.25, and k = 4 parameter transformation combinations. The parameters are then preliminarily identified by the following steps:
(1) all parameters to be identified are initialized at random and substituted into the Modelica-based power transmission system model for calculation; the result is compared with the data sample set to obtain a mean square error that is recorded as the initial value of the optimal objective function;
(2) entering a training period, wherein a plurality of agents perform serial learning, namely, each parameter is optimized in turn, and the learning process of a single agent is called as a 'one-turn';
(3) a round is entered: an action a_rand(i) is selected at random within the search range of the parameter while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round, the single step reward r is obtained from the reward formula, and the Q value table of the current agent is updated with Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) ). If F(X_cur) ≤ F(X_best), the current parameter value is worth searching and the process goes to (4), otherwise the next round is entered;
(4) a search cycle is entered in which a preset number of iterations is executed (the count is shown as a formula image in the original); each iteration selects an action a_iter(i) according to the action selection strategy while the other parameters are held fixed, applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round, and updates the Q value table of the current agent with the current reward/penalty value r;
(5) when the search cycle of one agent ends, the 'round' of the next agent begins, until all agents have gone through one round, which completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to (2) for the next training period, otherwise training ends.
The preliminary identification result obtained at this stage is the optimal subinterval of each parameter, as shown in Table 2; an accurate solution still needs to be determined on this basis;
TABLE 2 (optimal subinterval of each parameter after coarse tuning; reproduced only as an image in the original)
5. Parameter fine-tuning is carried out on the Modelica model. Step 4 only yields the optimal subinterval in which each parameter to be identified lies; the final identification result is determined within that interval using the PSO algorithm, with the following steps:
(1) the optimal subinterval obtained in step 4 is [x_i^low, x_i^up]; taking the identification error of reinforcement learning into account, the fine-tuning range is set by widening this subinterval with a margin controlled by μ (the exact expression is shown as a formula image in the original), and based on the search capability of the particle swarm algorithm μ is set to 2;
(2) a 4-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
(3) a fitness function F(X) is established as the mean square error between the model prediction and the data sample set, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize F(X);
(4) the iterative search starts; in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors (typically c_1 = c_2 = 2), and r_1 and r_2 are random numbers in [0, 1];
(5) after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and also the final result of the whole parameter identification process.
The comparison between the final identification results and the standard parameter values is shown in Table 3; all identification errors are below 5%, which meets the requirements for engineering use.
TABLE 3 (comparison of the final identification results with the standard parameter values; reproduced only as an image in the original)
A Modelica-based power transmission model was established for the feed system of a numerically-controlled machine tool and its parameters were identified with the method of the invention; under the same input conditions, the output of the physical prototype and the simulation result of the model are shown in FIG. 4. The trend and the values of the simulated curve are essentially consistent with the output curve of the physical prototype, which demonstrates the correctness and effectiveness of the parameter identification method.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for identifying parameters of a power transmission system model based on reinforcement learning is characterized by comprising the following steps:
s1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
s2, carrying out sensitivity analysis on the parameters to be identified of the model;
s3, roughly adjusting the parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for identifying the Modelica power transmission system model parameters;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
s4, parameter fine adjustment:
taking the mean square error between the measured data and the model estimate as the objective function, and iteratively searching for the optimum in the solution space formed by the model parameters to be identified; the parameter values at which the objective function value is minimal are taken as the final identification result.
2. The reinforcement-learning-based power transmission system model parameter identification method of claim 1, wherein step S2 performs parameter sensitivity analysis on the power transmission system model with undetermined parameters using the Sobol method, with the following steps:
01. Monte Carlo sampling is carried out within the possible value range of the N parameters to be identified to generate an initial sample matrix A, a second initial sample matrix B and cross sample matrices A_B^(i), where i = 1, 2, …, N;
02. the sample matrices A, B and A_B^(i) are taken as input and the power transmission system model is simulated, giving the simulation result vectors f(A), f(B) and f(A_B^(i)) for the matrices A, B and A_B^(i) respectively;
03. the global influence index S_Ti of each parameter is calculated from the simulation results; with M Monte Carlo samples, the total-effect estimator consistent with the matrices above is
S_Ti = ( (1/(2M)) * Σ_{j=1}^{M} ( f(A)_j − f(A_B^(i))_j )^2 ) / Var(Y)
where Y denotes the set of outputs formed by f(A), f(B) and f(A_B^(i)), and Var(Y) is the variance of the power transmission system model output;
04. the parameters to be identified are ranked by the size of their global influence index; a larger influence index indicates a more sensitive parameter;
05. the parameters to be identified whose sensitivity is below a set threshold are combined.
3. The reinforcement learning-based power transmission system model parameter identification method according to claim 1, wherein the reinforcement learning framework construction process specifically comprises:
(1) the mean square error between the model estimate Y_est and the measured value Y_mea is taken as the reinforcement learning objective function F(X);
(2) constructing the single step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the single step reward value, F(X_cur) denotes the objective function value at the current parameters, F(X_best) the value at the optimal parameters, and F(X_mean) the value at the parameter mean;
(3) actions are defined according to the minimum variation G_i (i = 1, 2, …, N) and the range of each parameter:
the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the maximum and x_i^min the minimum of the i-th parameter; one subinterval is selected and a value is drawn at random within it as the action; the minimum variation G_i is the amount by which the i-th parameter changes in each step of the identification process;
(4) constructing the action selection strategy:
01. selecting a search path:
whether the action selected in the next round lies to the left or the right of the current action is decided by the selection index L_p(i, j), whose calculation formula is shown as an image in the original; in it, k denotes the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient;
a random number ε_1 in [0, 1] is drawn and the search path l is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2;
02. determining the action:
a random number ε_2 in [0, 1] is drawn and the action a is determined either greedily from the Q values or at random (formula shown as an image in the original); Q(i, m) denotes the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning;
(5) constructing the update strategy of the Q value function:
the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is a hyperparameter controlling the update amplitude.
4. The method for identifying the parameters of the model of the power transmission system based on the reinforcement Learning as claimed in claim 1, wherein in the rough parameter adjusting stage, the reinforcement Learning framework based on the Q-Learning algorithm is used, and the iterative training comprises the following specific steps:
(1) all parameters to be identified are initialized at random and substituted into the power transmission system model for calculation; the result is compared with the measured data to obtain a mean square error that serves as the initial value of the optimal objective function F(X);
(2) the agents perform serial learning in order of decreasing sensitivity of the parameters to be identified; the part of the reinforcement learning framework that adjusts a parameter to be identified according to its Q value table is called an agent, and each parameter to be identified corresponds to exactly one agent; the process by which an agent adjusts its parameter is called the learning behaviour of the agent;
the learning process is as follows: an action a_rand(i) is selected at random within the possible value interval of the current agent's parameter while the other parameters are held fixed; the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single step reward value r, and the Q value table of the current agent is updated according to the Q value update strategy of claim 3; if F(X_cur) ≤ F(X_best), the current parameter value is worth searching and step (3) is entered, otherwise the learning process is repeated;
(3) an action a_iter(i) is selected according to the action selection strategy of claim 3 while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward/penalty value r, and the Q value table is updated;
(4) steps (2) and (3) are executed iteratively for the i-th parameter a preset number of times (the count is shown as a formula image in the original), which completes one identification round of the i-th parameter; identification then switches to the (i+1)-th parameter;
(5) executing steps (2) to (4) serially for all parameters to be identified completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to step (2) for the next training period, otherwise training ends.
5. The method as claimed in claim 1, wherein the step S4 is to use PSO optimization algorithm to fine-tune the parameters to obtain the final recognition result.
6. The reinforcement learning-based power transmission system model parameter identification method according to claim 1, wherein the parameters are fine-tuned by using a PSO optimization algorithm, and the method comprises the following specific steps:
the fine-tuning range is derived from the optimal subinterval [x_i^low, x_i^up] obtained by reinforcement learning, widened by a margin controlled by μ (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, [x_i^low, x_i^up] is the optimal subinterval obtained by reinforcement learning, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
a fitness function G(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole particle swarm is to minimize G(X);
the iterative search starts: in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight, c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter identification.
7. A powertrain system model parameter identification system based on reinforcement learning, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium and execute the reinforcement learning-based powertrain system model parameter identification method of any one of claims 1-6.
CN202110002104.6A 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning Active CN112632860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110002104.6A CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110002104.6A CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112632860A true CN112632860A (en) 2021-04-09
CN112632860B CN112632860B (en) 2024-06-04

Family

ID=75290870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110002104.6A Active CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112632860B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406434A (en) * 2021-05-14 2021-09-17 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113836788A (en) * 2021-08-24 2021-12-24 浙江大学 Acceleration method for flow industry reinforcement learning control based on local data enhancement
CN114676572A (en) * 2022-03-25 2022-06-28 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114860388A (en) * 2022-07-07 2022-08-05 中国汽车技术研究中心有限公司 Combined simulation method for converting FMU (failure mode reporting) model into Modelica model
DE102022104313A1 (en) 2022-02-23 2023-08-24 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for autonomously calibrating an electric powertrain
CN117312808A (en) * 2023-11-30 2023-12-29 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN118096085A (en) * 2024-04-24 2024-05-28 山东冠县鑫恒祥面业有限公司 Flour production line equipment operation and maintenance management method based on Internet of things

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341294A (en) * 2017-06-15 2017-11-10 苏州同元软控信息技术有限公司 Spacecraft Information System Modeling emulation mode based on Modelica language
CN109522602A (en) * 2018-10-18 2019-03-26 北京航空航天大学 A kind of Modelica Model Parameter Optimization method based on agent model
US20190347370A1 (en) * 2018-05-09 2019-11-14 Palo Alto Research Center Incorporated Learning constitutive equations of physical components with constraints discovery

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341294A (en) * 2017-06-15 2017-11-10 苏州同元软控信息技术有限公司 Spacecraft Information System Modeling emulation mode based on Modelica language
US20190347370A1 (en) * 2018-05-09 2019-11-14 Palo Alto Research Center Incorporated Learning constitutive equations of physical components with constraints discovery
CN109522602A (en) * 2018-10-18 2019-03-26 北京航空航天大学 A kind of Modelica Model Parameter Optimization method based on agent model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴义忠; 蒋占四; 陈立平: "Research on simulation optimization of multi-domain models based on the Modelica language" (in Chinese), ***仿真学报, no. 12, 20 June 2009 (2009-06-20) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406434A (en) * 2021-05-14 2021-09-17 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113406434B (en) * 2021-05-14 2022-05-31 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113836788A (en) * 2021-08-24 2021-12-24 浙江大学 Acceleration method for flow industry reinforcement learning control based on local data enhancement
CN113836788B (en) * 2021-08-24 2023-10-27 浙江大学 Acceleration method for flow industrial reinforcement learning control based on local data enhancement
DE102022104313A1 (en) 2022-02-23 2023-08-24 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for autonomously calibrating an electric powertrain
CN114676572A (en) * 2022-03-25 2022-06-28 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114676572B (en) * 2022-03-25 2023-02-17 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114860388A (en) * 2022-07-07 2022-08-05 中国汽车技术研究中心有限公司 Combined simulation method for converting FMU (failure mode reporting) model into Modelica model
CN117312808A (en) * 2023-11-30 2023-12-29 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN117312808B (en) * 2023-11-30 2024-02-06 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN118096085A (en) * 2024-04-24 2024-05-28 山东冠县鑫恒祥面业有限公司 Flour production line equipment operation and maintenance management method based on Internet of things

Also Published As

Publication number Publication date
CN112632860B (en) 2024-06-04

Similar Documents

Publication Publication Date Title
CN112632860B (en) Power transmission system model parameter identification method based on reinforcement learning
CN108621159B (en) Robot dynamics modeling method based on deep learning
Vrabie et al. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
CN111428849A (en) Improved particle swarm algorithm-based transfer function model parameter identification method and device
US20220326664A1 (en) Improved machine learning for technical systems
KR20170052344A (en) Method and apparatus for searching new material
CN112395777B (en) Engine calibration parameter optimization method based on automobile exhaust emission simulation environment
CN115291513A (en) Boiler reheat steam temperature prediction method and device, terminal equipment and storage medium
CN114880806A (en) New energy automobile sales prediction model parameter optimization method based on particle swarm optimization
Picotti et al. Data-driven tuning of a nmpc controller for a virtual motorcycle through genetic algorithm
CN116933948A (en) Prediction method and system based on improved seagull algorithm and back propagation neural network
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
CN114740710A (en) Random nonlinear multi-agent reinforcement learning optimization formation control method
Menner et al. Automated controller calibration by Kalman filtering
Stenger et al. Benchmark of bayesian optimization and metaheuristics for control engineering tuning problems with crash constraints
Xu et al. Meta-learning via weighted gradient update
Arshad et al. Deep Deterministic Policy Gradient to Regulate Feedback Control Systems Using Reinforcement Learning.
Liu et al. System identification based on generalized orthonormal basis function for unmanned helicopters: A reinforcement learning approach
CN114967472A (en) Unmanned aerial vehicle trajectory tracking state compensation depth certainty strategy gradient control method
CN114139937A (en) Indoor thermal comfort data generation method, system, equipment and medium
del Rio Ruiz et al. Towards the development of a CAD tool for the implementation of high-speed embedded MPCs on FPGAs
Brasch et al. Lateral control of a vehicle using reinforcement learning
CN113657604A (en) Device and method for operating an inspection table
He et al. A novel tuning method for predictive control of vav air conditioning system based on machine learning and improved PSO
CN111077769A (en) Method for controlling or regulating a technical system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant