CN112632860A - Power transmission system model parameter identification method based on reinforcement learning - Google Patents

Power transmission system model parameter identification method based on reinforcement learning Download PDF

Info

Publication number
CN112632860A
CN112632860A (Application CN202110002104.6A)
Authority
CN
China
Prior art keywords
value
parameter
parameters
reinforcement learning
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110002104.6A
Other languages
Chinese (zh)
Other versions
CN112632860B (en)
Inventor
丁建完
陈立平
郭超
彭奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202110002104.6A priority Critical patent/CN112632860B/en
Publication of CN112632860A publication Critical patent/CN112632860A/en
Application granted granted Critical
Publication of CN112632860B publication Critical patent/CN112632860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/25 - Design optimisation, verification or simulation using particle-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/08 - Probabilistic or stochastic CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a reinforcement-learning-based method for identifying the model parameters of a power transmission system, and belongs to the field of system modeling and simulation. Addressing the inconsistent sensitivity of power transmission system model parameters and the slow convergence and strict search range requirements of existing identification algorithms, the invention constructs a reinforcement learning framework for identifying the model parameters of the power transmission system that avoids local optima, converges quickly, and tolerates a large search range. The method uses a staged identification process: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly locate the optimal subinterval of each parameter, and the fine-tuning stage exploits the high precision and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval.

Description

Power transmission system model parameter identification method based on reinforcement learning
Technical Field
The invention belongs to the field of system modeling simulation, and particularly relates to a method for identifying parameters of a power transmission system model based on reinforcement learning.
Background
With the continuous development of multi-domain system modeling and simulation technology, the Modelica language has gradually become an industry standard for multi-domain, multi-disciplinary system simulation; its aim is to define a universal language for modeling complex systems. A Modelica model is a program written in the Modelica language that supports object-oriented modeling, multi-domain unified modeling, acausal declarative modeling, and continuous-discrete hybrid modeling, and engineers in different industries can use it to build their own simulation models and run the corresponding dynamic simulations. A Modelica model should agree with its physical prototype as closely as possible, and the key to this is the setting of the model parameters: building the model only determines its basic form, and accurate parameter values must be set to obtain the best simulation performance.
The power transmission system is a device that, under the control of a controller, draws energy from a power source in the form of speed/angular velocity and force/torque and transmits it to the next link of the system. It mainly consists of components such as a motor and a speed reducer, and effects such as energy loss and speed change occur during power transmission.
Existing model parameter identification methods are of many kinds; the most common is the least squares method, which is the default parameter optimization method in modeling software. However, least squares estimation depends too strongly on the data, its identification results are easily affected by noise, and it places strict requirements on the initialization range of the parameters. Evolutionary algorithms such as Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) are also commonly used; they have a strong ability to approach the global optimum, but their convergence is slow and their precision needs improvement. An identification method with fast convergence and loose requirements on the initial parameter range is therefore urgently needed to identify the parameters of the Modelica-based power transmission system model.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a reinforcement-learning-based method for identifying the parameters of a power transmission system model, and aims to provide an identification method with fast convergence and loose requirements on the initial parameter range for identifying the parameters of a Modelica-based power transmission system model.
In order to achieve the above object, the present invention provides a method for identifying parameters of a powertrain system model based on reinforcement learning, comprising:
s1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
s2, carrying out sensitivity analysis on the parameters to be identified of the model;
s3, roughly adjusting the parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for identifying the Modelica power transmission system model parameters;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
s4, parameter fine adjustment:
taking the mean square error between the measured data and the model estimate as the objective function, and iteratively searching for the optimum in the solution space formed by the model parameters to be identified; the parameter values at which the objective function value is minimal are taken as the final identification result.
Further, step S2 specifically performs parameter sensitivity analysis on the power transmission system model with undetermined parameters using the Sobol method, with the following steps (a code sketch of this computation is given below):
01. Monte Carlo sampling is carried out within the possible value range of the N parameters to be identified to generate an initial sample matrix A, a second initial sample matrix B and cross sample matrices A_B^(i), where i = 1, 2, …, N;
02. the sample matrices A, B and A_B^(i) are taken as input and the power transmission system model is simulated, giving the simulation result vectors f(A), f(B) and f(A_B^(i)) for the matrices A, B and A_B^(i) respectively;
03. the global influence index S_Ti of each parameter is calculated from the simulation results; with M Monte Carlo samples, the total-effect estimator consistent with the matrices above is
S_Ti = ( (1/(2M)) * Σ_{j=1}^{M} ( f(A)_j − f(A_B^(i))_j )^2 ) / Var(Y)
where Y denotes the set of outputs formed by f(A), f(B) and f(A_B^(i)), and Var(Y) is the variance of the power transmission system model output;
04. the parameters to be identified are ranked by the size of their global influence index; a larger influence index indicates a more sensitive parameter;
05. the parameters to be identified whose sensitivity is below a set threshold are combined.
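As a concrete illustration, the following Python sketch computes the total-effect indices with the estimator given above. It is a minimal sketch, not the patent's implementation: the simulate callable stands in for the FMU-based simulation that maps one parameter vector to the scalar model output (or output error), and the sample count is an arbitrary choice.

    import numpy as np

    def sobol_total_indices(bounds, simulate, n_samples=1024, seed=0):
        """bounds: (N, 2) array of [min, max] per parameter; simulate: vector -> scalar output."""
        rng = np.random.default_rng(seed)
        bounds = np.asarray(bounds, dtype=float)
        N = bounds.shape[0]
        lo, hi = bounds[:, 0], bounds[:, 1]
        # Monte Carlo sample matrices A and B (one parameter vector per row)
        A = lo + rng.random((n_samples, N)) * (hi - lo)
        B = lo + rng.random((n_samples, N)) * (hi - lo)
        fA = np.array([simulate(x) for x in A])
        fB = np.array([simulate(x) for x in B])
        var_Y = np.var(np.concatenate([fA, fB]))
        S_T = np.empty(N)
        for i in range(N):
            AB_i = A.copy()
            AB_i[:, i] = B[:, i]                                  # cross matrix: column i taken from B
            fAB = np.array([simulate(x) for x in AB_i])
            S_T[i] = np.mean((fA - fAB) ** 2) / (2.0 * var_Y)     # total-effect estimator
        return S_T

The ranking of the returned indices then gives the training order used in the coarse-tuning stage, and parameters with indices below the chosen threshold can be merged.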
Further, the reinforcement learning framework construction process specifically includes:
(1) the mean square error between the model estimate Y_est and the measured value Y_mea is taken as the reinforcement learning objective function F(X);
(2) constructing a single step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the single step reward value, F(X_cur) denotes the objective function value at the current parameters, F(X_best) the value at the optimal parameters, and F(X_mean) the value at the parameter mean;
(3) actions are defined according to the minimum variation G_i (i = 1, 2, …, N) and the range of each parameter:
the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the maximum and x_i^min the minimum of the i-th parameter; one subinterval is selected and a value is drawn at random within it as the action; the minimum variation G_i is the amount by which the i-th parameter is increased or decreased in each step of the identification process;
(4) constructing the action selection strategy:
01. selecting a search path:
whether the action selected in the next round lies to the left or the right of the current action is decided by the selection index L_p(i, j), whose calculation formula is shown as an image in the original; in it, k denotes the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient;
a random number ε_1 in [0, 1] is drawn and the search path l is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2, i.e. a random choice between the two paths;
02. determining the action:
a random number ε_2 in [0, 1] is drawn and the action a is determined either greedily from the Q values or at random (formula shown as an image in the original); Q(i, m) denotes the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning;
(5) constructing the update strategy of the Q value function (a code sketch follows below):
the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is a hyperparameter controlling the update amplitude.
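For concreteness, a minimal Python sketch of the single step reward and the Q value update defined above follows. It assumes a simple table layout (one row of Q values per discretised subinterval of each parameter) and takes the neighbouring path scores L_p as an array; both are assumptions, since the full path score formula is shown only as an image in the original.

    import numpy as np

    def single_step_reward(F_cur, F_best, F_mean, eps=1e-12):
        # r = min(1, max(0, (F_mean - F_best) / (F_mean - F_cur)))
        return min(1.0, max(0.0, (F_mean - F_best) / (F_mean - F_cur + eps)))

    def update_q(Q, i, j, r, L_p, alpha=0.7, lam2=0.25):
        """Q: (N_params, n_subintervals) table; i: parameter index; j: chosen subinterval.
        L_p: Q values of the k actions adjacent to the chosen action on the selected path."""
        target = r + (1.0 - lam2) * np.max(L_p) + lam2 * np.min(L_p)
        Q[i, j] += alpha * (target - Q[i, j])
        return Q

A smaller λ_2 puts more weight on the best neighbouring Q value, consistent with the later remark that a smaller λ_2 gives a larger update amplitude of the Q table.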
Further, in the parameter coarse tuning stage, a reinforcement Learning framework based on a Q-Learning algorithm is used, and the iterative training specifically comprises the following steps:
(1) all parameters to be identified are initialized at random and substituted into the power transmission system model for calculation; the result is compared with the measured data to obtain a mean square error that serves as the initial value of the optimal objective function F(X);
(2) the agents perform serial learning in order of decreasing sensitivity of the parameters to be identified; the part of the reinforcement learning framework that adjusts a parameter to be identified according to its Q value table is called an agent, and each parameter to be identified corresponds to exactly one agent; the process by which an agent adjusts its parameter is called the learning behaviour of the agent;
the learning process is as follows: an action a_rand(i) is selected at random within the possible value interval of the current agent's parameter while the other parameters are held fixed; the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single step reward value r, and the Q value table of the current agent is updated according to the Q value update strategy; if F(X_cur) ≤ F(X_best), the current parameter value is worth searching and step (3) is entered, otherwise the learning process is repeated;
(3) an action a_iter(i) is selected according to the action selection strategy while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward/penalty value r, and the Q value table is updated;
(4) steps (2) and (3) are executed iteratively for the i-th parameter a preset number of times (the count is shown as a formula image in the original), which completes one identification round of the i-th parameter; identification then switches to the (i+1)-th parameter;
(5) executing steps (2) to (4) serially for all parameters to be identified completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to step (2) for the next training period, otherwise training ends (a compressed code sketch of this loop follows).
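The following Python sketch compresses the serial coarse-tuning loop above under stated assumptions: the FMU call is represented by an objective callable returning the mean square error, the path-based action selection is replaced by a plain epsilon-greedy choice over each parameter's subintervals, the max/min over neighbouring path scores in the Q update is replaced by the max/min of the whole Q row, and the epoch and iteration counts are arbitrary. It illustrates the staged scheme rather than reproducing the patent's exact algorithm.

    import numpy as np

    def coarse_tune(bounds, n_cells, objective, n_epochs=20, iters_per_param=30,
                    alpha=0.7, lam2=0.25, eps=0.2, seed=0):
        """bounds: list of (lo, hi) per parameter, ordered by decreasing sensitivity.
        n_cells: number of subintervals per parameter. Returns best point and best cells."""
        rng = np.random.default_rng(seed)
        N = len(bounds)
        Q = [np.zeros(n_cells[i]) for i in range(N)]             # one Q row (agent) per parameter
        x_best = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        F_best = objective(x_best)
        F_hist = [F_best]
        for _ in range(n_epochs):                                # training periods
            for i in range(N):                                   # agents learn serially
                for _ in range(iters_per_param):
                    # epsilon-greedy stand-in for the path-based action selection
                    j = rng.integers(n_cells[i]) if rng.random() < eps else int(np.argmax(Q[i]))
                    lo, hi = bounds[i]
                    width = (hi - lo) / n_cells[i]
                    x_try = x_best.copy()
                    x_try[i] = lo + (j + rng.random()) * width   # random value inside subinterval j
                    F_cur = objective(x_try)
                    F_hist.append(F_cur)
                    F_mean = float(np.mean(F_hist))
                    r = min(1.0, max(0.0, (F_mean - F_best) / (F_mean - F_cur + 1e-12)))
                    Q[i][j] += alpha * (r + (1 - lam2) * Q[i].max()
                                        + lam2 * Q[i].min() - Q[i][j])
                    if F_cur <= F_best:
                        F_best, x_best = F_cur, x_try
        best_cells = [int(np.argmax(q)) for q in Q]              # optimal subinterval per parameter
        return x_best, best_cells

The returned best_cells play the role of the optimal subintervals handed to the fine-tuning stage.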
Further, step S4 specifically fine-tunes the parameters with the PSO optimization algorithm to obtain the final identification result, with the following steps:
to account for the identification error of reinforcement learning, the fine-tuning range of each parameter is derived from the optimal subinterval [x_i^low, x_i^up] obtained by reinforcement learning, widened by a margin controlled by μ (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, [x_i^low, x_i^up] is the optimal subinterval obtained by reinforcement learning, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
a fitness function G(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize G(X);
starting iterative search:
in each iteration, the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final identification result (a code sketch of this search is given below).
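A minimal PSO sketch of this fine-tuning stage follows, assuming the widened subintervals are already available as per-parameter bounds and using the standard velocity and position updates shown above; the particle count and iteration count are arbitrary choices.

    import numpy as np

    def pso_fine_tune(bounds, fitness, n_particles=30, n_iters=100,
                      w=0.6, c1=2.0, c2=2.0, seed=0):
        """bounds: (N, 2) array of fine-tuning ranges; fitness: vector -> mean square error."""
        rng = np.random.default_rng(seed)
        bounds = np.asarray(bounds, dtype=float)
        lo, hi = bounds[:, 0], bounds[:, 1]
        N = len(lo)
        x = lo + rng.random((n_particles, N)) * (hi - lo)      # particle positions
        v = np.zeros_like(x)                                   # particle velocities
        p = x.copy()                                           # per-particle best positions
        p_fit = np.array([fitness(xi) for xi in x])
        g = p[np.argmin(p_fit)].copy()                         # swarm best position
        g_fit = float(p_fit.min())
        for _ in range(n_iters):
            r1 = rng.random((n_particles, N))
            r2 = rng.random((n_particles, N))
            v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)  # velocity update
            x = np.clip(x + v, lo, hi)                         # position update, kept inside bounds
            fit = np.array([fitness(xi) for xi in x])
            improved = fit < p_fit
            p[improved], p_fit[improved] = x[improved], fit[improved]
            if p_fit.min() < g_fit:
                g_fit = float(p_fit.min())
                g = p[np.argmin(p_fit)].copy()
        return g, g_fit

The returned g corresponds to the historical optimal position vector p_g, i.e. the final parameter identification result.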
In general, the above technical solutions contemplated by the present invention can achieve the following advantageous effects compared to the prior art.
(1) Addressing the inconsistent sensitivity of Modelica model parameters and the slow convergence and strict search range requirements of existing identification algorithms, a reinforcement learning framework for identifying the Modelica power transmission system model parameters is constructed; it avoids local optima, converges quickly, and tolerates a large search range. The invention adopts a staged identification process: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly locate the optimal subinterval of each parameter, and the fine-tuning stage exploits the high precision and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval; compared with using a single algorithm alone, the staged identification performs better.
(2) To address the huge search space caused by an excessive number of parameters, the invention also performs sensitivity analysis on the parameters to be identified and merges those with low sensitivity, which reduces the number of parameters to be identified and saves computing resources during reinforcement-learning-based identification; at the same time, the sensitivity is used to set the parameter identification priority, which provides a basis for setting the learning rate in the reinforcement learning identification process and improves the identification precision.
(3) The method designs a sensitivity analysis scheme based on the Sobol method; it explores all model parameters over their possible value space simultaneously on the basis of variance, imposes no restriction on the nonlinearity or monotonicity of the model, and is therefore suitable for global parameter sensitivity analysis of any model.
Drawings
FIG. 1 is a flow chart of a method for identifying parameters of a powertrain model based on reinforcement learning according to the present invention;
FIG. 2 is a model of the powertrain created based on Modelica;
FIG. 3 shows the global influence indices of the parameters inertia2.J, spring.c, damping and Jmotor;
FIG. 4 shows, for a Modelica-based power transmission model of a numerically-controlled machine tool feed system whose parameters were identified with the method of the invention, the output of the physical prototype and the simulation result of the model under the same input conditions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a Modelica model parameter identification method based on reinforcement learning, which can effectively identify key parameters of a Modelica model so as to improve the simulation performance of the model. In order to meet the requirements of high convergence speed and high identification precision, the invention takes the actually measured data of a physical machine corresponding to a model as a training sample, adopts a reinforcement Learning framework improved based on a Q-Learning algorithm in the parameter coarse adjustment stage, and carries out serial iterative training on an intelligent agent represented by each parameter to be identified; and in the fine adjustment stage, a PSO/GA algorithm is adopted to determine the final value of the parameter to be identified. The method has the characteristics of high convergence rate and high identification precision.
Reinforcement learning is a machine learning method based on a trial-and-error mechanism; it originated from research in the field of animal intelligence and subsequently developed into an important branch of machine learning. Reinforcement learning considers the goal-directed interaction between an agent and an unknown environment; it accumulates experience through continual trial and error and keeps updating its action selection strategy, and it has the advantages of fast convergence, easy avoidance of local optima, and no need for prior knowledge. At present, reinforcement learning is mostly used, with success, in combinatorial optimization (sequential decision making), but it has not yet been applied to the function optimization field to which parameter identification belongs. Considering the requirements of fast convergence and global optimality posed by Modelica model parameter identification, the invention proposes a reinforcement-learning-based parameter identification method to solve this problem.
The overall technical scheme of the invention is as follows: the parameter sensitivity of the Modelica model to be identified is analysed with a Sobol-based sensitivity analysis method, the parameters are ranked by the size of their global influence factor, and the training order and discretization degree (i.e. the number of subintervals into which the initialization range is split) of each parameter in the coarse-tuning stage are determined; an improved reinforcement learning framework based on Q-Learning is then used, with the mean square error between the data measured on the physical machine and the model simulation data as the objective function and environment state, the adjustment of a parameter within its feasible range as the action, and the relation between the current objective function value and the historical optimum/average as the single step reward, to coarse-tune the parameters to be identified of the Modelica model and obtain the optimal subinterval of each parameter; finally, based on the optimal subintervals of all parameters, iterative training with a PSO/GA algorithm yields the optimal identification value of each parameter, i.e. the final identification result of the whole technical scheme.
S1, Modelica model parameter sensitivity analysis based on a Sobol method comprises the following steps:
(1) The FMU (Functional Mock-up Unit) of the Modelica model to be identified is exported; the functional mock-up unit contains the C code, solver and related artefacts of the model, can be called directly and completes the simulation solution of the model on its own, and is set to the 'Co-Simulation' mode (a code sketch of driving such an FMU from a script is given below);
(2) Monte Carlo sampling is carried out within the range of the N parameters to be identified to generate the sample matrices A, B and A_B^(i) (i = 1, 2, …, N); the FMU is simulated with these sample matrices as input to obtain the Y-value matrices, the global influence factor of each parameter to be identified is obtained from the Y-value matrices and the influence index formula, and the parameters are ranked. The training order is determined by the size of the influence factors; parameters with larger influence factors (i.e. the more sensitive ones) are optimized earlier within each training round;
(3) a random value is drawn within the search range of each parameter and substituted into the FMU, and the computed result is taken as a reference value; the minimum subinterval size G_i (i = 1, 2, …, N) of each parameter is then determined in turn so that increasing or decreasing the i-th parameter by G_i has a noticeable influence on the simulation result.
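As an illustration of step (1), the sketch below drives an exported Co-Simulation FMU from Python using the FMPy package; FMPy is an assumed tooling choice (the text only requires that the FMU be solvable with chosen start values), and the parameter and output names are illustrative placeholders.

    from fmpy import simulate_fmu
    import numpy as np

    def simulate_powertrain(fmu_path, start_values, stop_time=10.0, output="inertia3.w"):
        """Run one Co-Simulation of the exported FMU with the given parameter start values."""
        result = simulate_fmu(
            fmu_path,
            start_values=start_values,   # e.g. {"Jmotor": 0.02, "spring.c": 1.0e4}  (illustrative)
            stop_time=stop_time,
            output=[output],
        )
        # simulate_fmu returns a structured array with a 'time' column and one column per output
        return np.asarray(result["time"]), np.asarray(result[output])

Wrapping this call so that it returns the mean square error against the measured curve gives the objective/simulate callables assumed in the other sketches.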
S2. A reinforcement learning framework based on the Q-Learning algorithm is used for the parameter coarse-tuning stage. It consists of the following components:
(1) the environment state is set to the mean square error between the model estimate Y_est of the current round and the value Y_mea measured on the physical machine (i.e. the reinforcement learning objective function F(X)); it represents the gap between the current model parameter combination and the actual parameter combination;
(2) the single step reward measures the quality of the action selected in the current state and is an expression of the objective function value; it takes one of the following two forms:
i. a form shown only as a formula image in the original;
ii. r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the reward/penalty value, F(X_mean) the objective function value at the parameter mean, F(X_best) the value at the optimal parameters, and F(X_cur) the value at the current parameters;
(3) actions: according to the parameter minimum subinterval G_i (i = 1, 2, …, N) and the range of each parameter, the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the parameter maximum and x_i^min the parameter minimum; after a subinterval is selected, a value is drawn at random within it as the action;
(4) the action selection strategy has two stages: selecting a search path and determining the action. The first stage decides whether the action selected in the next round lies to the left or the right of the current action; the selection index is L_p(i, j), whose calculation formula is shown as an image in the original. In it, k is the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient.
A random number ε_1 in [0, 1] is drawn and the search path is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2, i.e. a random choice between the two paths.
The second stage selects one of the k actions adjacent to the current action on the chosen search path: a random number ε_2 in [0, 1] is drawn and the action is determined either greedily from the Q values or at random (formula shown as an image in the original). ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning.
(5) The update strategy of the Q value function is designed on the basis of the Q-Learning algorithm; the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is the hyperparameter controlling the update amplitude; the smaller λ_2 is, the larger the update amplitude of the Q value table. A code sketch of the two-stage action selection, with explicit stand-ins for the parts shown only as images, is given below.
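The following Python sketch illustrates the two-stage selection under explicit assumptions: because the path score L_p and the two threshold formulas appear only as images in the original, the sketch scores each path by the mean Q value of its k neighbouring actions and uses plain epsilon thresholds for ε_1 and ε_2. It is therefore a stand-in for the described strategy, not a reproduction of it.

    import numpy as np

    def select_action(Q_row, j_cur, k=4, eps1=0.1, eps2=0.1, rng=None):
        """Q_row: Q values over one parameter's subintervals; j_cur: index of the current action.
        Returns the index of the next action (a subinterval of the same parameter)."""
        rng = rng or np.random.default_rng()
        Q_row = np.asarray(Q_row, dtype=float)
        n = len(Q_row)
        if n == 1:
            return 0
        left = Q_row[max(0, j_cur - k):j_cur]                   # up to k neighbours to the left
        right = Q_row[j_cur + 1:min(n, j_cur + 1 + k)]          # up to k neighbours to the right
        # stage 1: choose the search path (left or right of the current action)
        if len(left) == 0 or len(right) == 0:
            go_right = len(right) > 0
        elif rng.random() < eps1:
            go_right = bool(rng.integers(2))                    # exploratory random path choice
        else:
            go_right = right.mean() >= left.mean()              # stand-in for the L_p path score
        neighbours = right if go_right else left
        base = j_cur + 1 if go_right else max(0, j_cur - k)
        # stage 2: pick one of the neighbouring actions on the chosen path
        if rng.random() < eps2:
            return base + int(rng.integers(len(neighbours)))    # exploratory random action
        return base + int(np.argmax(neighbours))                # greedy action by Q value

It can be dropped into the coarse-tuning loop sketched earlier in place of the plain epsilon-greedy choice.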
S3, a Modelica model parameter identification algorithm based on a reinforcement learning framework is used for a parameter coarse adjustment stage, and the method specifically comprises the following steps:
(1) all the parameters to be identified are initialized randomly and substituted into FMU for calculation, and compared with measured data, the mean square error is obtained and used as the initial value of the optimal objective function, and the initial value is recorded;
(2) entering a training period, wherein a plurality of agents perform serial learning, namely, each parameter is optimized in turn, and the learning process of a single agent is called as a 'one-turn';
(3) a round is entered: an action a_rand(i) is selected at random within the search range of the parameter while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round and the current reward/penalty value r, and the Q value table of the current agent is updated according to the Q value update strategy of S2. If F(X_cur) ≤ F(X_best), the current parameter value is worth searching and the process goes to (4), otherwise the next round is entered;
(4) a search cycle is entered in which a preset number of iterations is executed (the count is shown as a formula image in the original); each iteration selects an action a_iter(i) according to the action selection strategy while the other parameters are held fixed, applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round, and updates the Q value table of the current agent with the current reward/penalty value r;
(5) when the search cycle of one agent ends, the 'round' of the next agent begins, until all agents have gone through one round, which completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to (2) for the next training period, otherwise training ends.
S4. The parameter fine-tuning scheme based on the heuristic optimization algorithm PSO is as follows:
(1) the fine-tuning range of each parameter to be identified is determined from the result of the reinforcement learning stage. If the optimal subinterval obtained by reinforcement learning is [x_i^low, x_i^up], the fine-tuning range is set by widening this subinterval with a margin controlled by μ to account for the identification error of reinforcement learning (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
(2) an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
(3) a fitness function F(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize F(X);
(4) the iterative search starts; in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors (typically c_1 = c_2 = 2), and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and also the final result of the whole parameter identification process.
The embodiment of the invention takes the Modelica-based power transmission system model shown in FIG. 2 (a single-axis servo feed system of a numerically-controlled machine tool) as an example to describe the method in detail. The system uses a sinusoidal signal to drive a gear reducer, which in turn drives a load; in between are two components modelling inertia, together with a spring and a damper that affect the whole system. The input of the whole system model is a sinusoidal signal and the output is the absolute angular velocity of the component inertia3; the parameters to be identified and their associated information are shown in Table 1.
TABLE 1 (parameters to be identified and their associated information; reproduced only as an image in the original)
It should be explained that this embodiment serves to explain the technical solution of the invention in detail and to demonstrate its effectiveness and feasibility; it is not a real engineering case. Therefore, instead of measured data from a real physical machine, the output of a standard model is used as the sample set; all parameters to be identified are then initialized randomly, and the technical solution of the invention is applied so that the identified parameters come as close as possible to the standard parameter values, at which point identification is complete. The complete identification process comprises the following steps:
1. The FMU (functional mock-up unit) of the Modelica model of the transmission system is exported, and the simulation parameters such as step size, start time, end time and solver are set;
2. The parameters to be identified in the FMU are set to their standard values and a simulation is run; its output serves as the sample set. 1000 points are sampled on the output curve over a fixed period as the data sample set for this parameter identification; the subsequent reinforcement learning objective function and the PSO fitness function are the mean square error between the ordinates of these 1000 points and the model prediction (a sketch of building this objective is given below).
3. and carrying out parameter sensitivity analysis on the Modelica model. As described above, the 4 parameters to be identified are respectively reducer load moment of inertia2. j/torsion spring stiffness coefficient spring. c/damper damping coefficient damping/and motor output shaft moment of inertia Jmotor, and the sensitivity analysis steps are as follows:
(1) setting a sample: dimension 4 (i.e. 4 input variables), number 10000 (i.e. 10000 input samples), generating a 10000 × 8 Sobol matrix based on the range of each parameter, and processing the Sobol matrix according to the Sobol method to obtain a matrix A, B and AB (i)(i=1,2,…,8);
(2) Mixing the aboveInputting the sample matrix into FMU for simulation solution to obtain output, and sorting the output into Y value matrix Y according to Sobol methodA、YBAnd YABWith f (A), f (B) and f (A) respectivelyB i) Represents;
(3) the global influence index S of each parameter is obtained according to the following formulaTi
Figure BDA0002881854870000131
The results are shown in fig. 3, in terms of sensitivity, Jmotor > spring.c > damming > inertia 2.j;
(4) determining the learning sequence to be Jmotor, spring.c, damming and inertia2.J according to the sensitivity sequence obtained in the step (3), and then determining G corresponding to each parameteriAnd determining that the number of the split lattices corresponding to each parameter is 200/1000/100/30 respectively.
4. Coarse parameter tuning is carried out on the Modelica model. Q-Learning-based reinforcement learning requires a search range for each parameter; in this case there are 4 parameters to be identified, and their search ranges, obtained from handbooks or experience, are also shown in Table 1. The algorithm parameters are set at the same time: learning rate α = 0.7, path weight coefficient λ_1 = 0.5, update-amplitude hyperparameter λ_2 = 0.25, and k = 4 parameter transformation combinations. The parameters are then preliminarily identified by the following steps:
(1) all parameters to be identified are initialized at random and substituted into the Modelica-based power transmission system model for calculation; the result is compared with the data sample set to obtain a mean square error that is recorded as the initial value of the optimal objective function;
(2) entering a training period, wherein a plurality of agents perform serial learning, namely, each parameter is optimized in turn, and the learning process of a single agent is called as a 'one-turn';
(3) a round is entered: an action a_rand(i) is selected at random within the search range of the parameter while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round, the single step reward r is obtained from the reward formula, and the Q value table of the current agent is updated with Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) ). If F(X_cur) ≤ F(X_best), the current parameter value is worth searching and the process goes to (4), otherwise the next round is entered;
(4) a search cycle is entered in which a preset number of iterations is executed (the count is shown as a formula image in the original); each iteration selects an action a_iter(i) according to the action selection strategy while the other parameters are held fixed, applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round, and updates the Q value table of the current agent with the current reward/penalty value r;
(5) when the search cycle of one agent ends, the 'round' of the next agent begins, until all agents have gone through one round, which completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to (2) for the next training period, otherwise training ends.
The preliminary identification result obtained at this stage is the optimal subinterval of each parameter, as shown in Table 2; an accurate solution still needs to be determined on this basis;
TABLE 2 (optimal subinterval of each parameter after coarse tuning; reproduced only as an image in the original)
5. Parameter fine-tuning is carried out on the Modelica model. Step 4 only yields the optimal subinterval in which each parameter to be identified lies; the final identification result is determined within that interval using the PSO algorithm, with the following steps:
(1) the optimal subinterval obtained in step 4 is [x_i^low, x_i^up]; taking the identification error of reinforcement learning into account, the fine-tuning range is set by widening this subinterval with a margin controlled by μ (the exact expression is shown as a formula image in the original), and based on the search capability of the particle swarm algorithm μ is set to 2;
(2) a 4-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
(3) a fitness function F(X) is established as the mean square error between the model prediction and the data sample set, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole swarm is to minimize F(X);
(4) the iterative search starts; in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight (0.6 by default), c_1 and c_2 are learning factors (typically c_1 = c_2 = 2), and r_1 and r_2 are random numbers in [0, 1];
(5) after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and also the final result of the whole parameter identification process.
The comparison between the final identification results and the standard parameter values is shown in Table 3; all identification errors are below 5%, which meets the requirements for engineering use.
TABLE 3 (comparison of the final identification results with the standard parameter values; reproduced only as an image in the original)
A Modelica-based power transmission model was established for the feed system of a numerically-controlled machine tool and its parameters were identified with the method of the invention; under the same input conditions, the output of the physical prototype and the simulation result of the model are shown in FIG. 4. The trend and the values of the simulated curve are essentially consistent with the output curve of the physical prototype, which demonstrates the correctness and effectiveness of the parameter identification method.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for identifying parameters of a power transmission system model based on reinforcement learning is characterized by comprising the following steps:
s1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
s2, carrying out sensitivity analysis on the parameters to be identified of the model;
s3, roughly adjusting the parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for identifying the Modelica power transmission system model parameters;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
s4, parameter fine adjustment:
taking the mean square error between the measured data and the model estimate as the objective function, and iteratively searching for the optimum in the solution space formed by the model parameters to be identified; the parameter values at which the objective function value is minimal are taken as the final identification result.
2. The reinforcement-learning-based power transmission system model parameter identification method of claim 1, wherein step S2 performs parameter sensitivity analysis on the power transmission system model with undetermined parameters using the Sobol method, with the following steps:
01. Monte Carlo sampling is carried out within the possible value range of the N parameters to be identified to generate an initial sample matrix A, a second initial sample matrix B and cross sample matrices A_B^(i), where i = 1, 2, …, N;
02. the sample matrices A, B and A_B^(i) are taken as input and the power transmission system model is simulated, giving the simulation result vectors f(A), f(B) and f(A_B^(i)) for the matrices A, B and A_B^(i) respectively;
03. the global influence index S_Ti of each parameter is calculated from the simulation results; with M Monte Carlo samples, the total-effect estimator consistent with the matrices above is
S_Ti = ( (1/(2M)) * Σ_{j=1}^{M} ( f(A)_j − f(A_B^(i))_j )^2 ) / Var(Y)
where Y denotes the set of outputs formed by f(A), f(B) and f(A_B^(i)), and Var(Y) is the variance of the power transmission system model output;
04. the parameters to be identified are ranked by the size of their global influence index; a larger influence index indicates a more sensitive parameter;
05. the parameters to be identified whose sensitivity is below a set threshold are combined.
3. The reinforcement learning-based power transmission system model parameter identification method according to claim 1, wherein the reinforcement learning framework construction process specifically comprises:
(1) the mean square error between the model estimate Y_est and the measured value Y_mea is taken as the reinforcement learning objective function F(X);
(2) constructing the single step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
where r denotes the single step reward value, F(X_cur) denotes the objective function value at the current parameters, F(X_best) the value at the optimal parameters, and F(X_mean) the value at the parameter mean;
(3) actions are defined according to the minimum variation G_i (i = 1, 2, …, N) and the range of each parameter:
the search range of the i-th parameter is split into (x_i^max − x_i^min)/G_i subintervals, where x_i^max is the maximum and x_i^min the minimum of the i-th parameter; one subinterval is selected and a value is drawn at random within it as the action; the minimum variation G_i is the amount by which the i-th parameter changes in each step of the identification process;
(4) constructing the action selection strategy:
01. selecting a search path:
whether the action selected in the next round lies to the left or the right of the current action is decided by the selection index L_p(i, j), whose calculation formula is shown as an image in the original; in it, k denotes the number of parameter transformation combinations, Q_l^n(i, j) is the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on path l, and λ_1 is a path weight coefficient;
a random number ε_1 in [0, 1] is drawn and the search path l is determined accordingly (formula shown as an image in the original); rand(1, 2) denotes a random draw over the interval 1-2;
02. determining the action:
a random number ε_2 in [0, 1] is drawn and the action a is determined either greedily from the Q values or at random (formula shown as an image in the original); Q(i, m) denotes the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers used to preserve the exploratory behaviour of reinforcement learning;
(5) constructing the update strategy of the Q value function:
the Q value function update formula for the i-th parameter is:
Q_{r+1}(i, j) += α ( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) )
where α is a hyperparameter controlling the learning rate, r is the single step reward, and λ_2 is a hyperparameter controlling the update amplitude.
4. The method for identifying the parameters of the model of the power transmission system based on the reinforcement Learning as claimed in claim 1, wherein in the rough parameter adjusting stage, the reinforcement Learning framework based on the Q-Learning algorithm is used, and the iterative training comprises the following specific steps:
(1) all parameters to be identified are initialized at random and substituted into the power transmission system model for calculation; the result is compared with the measured data to obtain a mean square error that serves as the initial value of the optimal objective function F(X);
(2) the agents perform serial learning in order of decreasing sensitivity of the parameters to be identified; the part of the reinforcement learning framework that adjusts a parameter to be identified according to its Q value table is called an agent, and each parameter to be identified corresponds to exactly one agent; the process by which an agent adjusts its parameter is called the learning behaviour of the agent;
the learning process is as follows: an action a_rand(i) is selected at random within the possible value interval of the current agent's parameter while the other parameters are held fixed; the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single step reward value r, and the Q value table of the current agent is updated according to the Q value update strategy of claim 3; if F(X_cur) ≤ F(X_best), the current parameter value is worth searching and step (3) is entered, otherwise the learning process is repeated;
(3) an action a_iter(i) is selected according to the action selection strategy of claim 3 while the other parameters are held fixed; the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward/penalty value r, and the Q value table is updated;
(4) steps (2) and (3) are executed iteratively for the i-th parameter a preset number of times (the count is shown as a formula image in the original), which completes one identification round of the i-th parameter; identification then switches to the (i+1)-th parameter;
(5) executing steps (2) to (4) serially for all parameters to be identified completes one training period; if the number of completed periods is less than the given number of training periods, the process returns to step (2) for the next training period, otherwise training ends.
5. The method as claimed in claim 1, wherein the step S4 is to use PSO optimization algorithm to fine-tune the parameters to obtain the final recognition result.
6. The reinforcement learning-based power transmission system model parameter identification method according to claim 1, wherein the parameters are fine-tuned by using a PSO optimization algorithm, and the method comprises the following specific steps:
the fine-tuning range is derived from the optimal subinterval [x_i^low, x_i^up] obtained by reinforcement learning, widened by a margin controlled by μ (the exact expression is shown as a formula image in the original); the value of μ is set according to the search capability of the particle swarm algorithm, [x_i^low, x_i^up] is the optimal subinterval obtained by reinforcement learning, and x_i^opt is the optimal value of the i-th parameter identified by reinforcement learning;
an N-dimensional space and a swarm of particles are initialized; for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i are created, together with a historical optimal position vector p_g for the whole particle swarm;
a fitness function G(X) is established as the mean square error between the model prediction and the measured values, where the model prediction is obtained by substituting the current particle position into the FMU and solving the simulation; the search goal over the whole particle swarm is to minimize G(X);
the iterative search starts: in each iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the swarm are updated, and the velocity and position of every particle are then updated according to:
v_i = ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i)
x_i = x_i + v_i
where ω is the inertia weight, c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter identification.
7. A powertrain system model parameter identification system based on reinforcement learning, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium and execute the reinforcement learning-based powertrain system model parameter identification method of any one of claims 1-6.
CN202110002104.6A 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning Active CN112632860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110002104.6A CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110002104.6A CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112632860A true CN112632860A (en) 2021-04-09
CN112632860B CN112632860B (en) 2024-06-04

Family

ID=75290870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110002104.6A Active CN112632860B (en) 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112632860B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406434A (en) * 2021-05-14 2021-09-17 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113836788A (en) * 2021-08-24 2021-12-24 浙江大学 Acceleration method for flow industry reinforcement learning control based on local data enhancement
CN114676572A (en) * 2022-03-25 2022-06-28 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114860388A (en) * 2022-07-07 2022-08-05 中国汽车技术研究中心有限公司 Combined simulation method for converting FMU (failure mode reporting) model into Modelica model
DE102022104313A1 (en) 2022-02-23 2023-08-24 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for autonomously calibrating an electric powertrain
CN117312808A (en) * 2023-11-30 2023-12-29 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN118096085A (en) * 2024-04-24 2024-05-28 山东冠县鑫恒祥面业有限公司 Flour production line equipment operation and maintenance management method based on Internet of things

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341294A (en) * 2017-06-15 2017-11-10 苏州同元软控信息技术有限公司 Spacecraft Information System Modeling emulation mode based on Modelica language
CN109522602A (en) * 2018-10-18 2019-03-26 北京航空航天大学 A kind of Modelica Model Parameter Optimization method based on agent model
US20190347370A1 (en) * 2018-05-09 2019-11-14 Palo Alto Research Center Incorporated Learning constitutive equations of physical components with constraints discovery

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341294A (en) * 2017-06-15 2017-11-10 苏州同元软控信息技术有限公司 Spacecraft Information System Modeling emulation mode based on Modelica language
US20190347370A1 (en) * 2018-05-09 2019-11-14 Palo Alto Research Center Incorporated Learning constitutive equations of physical components with constraints discovery
CN109522602A (en) * 2018-10-18 2019-03-26 北京航空航天大学 A kind of Modelica Model Parameter Optimization method based on agent model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴义忠; 蒋占四; 陈立平: "Research on simulation optimization of multi-domain models based on the Modelica language" (in Chinese), ***仿真学报, no. 12, 20 June 2009 (2009-06-20) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406434A (en) * 2021-05-14 2021-09-17 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113406434B (en) * 2021-05-14 2022-05-31 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113836788A (en) * 2021-08-24 2021-12-24 浙江大学 Acceleration method for flow industry reinforcement learning control based on local data enhancement
CN113836788B (en) * 2021-08-24 2023-10-27 浙江大学 Acceleration method for flow industrial reinforcement learning control based on local data enhancement
DE102022104313A1 (en) 2022-02-23 2023-08-24 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for autonomously calibrating an electric powertrain
CN114676572A (en) * 2022-03-25 2022-06-28 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114676572B (en) * 2022-03-25 2023-02-17 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114860388A (en) * 2022-07-07 2022-08-05 中国汽车技术研究中心有限公司 Combined simulation method for converting FMU (failure mode reporting) model into Modelica model
CN117312808A (en) * 2023-11-30 2023-12-29 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN117312808B (en) * 2023-11-30 2024-02-06 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN118096085A (en) * 2024-04-24 2024-05-28 山东冠县鑫恒祥面业有限公司 Flour production line equipment operation and maintenance management method based on Internet of things

Also Published As

Publication number Publication date
CN112632860B (en) 2024-06-04

Similar Documents

Publication Publication Date Title
CN112632860B (en) Power transmission system model parameter identification method based on reinforcement learning
CN108621159B (en) Robot dynamics modeling method based on deep learning
Vrabie et al. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
CN111428849A (en) Improved particle swarm algorithm-based transfer function model parameter identification method and device
US20220326664A1 (en) Improved machine learning for technical systems
KR20170052344A (en) Method and apparatus for searching new material
CN112395777B (en) Engine calibration parameter optimization method based on automobile exhaust emission simulation environment
CN115291513A (en) Boiler reheat steam temperature prediction method and device, terminal equipment and storage medium
CN114880806A (en) New energy automobile sales prediction model parameter optimization method based on particle swarm optimization
Picotti et al. Data-driven tuning of a nmpc controller for a virtual motorcycle through genetic algorithm
CN116933948A (en) Prediction method and system based on improved seagull algorithm and back propagation neural network
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
CN114740710A (en) Random nonlinear multi-agent reinforcement learning optimization formation control method
Menner et al. Automated controller calibration by Kalman filtering
Stenger et al. Benchmark of bayesian optimization and metaheuristics for control engineering tuning problems with crash constraints
Xu et al. Meta-learning via weighted gradient update
Arshad et al. Deep Deterministic Policy Gradient to Regulate Feedback Control Systems Using Reinforcement Learning.
Liu et al. System identification based on generalized orthonormal basis function for unmanned helicopters: A reinforcement learning approach
CN114967472A (en) Unmanned aerial vehicle trajectory tracking state compensation depth certainty strategy gradient control method
CN114139937A (en) Indoor thermal comfort data generation method, system, equipment and medium
del Rio Ruiz et al. Towards the development of a CAD tool for the implementation of high-speed embedded MPCs on FPGAs
Brasch et al. Lateral control of a vehicle using reinforcement learning
CN113657604A (en) Device and method for operating an inspection table
He et al. A novel tuning method for predictive control of vav air conditioning system based on machine learning and improved PSO
CN111077769A (en) Method for controlling or regulating a technical system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant