CN115755606B

CN115755606B - Automatic optimization method, medium and equipment for carrier controller based on Bayesian optimization

Info

Publication number: CN115755606B
Application number: CN202211433936.4A
Authority: CN
Inventors: 苏杰; 牟剑秋; 许正昊; 李晓芸
Original assignee: Shanghai Youdao Zhitu Technology Co Ltd
Current assignee: Shanghai Youdao Zhitu Technology Co Ltd
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-07-07
Anticipated expiration: 2042-11-16
Also published as: CN115755606A

Abstract

The invention discloses a Bayesian-optimization-based method, medium and device for automatically optimizing a vehicle autonomous-driving controller. Bayesian optimization is used to optimize the performance of the vehicle's autonomous-driving controller automatically, replacing the redundant and inefficient manual tuning and grid-search tuning used previously, which is of clear practical significance. A batch-parallelization technique is used to improve the analytic proxy (acquisition) function of Bayesian optimization, which raises the efficiency of controller performance optimization and gives the method clear technical advancement and practicality.

Description

Automatic optimization method, medium and equipment for carrier controller based on Bayesian optimization

Technical Field

The invention belongs to the technical field of autonomous driving for intelligent vehicles, and particularly relates to a Bayesian-optimization-based method, medium and device for automatically optimizing a vehicle autonomous-driving controller.

Background

In recent years, as vehicle intelligence has improved rapidly, autonomous-driving technology has developed vigorously. A controller running a control algorithm is an essential module of an autonomous-vehicle system: it controls the vehicle so that it tracks a reference trajectory while moving forward. The control algorithm is generally designed by modeling the vehicle dynamics and building the controller on that model. Because the model must be linearized and discretized in the process, many parameters of the resulting control algorithm have to be tuned and calibrated before its performance meets the requirements. Traditionally these parameters have been tuned manually or by grid search, which is inefficient and explores only a limited parameter space, so optimal control performance cannot be reached.

To optimize control-algorithm performance and make the optimization process more efficient, researchers have carried out related studies. Marco et al., in "Automatic LQR Tuning Based on Gaussian Process Global Optimization" (IEEE International Conference on Robotics and Automation, 2016), proposed an automatic LQR controller tuning method based on Gaussian-process Bayesian optimization that uses entropy search as the proxy function, so that the optimal parameter set of the LQR controller can be found automatically, efficiently and quickly. Su, Jie et al., in "Autonomous vehicle control through the dynamics and controller learning" (IEEE Transactions on Vehicular Technology, 2018), further studied Gaussian-process Bayesian optimization of LQR controller performance and designed a time-varying lower-confidence-bound function as the Bayesian-optimization proxy function to account for the time-varying characteristics of system operation, which makes the method better suited to vehicles with time-varying characteristics. Riboni, A. et al., in "Bayesian optimization and deep learning for steering wheel angle prediction" (Scientific Reports, Nature, 2022), used an LSTM backbone network to design a steering controller for autonomous vehicles and used Bayesian optimization to search the controller parameters automatically.

These results improve the efficiency of controller performance optimization to some extent, but they still have limitations. For example, all of them consider a single-process, sequential decision setting, so the Bayesian-optimization workflow is designed around an analytic proxy function and cannot be parallelized. Moreover, the deep-learning control algorithm designed by Riboni, A. et al. has numerous parameters to tune, which makes the dimension of the Bayesian-optimization parameter space very high; in addition, the value ranges of some control parameters are dense and continuous, so the number of parameter groups to be searched grows sharply, which further challenges the efficiency of the search task.

Disclosure of Invention

In view of these problems, the main purpose of the invention is to provide a Bayesian-optimization-based method, medium and device for automatically optimizing a vehicle autonomous-driving controller. The invention uses a multi-batch parallelized expected-improvement function as the Bayesian-optimization proxy function, which offers a better solution for the optimization search of autonomous-driving control algorithms.

To achieve this purpose, the invention adopts the following technical scheme:

A Bayesian-optimization-based method for optimizing an autonomous-vehicle controller comprises the following steps:

S1: initialize a sample data set;

S2: model the data set using a proxy model;

establish a Bayesian-optimization proxy function, and loop over the following steps:

S21: obtain the posterior mean and variance by proxy-model regression;

S22: obtain the parameter set to be evaluated from the Bayesian-optimization proxy function;

S23: verify the obtained parameter set on the vehicle, and add the resulting data to the sample data set;

S3: if the termination condition is reached, exit the loop of S2 and end with the controller parameters obtained.
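The S1 to S3 loop can be sketched in Python as follows. This is a minimal, hypothetical skeleton rather than the patent's implementation: `run_experiment` (a vehicle experiment returning the effect index) and `recommend` (proxy-model fitting plus proxy-function maximization, i.e. S21 and S22) are placeholder callables, and the stopping rule implements only the first test of S3, with an assumed `ideal_index` and `threshold`.

```python
from typing import Callable, List, Sequence, Tuple

Params = Tuple[float, ...]


def bayesian_optimization_loop(
    initial_params: Sequence[Params],
    run_experiment: Callable[[Params], float],
    recommend: Callable[[List[Params], List[float]], Params],
    ideal_index: float,
    threshold: float,
    max_rounds: int,
) -> Tuple[Params, float, List[Params], List[float]]:
    """Hypothetical S1-S3 skeleton for minimizing the effect index."""
    # S1: evaluate the initial parameter sets to build the sample data set.
    X: List[Params] = list(initial_params)
    Y: List[float] = [run_experiment(theta) for theta in X]

    for _ in range(max_rounds):
        # S3: stop once the best effect index is within `threshold` of the ideal.
        if abs(min(Y) - ideal_index) < threshold:
            break
        # S21 + S22: fit the proxy model and maximize the proxy function
        # (both hidden inside the placeholder `recommend` callable).
        theta_next = recommend(X, Y)
        # S23: run the experiment for the recommended set and augment the data.
        X.append(theta_next)
        Y.append(run_experiment(theta_next))

    best = min(range(len(Y)), key=Y.__getitem__)
    return X[best], Y[best], X, Y


# Usage with a toy objective and a naive grid "recommender" standing in for S21-S22.
def toy_experiment(theta: Params) -> float:
    return (theta[0] - 1.0) ** 2


_candidates = [(-3.0 + 0.25 * k,) for k in range(25)]


def grid_recommend(X: List[Params], Y: List[float]) -> Params:
    # Hypothetical stand-in: the first candidate not evaluated yet.
    return next(c for c in _candidates if c not in X)


best_theta, best_y, X_all, Y_all = bayesian_optimization_loop(
    [(-2.0,), (2.5,)], toy_experiment, grid_recommend,
    ideal_index=0.0, threshold=1e-6, max_rounds=30,
)
```

In a real run, `recommend` would wrap the Gaussian-process regression and the batch proxy function described below.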

As a further description of the invention, S1: model the performance index of the controller to be evaluated to obtain an evaluation function; within the reachable domain of the parameter set to be optimized, select n feasible combinations as the parameter sets to be evaluated, run controller performance experiments on the vehicle with these parameter sets, and collect the effect index data set;

S2: take the parameter sets to be evaluated from S1, X = {θ_1, …, θ_n}, as input and the corresponding effect index set Y = {y_1, …, y_n} as output, and build a proxy model of the effect index data set, where θ_i, i ∈ [1, n], denotes an evaluated parameter set and y_i denotes its controller performance effect value. Establish a Bayesian-optimization proxy function, then loop over the following steps:

S21: obtain the posterior mean and variance of the effect index data set by proxy-model regression;

S22: substitute the posterior mean function and variance function obtained in S21 into the Bayesian-optimization proxy function to obtain the recommended solution θ_{n+1} predicted by the proxy function;

S23: run a controller performance experiment on the vehicle with the parameter set given by the recommended solution from S22, collect the effect index y_{n+1}, and add the data (θ_{n+1}, y_{n+1}) to the existing data set.

S3: when the difference between the effect index and the ideal controller index is smaller than a set threshold, or the difference between the posterior mean and the set threshold is smaller than the set threshold, exit the loop; the recommended solution obtained is the parameter set sought for the controller.

As a further description of the invention, the evaluation function of the vehicle system control performance index in S1 is modeled as:

$$\hat{y}(\theta) = J(\theta) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^{2})$$

where J(θ) represents the control performance when the parameter set θ is applied to the system, a weighted fusion of control accuracy, vehicle safety and control cost; ε represents Gaussian-distributed noise with variance σ_ε²; and ŷ(θ) represents the corresponding noisy evaluation result measured after the parameter set θ is applied to the system.
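As a hedged illustration of this evaluation model, the sketch below forms J(θ) as a weighted sum of three metrics and adds zero-mean Gaussian noise. The particular metrics (RMS lateral error, a safety penalty, mean steering effort), their weights and the noise level are hypothetical; the patent does not specify them.

```python
import random
from typing import Sequence


def control_performance(metrics: Sequence[float], weights: Sequence[float]) -> float:
    """J(theta): weighted fusion of accuracy, safety and cost metrics (hypothetical metrics)."""
    if len(metrics) != len(weights):
        raise ValueError("one weight per metric is required")
    return sum(w * m for w, m in zip(weights, metrics))


def noisy_evaluation(metrics: Sequence[float], weights: Sequence[float],
                     noise_std: float, rng: random.Random) -> float:
    """y_hat(theta) = J(theta) + eps, eps ~ N(0, noise_std**2)."""
    return control_performance(metrics, weights) + rng.gauss(0.0, noise_std)


# Hypothetical metrics measured with one parameter set theta:
# [RMS lateral error (m), safety-margin penalty, mean |steering angle| (rad)].
metrics = [0.12, 0.05, 0.30]
weights = [1.0, 2.0, 0.1]
J = control_performance(metrics, weights)

rng = random.Random(42)
samples = [noisy_evaluation(metrics, weights, noise_std=0.01, rng=rng) for _ in range(2000)]
mean_estimate = sum(samples) / len(samples)  # repeated trials average out the noise
```

Repeating an experiment shrinks the noise on the estimate of J(θ), but each vehicle trial is expensive, which is why the method models the noise explicitly instead.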

As a further description of the invention, the reachable domain of the parameter set to be optimized in S1 is a mixed space comprising a discrete space and a continuous space: some of the parameters to be optimized take values in the discrete space and the others in the continuous space.
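One simple way to draw initial candidates from such a mixed space is sketched below. The parameter names (`preview_steps`, `q_lateral`, `r_steer`), their ranges, and the log-uniform sampling of the continuous weights are illustrative assumptions, not taken from the patent.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[int, float]

# Hypothetical mixed search space: discrete choices plus continuous (low, high) ranges.
DISCRETE: Dict[str, Sequence[int]] = {"preview_steps": [5, 10, 20]}
CONTINUOUS: Dict[str, Tuple[float, float]] = {
    "q_lateral": (1e-4, 100.0),
    "r_steer": (1e-4, 100.0),
}


def sample_mixed(n: int, rng: random.Random) -> List[Dict[str, Value]]:
    """Draw n candidates: uniform over each discrete set, log-uniform over each range."""
    candidates: List[Dict[str, Value]] = []
    for _ in range(n):
        point: Dict[str, Value] = {
            name: rng.choice(list(options)) for name, options in DISCRETE.items()
        }
        for name, (low, high) in CONTINUOUS.items():
            # Log-uniform sampling suits weights spanning several orders of magnitude.
            point[name] = 10.0 ** rng.uniform(math.log10(low), math.log10(high))
        candidates.append(point)
    return candidates


candidates = sample_mixed(8, random.Random(0))
```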

As a further description of the invention, the proxy model in S2 is a Gaussian process, which is completely described by a mean function μ(X) and a covariance function K(X, X);

the mean function μ(X) is:

$$\mu(X) = \sum_{p} \alpha_p \psi_p(X) + C$$

where ψ_p(X) denotes the polynomial term of order p, α_p the coefficient of the corresponding order, and C a constant;

the covariance function K(X, X) is the matrix of pairwise kernel values:

$$K(X, X) = \left[ k(\theta_i, \theta_j) \right]_{i, j = 1}^{n}$$

where the kernel function k(θ_i, θ_j), i ∈ [1, n], j ∈ [1, n], is parameterized by the diagonal matrix Λ of length-scale hyperparameters, λ_i, i ∈ [1, n], being the length scale corresponding to the parameter θ_i;

the objective function of proxy-model regression is the logarithm of the marginal likelihood, which is maximized over the hyperparameters:

$$\Theta^{*} = \arg\max_{\Theta} \log p(Y \mid X, \Theta)$$

where p(Y | X, Θ) represents the likelihood distribution and Θ represents the set of all Gaussian-process hyperparameters.
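A minimal, dependency-free sketch of these Gaussian-process ingredients follows. The patent states only that the kernel is parameterized by a diagonal length-scale matrix Λ, so the squared-exponential ARD form k(θ_i, θ_j) = s² exp(-½ (θ_i - θ_j)^T Λ⁻¹ (θ_i - θ_j)) with Λ = diag(ℓ_1², …, ℓ_d²) is an assumption, as are the zero prior mean and the small grid search used here in place of a gradient-based hyperparameter optimizer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def ard_se_kernel(a: Sequence[float], b: Sequence[float],
                  lengthscales: Sequence[float], signal_var: float = 1.0) -> float:
    """Assumed kernel: s^2 * exp(-0.5 * sum_d ((a_d - b_d) / l_d)^2)."""
    sq = sum(((x - y) / ell) ** 2 for x, y, ell in zip(a, b, lengthscales))
    return signal_var * math.exp(-0.5 * sq)


def covariance_matrix(X: Sequence[Sequence[float]], lengthscales: Sequence[float],
                      signal_var: float = 1.0) -> Matrix:
    """K(X, X) = [k(theta_i, theta_j)] over all pairs of rows of X."""
    return [[ard_se_kernel(a, b, lengthscales, signal_var) for b in X] for a in X]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def negative_log_marginal_likelihood(K: Matrix, y: Sequence[float], noise_var: float) -> float:
    """-log p(y | X, Theta) for a zero-mean GP with K_y = K + noise_var * I."""
    n = len(y)
    Ky = [[K[i][j] + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    alpha = solve_lower(L, y)                                 # L^{-1} y
    data_fit = 0.5 * sum(a * a for a in alpha)                # 0.5 * y^T K_y^{-1} y
    half_log_det = sum(math.log(L[i][i]) for i in range(n))   # 0.5 * log|K_y|
    return data_fit + half_log_det + 0.5 * n * math.log(2.0 * math.pi)


# Usage: pick a shared length scale from a small grid by minimizing the NLL.
X = [(0.0,), (0.5,), (1.0,), (1.5,)]
y = [math.sin(3.0 * x[0]) for x in X]
grid = [0.1, 0.3, 1.0, 3.0]
scores = {ell: negative_log_marginal_likelihood(covariance_matrix(X, (ell,)), y, 1e-3)
          for ell in grid}
best_lengthscale = min(scores, key=scores.get)
```

Minimizing the negative log marginal likelihood is equivalent to maximizing log p(Y | X, Θ).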

As a further description of the invention, the Bayesian-optimization proxy function is modeled as:

$$AC(X) = \frac{1}{N} \sum_{i=1}^{N} \max_{j = 1, \dots, q} \max\left( y^{*} - \left[\mu_n(X) + L(X) z_i\right]_j,\ 0 \right)$$

where AC(X) represents the proxy function, N the number of Monte Carlo samples, i the index of the current sample, q the total number of parallel batches, and j the current batch; X = {X_1, …, X_q} denotes the parameter sets divided into q batches, X_q being the parameter sets of the q-th batch;

[·]_j denotes the j-th batch component, so [μ_n(X)]_j is the j-th batch of the posterior mean function μ_n(X); L(X) is the Cholesky factor of the Gaussian-process posterior covariance Σ_n(X), satisfying L(X)L(X)^T = Σ_n(X); z_i ∼ N(0, I) is a standard normal sample; and y* = min Y is the best value observed in the current data set.
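The Monte Carlo estimate above can be sketched directly, as below, for a minimization objective. The sketch takes the batch posterior mean vector μ_n(X) and covariance Σ_n(X) as given; the small diagonal `jitter` added before the Cholesky factorization is a common numerical safeguard and an assumption here. (The BoTorch stack listed in the embodiment provides a `qExpectedImprovement` acquisition function for the same purpose; this sketch does not use it.)

```python
import math
import random
from typing import List, Sequence


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def mc_qei(mu: Sequence[float], cov: List[List[float]], best_observed: float,
           n_samples: int, rng: random.Random, jitter: float = 1e-9) -> float:
    """Monte Carlo q-EI for minimization.

    AC(X) ~= (1/N) * sum_i max_j max(y* - [mu + L z_i]_j, 0), with L L^T = cov.
    """
    q = len(mu)
    L = cholesky([[cov[a][b] + (jitter if a == b else 0.0) for b in range(q)] for a in range(q)])
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]                  # z_i ~ N(0, I)
        xi = [mu[j] + sum(L[j][k] * z[k] for k in range(j + 1)) for j in range(q)]
        total += max(max(best_observed - value, 0.0) for value in xi)
    return total / n_samples


rng = random.Random(0)
ei_single = mc_qei([0.0], [[1.0]], best_observed=0.0, n_samples=50_000, rng=rng)
ei_batch = mc_qei([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], best_observed=0.0,
                  n_samples=50_000, rng=rng)
```

For q = 1 with μ = y* and unit variance the estimate approaches 1/√(2π) ≈ 0.399, the closed-form EI; two independent batch points give a larger value (about 0.681), which is the benefit a parallel batch is meant to capture.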

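As a further description of the invention, the posterior mean function μ_n is calculated as:

$$\mu_n(\theta_n) = k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} Y_{1:n-1}$$

where I represents the identity matrix;

the posterior covariance Σ_n is calculated as:

$$\Sigma_n(\theta_n) = k(\theta_n, \theta_n) - k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} k(\theta_n, X_{1:n-1})$$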
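These two formulas can be evaluated with one Cholesky factorization instead of an explicit matrix inverse, as in the minimal zero-mean sketch below. Kernel values are passed in precomputed, so no particular kernel form is assumed; the function name `gp_posterior` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def gp_posterior(K_train: List[List[float]], k_star: Sequence[float], k_star_star: float,
                 y: Sequence[float], noise_var: float) -> Tuple[float, float]:
    """Zero-mean GP posterior at one query point.

    mean = k*^T (K + s2 I)^{-1} y and var = k** - k*^T (K + s2 I)^{-1} k*,
    computed through L L^T = K + s2 I.
    """
    n = len(y)
    Ky = [[K_train[i][j] + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    v = solve_lower(L, k_star)  # L^{-1} k*
    w = solve_lower(L, y)       # L^{-1} y
    mean = sum(a * b for a, b in zip(v, w))
    var = k_star_star - sum(a * a for a in v)
    return mean, var


# A query uncorrelated with the data keeps the prior (mean 0, variance 1);
# a fully correlated query is pulled toward the observation.
mean_far, var_far = gp_posterior([[1.0]], [0.0], 1.0, [2.0], noise_var=0.1)
mean_near, var_near = gp_posterior([[1.0]], [1.0], 1.0, [2.0], noise_var=1.0)
```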

As a further description of the invention, the Bayesian proxy function also contains an extremum operation whose type depends on whether the objective function is to be minimized or maximized:

a minimization objective gives the proxy function a minimum operation, and a maximization objective gives it a maximum operation.
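The sketch below illustrates, for a single Monte Carlo draw of a batch, how the extremum operation switches with the optimization direction. It is an illustration under stated assumptions: the incumbent y* is min Y when minimizing, and by symmetry it is assumed to be max Y when maximizing; the function name is hypothetical.

```python
from typing import Sequence


def batch_improvement(sample_values: Sequence[float], incumbent: float,
                      minimize: bool = True) -> float:
    """Improvement of one sampled q-point batch over the incumbent y*.

    Minimization: the best sample is the smallest, max(y* - min_j xi_j, 0).
    Maximization: the best sample is the largest, max(max_j xi_j - y*, 0).
    """
    if minimize:
        return max(incumbent - min(sample_values), 0.0)
    return max(max(sample_values) - incumbent, 0.0)


draw = [0.5, 0.2]
gain_min = batch_improvement(draw, incumbent=0.4)                  # 0.4 - 0.2
gain_max = batch_improvement(draw, incumbent=0.4, minimize=False)  # 0.5 - 0.4
# Minimizing f is the same as maximizing -f:
gain_negated = batch_improvement([-v for v in draw], incumbent=-0.4, minimize=False)
```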

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above Bayesian-optimization-based method for optimizing an autonomous-vehicle controller.

A Bayesian-optimization-based device for optimizing an autonomous-vehicle controller comprises a memory for storing a computer program and a processor, and the processor implements the above Bayesian-optimization-based method for optimizing an autonomous-vehicle controller when executing the computer program.

Compared with the prior art, the invention has the following technical effects:

the invention provides a Bayesian-optimization-based method, medium and device for automatically optimizing a vehicle autonomous-driving controller. It uses Bayesian optimization to optimize the controller's performance automatically, replacing the redundant and inefficient manual tuning and grid-search tuning used previously, which is of clear practical significance; it improves the analytic proxy function of Bayesian optimization with a batch-parallelization technique, raising the efficiency of controller performance optimization; and it therefore has clear technical advancement and practicality.

Drawings

FIG. 1 is a flowchart of the Bayesian-optimization-based method for optimizing an autonomous-vehicle controller;

FIG. 2 is a schematic diagram of how controller performance changes with the candidate parameter sets during Bayesian optimization;

FIG. 3 is a schematic diagram of the trajectory tracking effect of the LQR controller designed with the parameter set obtained by Bayesian optimization.

Detailed Description

The invention is described in detail below with reference to the drawings:

a Bayesian-optimization-based method for optimizing an autonomous-vehicle controller, referring to FIGS. 1 to 3, comprises the following steps:

S1: initialize a sample data set;

S2: model the data set using a proxy model;

The steps are described in detail in this embodiment as follows:

S1: model the performance index of the controller to be evaluated to obtain an evaluation function; within the reachable domain of the parameter set to be optimized, select n feasible combinations as the parameter sets to be evaluated, run controller performance experiments on the vehicle with these parameter sets, and collect the effect index data set.

The evaluation function of the vehicle system control performance index is modeled as:

$$\hat{y}(\theta) = J(\theta) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^{2})$$

where J(θ) represents the control performance when the parameter set θ is applied to the system, a weighted fusion of control accuracy, vehicle safety and control cost; ε represents Gaussian-distributed noise with variance σ_ε²; and ŷ(θ) is the corresponding noisy evaluation result.

S2: take X = {θ_1, …, θ_n} as input and the corresponding effect index set Y = {y_1, …, y_n} as output, and build a proxy model of the effect index data set,

where θ_i, i ∈ [1, n], denotes an evaluated parameter set, and

y_i denotes the controller performance effect value;

in this embodiment, the proxy model is preferably, but not necessarily, a Gaussian process, which is completely described by a mean function μ(X) and a covariance function K(X, X);

the mean function μ(X) is:

$$\mu(X) = \sum_{p} \alpha_p \psi_p(X) + C$$

the covariance function K(X, X) is:

$$K(X, X) = \left[ k(\theta_i, \theta_j) \right]_{i, j = 1}^{n}$$

where the diagonal matrix Λ contains the length-scale hyperparameters, λ_i, i ∈ [1, n], being the length scale corresponding to the parameter θ_i.

The objective function of Gaussian-process regression is the logarithm of the marginal likelihood, which is maximized over the hyperparameters:

$$\Theta^{*} = \arg\max_{\Theta} \log p(Y \mid X, \Theta)$$

where p(Y | X, Θ) represents the likelihood distribution and Θ represents the set of all Gaussian-process hyperparameters.

Next, a Bayesian-optimization proxy function is established; in this embodiment it is preferably modeled as:

$$AC(X) = \frac{1}{N} \sum_{i=1}^{N} \max_{j = 1, \dots, q} \max\left( y^{*} - \left[\mu_n(X) + L(X) z_i\right]_j,\ 0 \right)$$

where AC(X) represents the proxy function, N the number of Monte Carlo samples, i the index of the current sample, q the total number of parallel batches, and j the current batch; X = {X_1, …, X_q} denotes the parameter sets divided into q batches, X_q being the parameter sets of the q-th batch;

[·]_j denotes the j-th batch component, so [μ_n(X)]_j is the j-th batch of the posterior mean function μ_n(X); L(X) is the Cholesky factor of the Gaussian-process posterior covariance Σ_n(X), satisfying L(X)L(X)^T = Σ_n(X); z_i ∼ N(0, I) is a standard normal sample; and y* = min Y is the best value observed in the current data set.

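After the Bayesian-optimization proxy function is determined, the following steps are looped over for the effect experiments of the parameter sets to be evaluated:

S21: obtain the posterior mean and variance of the effect index data set by proxy-model regression;

the posterior mean function μ_n is calculated as:

$$\mu_n(\theta_n) = k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} Y_{1:n-1}$$

where I represents the identity matrix;

the posterior covariance Σ_n is calculated as:

$$\Sigma_n(\theta_n) = k(\theta_n, \theta_n) - k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} k(\theta_n, X_{1:n-1})$$

S22: substitute the posterior mean and variance functions obtained in S21 into the proxy function to obtain the recommended solution θ_{n+1};

S23: run a controller performance experiment on the vehicle with θ_{n+1}, collect the effect index y_{n+1}, and add this data to the effect index data set of S2;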

S3: when the difference between the effect index and the ideal controller index is smaller than a set threshold, or the difference between the posterior mean and the set threshold is smaller than the set threshold, exit the loop of S2; the recommended solution obtained is the parameter set sought for the controller;

it should be noted that this embodiment does not limit the type of autonomous-driving controller; the method can be used for any autonomous-driving controller whose parameters need to be optimized.

In one embodiment, the vehicle is modeled with a bicycle model, which is linearized and discretized into the form:

$$z_{k+1} = A z_k + B u_k \qquad (1)$$

where z = [e, d_e, th_e, d_th_e, delta_v]^T is the system state vector: e is the lateral offset error of trajectory tracking, d_e is its derivative, th_e is the heading (angle) offset error of trajectory tracking, d_th_e is its derivative, and delta_v is the difference between the current speed and the planned speed.

u = [delta, acc]^T is the system control vector: delta is the steering angle and acc is the longitudinal acceleration.

The matrices A and B are, respectively:

where dt represents a discrete time step and v represents vehicle speed.
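The matrices themselves are not reproduced in this text. The state and input definitions appear to match the error-state model of the open-source PythonRobotics LQR speed-and-steering example, so the sketch below assumes that example's matrices, including a wheelbase parameter `wheelbase` that the text here does not mention. Treat it as an illustrative assumption, not the patent's exact model.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def error_state_matrices(v: float, dt: float, wheelbase: float) -> Tuple[Matrix, Matrix]:
    """Assumed A (5x5) and B (5x2) for z = [e, d_e, th_e, d_th_e, delta_v], u = [delta, acc]."""
    A = [[0.0] * 5 for _ in range(5)]
    A[0][0] = 1.0
    A[0][1] = dt
    A[1][2] = v       # lateral-error rate driven by heading error at speed v
    A[2][2] = 1.0
    A[2][3] = dt
    A[4][4] = 1.0
    B = [[0.0] * 2 for _ in range(5)]
    B[3][0] = v / wheelbase  # steering angle drives the heading-error rate
    B[4][1] = dt             # acceleration integrates into the speed error
    return A, B


def step(A: Matrix, B: Matrix, z: Sequence[float], u: Sequence[float]) -> List[float]:
    """One step of equation (1): z_{k+1} = A z_k + B u_k."""
    return [sum(A[i][j] * z[j] for j in range(len(z))) +
            sum(B[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]


A, B = error_state_matrices(v=2.0, dt=0.1, wheelbase=0.5)
z_next = step(A, B, z=[0.1, 0.0, 0.05, 0.0, -0.5], u=[0.1, 1.0])
```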

The control performance objective function is modeled as follows:

where E[·] denotes the expected value and M is the number of experiments. The infinite-horizon optimization objective must be approximated over a finite horizon, so each evaluation yields a noisy estimate, which is approximately characterized by the following function:

$$\hat{y}(\theta) = J(\theta) + \varepsilon$$

where ε ∼ N(0, σ_ε²) denotes Gaussian-distributed noise with variance σ_ε². The designed controller is a linear quadratic regulator (LQR), and Q and R denote the state weight matrix and the control weight matrix, respectively. The state weight on the position-error term is set as Q[0, 0] = θ[0, 0], and the control weight on the corresponding control input as R[0, 0] = θ[0, 1].
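The mapping from θ to the LQR weights, together with an empirical finite-horizon average over M experiments, might be sketched as follows. Setting every other diagonal entry of Q and R to 1, and using the quadratic stage cost z^T Q z + u^T R u as the per-step performance measure, are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def lqr_weights_from_theta(theta: Sequence[float], n_state: int = 5,
                           n_input: int = 2) -> Tuple[Matrix, Matrix]:
    """Q[0][0] = theta[0] and R[0][0] = theta[1]; other diagonal entries = 1 (assumption)."""
    Q = [[1.0 if i == j else 0.0 for j in range(n_state)] for i in range(n_state)]
    R = [[1.0 if i == j else 0.0 for j in range(n_input)] for i in range(n_input)]
    Q[0][0] = float(theta[0])
    R[0][0] = float(theta[1])
    return Q, R


def quadratic_form(M: Matrix, x: Sequence[float]) -> float:
    """x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def empirical_cost(rollouts, Q: Matrix, R: Matrix) -> float:
    """Mean over M experiments of the per-step average of z^T Q z + u^T R u."""
    per_experiment = []
    for states, inputs in rollouts:
        stage = [quadratic_form(Q, z) + quadratic_form(R, u) for z, u in zip(states, inputs)]
        per_experiment.append(sum(stage) / len(stage))
    return sum(per_experiment) / len(per_experiment)


Q, R = lqr_weights_from_theta((10.0, 0.1))
rollouts = [
    ([[1.0, 0.0, 0.0, 0.0, 0.0]], [[0.0, 0.0]]),  # experiment 1: one step, lateral error only
    ([[0.0, 0.0, 0.0, 0.0, 0.0]], [[1.0, 0.0]]),  # experiment 2: one step, steering only
]
cost = empirical_cost(rollouts, Q, R)
```

Averaging over a finite number of steps and experiments is what makes each evaluation a noisy estimate of the infinite-horizon objective.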

The LQR control law is:

$$u_k = -F_\theta z_k \qquad (5)$$

where F_θ is calculated as

$$F_\theta = \left(R + B^{T} P B\right)^{-1} B^{T} P A$$

and P is the solution of the following discrete algebraic Riccati equation:

$$P = A^{T} P A - A^{T} P B \left(R + B^{T} P B\right)^{-1} B^{T} P A + Q$$
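A small, dependency-free sketch of this computation follows: it solves the discrete algebraic Riccati equation by fixed-point iteration starting from P = Q, then forms F_θ. The iteration scheme, tolerance and iteration cap are assumptions; a production implementation would more likely call a dedicated DARE solver.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def add(A: Matrix, B: Matrix, sign: float = 1.0) -> Matrix:
    """A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for tiny matrices)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def solve_dare(A: Matrix, B: Matrix, Q: Matrix, R: Matrix,
               max_iter: int = 1000, tol: float = 1e-10) -> Matrix:
    """Iterate P <- A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q to a fixed point."""
    P = [row[:] for row in Q]
    At, Bt = transpose(A), transpose(B)
    for _ in range(max_iter):
        PA, PB = matmul(P, A), matmul(P, B)
        S_inv = inverse(add(R, matmul(Bt, PB)))  # (R + B^T P B)^{-1}
        correction = matmul(matmul(At, PB), matmul(S_inv, matmul(Bt, PA)))
        P_next = add(add(matmul(At, PA), correction, -1.0), Q)
        diff = max(abs(a - b) for ra, rb in zip(P_next, P) for a, b in zip(ra, rb))
        P = P_next
        if diff < tol:
            break
    return P


def lqr_gain(A: Matrix, B: Matrix, Q: Matrix, R: Matrix) -> Matrix:
    """F = (R + B^T P B)^{-1} B^T P A, applied as u_k = -F z_k."""
    P = solve_dare(A, B, Q, R)
    Bt = transpose(B)
    return matmul(inverse(add(R, matmul(Bt, matmul(P, B)))), matmul(Bt, matmul(P, A)))


# Scalar check: A = B = Q = R = 1 gives P = golden ratio and F = 1 / P.
F_scalar = lqr_gain([[1.0]], [[1.0]], [[1.0]], [[1.0]])
```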

The LQR controller is used for trajectory tracking control. The reference trajectory is obtained with the following cubic spline interpolation function:

$$pos(h) = a_i + b_i (h - h_i) + c_i (h - h_i)^2 + d_i (h - h_i)^3, \qquad h \in [h_i, h_{i+1}],\ i = 1, \dots, m$$

where pos is the reference position, h_1, …, h_{m+1} are the m + 1 reference anchor points, and a_1, b_1, c_1, d_1, …, a_m, b_m, c_m, d_m are the corresponding coefficients. In this example, the lateral positions of the reference anchors are set to [0.0, 6.0, 12.5, 10.0, 17.5, 20.0, 25.0] and the longitudinal positions to [0.0, -3.0, -5.0, 6.5, 3.0, 0.0, 0.0].
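The sketch below builds a natural cubic spline through the anchor points listed above; the natural boundary condition (zero second derivative at both ends) is an assumption, since the text does not state one. Because the lateral anchor positions are not monotonic (12.5 is followed by 10.0), both coordinates are interpolated against cumulative chord length, which plays the role of h here; that parameterization is also an assumption.

```python
import bisect
import math
from typing import Sequence


class NaturalCubicSpline:
    """pos(h) = a_i + b_i x + c_i x^2 + d_i x^3 with x = h - h_i on [h_i, h_{i+1}]."""

    def __init__(self, h: Sequence[float], pos: Sequence[float]) -> None:
        m = len(h) - 1
        self.h, self.a = list(h), list(pos)
        dh = [h[i + 1] - h[i] for i in range(m)]
        # Tridiagonal system for c with natural end conditions c_0 = c_m = 0.
        lower, diag = [0.0] * (m + 1), [1.0] * (m + 1)
        upper, rhs = [0.0] * (m + 1), [0.0] * (m + 1)
        for i in range(1, m):
            lower[i], upper[i] = dh[i - 1], dh[i]
            diag[i] = 2.0 * (dh[i - 1] + dh[i])
            rhs[i] = 3.0 * ((pos[i + 1] - pos[i]) / dh[i] - (pos[i] - pos[i - 1]) / dh[i - 1])
        for i in range(1, m + 1):  # Thomas algorithm: forward elimination
            w = lower[i] / diag[i - 1]
            diag[i] -= w * upper[i - 1]
            rhs[i] -= w * rhs[i - 1]
        c = [0.0] * (m + 1)
        c[m] = rhs[m] / diag[m]
        for i in range(m - 1, -1, -1):  # back substitution
            c[i] = (rhs[i] - upper[i] * c[i + 1]) / diag[i]
        self.c = c
        self.b = [(pos[i + 1] - pos[i]) / dh[i] - dh[i] * (2.0 * c[i] + c[i + 1]) / 3.0
                  for i in range(m)]
        self.d = [(c[i + 1] - c[i]) / (3.0 * dh[i]) for i in range(m)]

    def __call__(self, t: float) -> float:
        i = min(max(bisect.bisect_right(self.h, t) - 1, 0), len(self.h) - 2)
        x = t - self.h[i]
        return self.a[i] + self.b[i] * x + self.c[i] * x ** 2 + self.d[i] * x ** 3


ax = [0.0, 6.0, 12.5, 10.0, 17.5, 20.0, 25.0]  # lateral anchor positions
ay = [0.0, -3.0, -5.0, 6.5, 3.0, 0.0, 0.0]     # longitudinal anchor positions
s = [0.0]
for k in range(1, len(ax)):
    s.append(s[-1] + math.hypot(ax[k] - ax[k - 1], ay[k] - ay[k - 1]))
sx, sy = NaturalCubicSpline(s, ax), NaturalCubicSpline(s, ay)
reference = [(sx(t), sy(t)) for t in [s[-1] * k / 100 for k in range(101)]]
```

Sampling `sx` and `sy` densely along s then gives the reference path that the LQR controller tracks.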

Construction of the initial data set for the Bayesian-optimization proxy model

In this embodiment, n parameter sets θ are taken to form X = {θ_1, …, θ_n}; each is substituted into the LQR controller for tracking, giving the controller effect evaluation set Y = {y_1, …, y_n}.

In this embodiment, each of the two parameters in θ takes the values [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000], so n = 64.
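The 8 × 8 initial design described above can be generated as follows. The log10 transform of the inputs is an optional design choice assumed here (the weights span seven orders of magnitude), not something the text states.

```python
import itertools
import math

# Candidate values listed in this embodiment for each of the two LQR weights.
LEVELS = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]

# Full factorial design over (Q[0,0], R[0,0]): 8 x 8 = 64 parameter sets.
initial_design = list(itertools.product(LEVELS, repeat=2))

# Optional (assumption): log10-scaled inputs for the Gaussian process.
log_design = [(math.log10(q), math.log10(r)) for q, r in initial_design]
```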

Enter the Bayesian-optimization main loop.

First, Gaussian-process regression is used to obtain the posterior distribution of the data set. Without loss of generality, the prior mean function of the Gaussian process is taken to be zero, and the prior covariance is computed over the first n - 1 points as:

$$K(X_{1:n-1}, X_{1:n-1}) = \left[ k(\theta_i, \theta_j) \right]_{i, j = 1}^{n-1}$$

where the kernel function k(θ_i, θ_j), i ∈ [1, n-1], j ∈ [1, n-1], is parameterized by the diagonal matrix Λ of length-scale hyperparameters, λ_i, i ∈ [1, n-1], being the length scale corresponding to the parameter θ_i.

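All hyperparameters are obtained by minimizing the negative logarithm of the marginal likelihood:

$$\Theta^{*} = \arg\min_{\Theta} \left\{ \frac{1}{2} Y_{1:n-1}^{T} \left[K + \sigma_\varepsilon^{2} I\right]^{-1} Y_{1:n-1} + \frac{1}{2} \log \left|K + \sigma_\varepsilon^{2} I\right| + \frac{n-1}{2} \log 2\pi \right\}$$

where K = K(X_{1:n-1}, X_{1:n-1}), the expression in braces equals -log p(Y_{1:n-1} | X_{1:n-1}, Θ), p(Y | X, Θ) represents the likelihood distribution, and Θ represents the set of all Gaussian-process hyperparameters.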

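The posterior mean function μ_n is calculated as:

$$\mu_n(\theta_n) = k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} Y_{1:n-1}$$

where I represents the identity matrix;

the posterior covariance Σ_n is calculated as:

$$\Sigma_n(\theta_n) = k(\theta_n, \theta_n) - k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} k(\theta_n, X_{1:n-1})$$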

Second, the next point to be evaluated, as recommended by Bayesian optimization, is obtained with the following proxy function:

$$AC(X) = \frac{1}{N} \sum_{i=1}^{N} \max_{j = 1, \dots, q} \max\left( y^{*} - \left[\mu_n(X) + L(X) z_i\right]_j,\ 0 \right)$$

where AC(X) represents the proxy function, N the number of Monte Carlo samples, i the index of the current sample, q the total number of parallel batches, and j the current batch. X = {X_1, …, X_q} denotes the parameter sets divided into q batches, X_q being the parameter sets of the q-th batch.

[·]_j denotes the j-th batch component, so [μ_n(X)]_j is the j-th batch of the posterior mean function μ_n(X); L(X) is the Cholesky factor of the Gaussian-process posterior covariance Σ_n(X), i.e. it satisfies L(X)L(X)^T = Σ_n(X); z_i ∼ N(0, I) is a standard normal sample; and y* = min Y is the best value observed in the current data set.

Third, run a controller performance experiment on the vehicle with θ_{n+1}, collect the effect index y_{n+1}, add (θ_{n+1}, y_{n+1}) to the existing data set, and update the Bayesian-optimization posterior distribution.

When the difference between the performance index and the ideal controller index, or the difference between the posterior mean of the proxy model and the set threshold, is smaller than the set threshold, exit the Bayesian-optimization main loop to obtain the solved parameter set.

The above algorithm is deployed on a computer medium and device. In this embodiment, the computer is a notebook computer with an Intel Core i5-10210U CPU and 16 GB of memory, running the Windows 10 operating system with Python 3.9.6, PyTorch 1.12.1, GPyTorch 1.9.0, BoTorch 0.7.2, NumPy 1.23.3 and Matplotlib 3.6.0 installed.

The program parameters are configured as follows: the vehicle wheel diameter is 0.5 m and the maximum steering angle is 45°. The discrete sampling time of the dynamic model is 0.1 s. The Gaussian-process model is a single-task Gaussian process, and the number of initial samples is set to 16. Bayesian optimization is run three times in total, each run searching 16 rounds; q of the qEI proxy function is set to 1, the number of Monte Carlo samples is set to 64, and the search bounds are set to [0.0001, 100].

FIG. 2 shows how the objective function (the position error of LQR trajectory tracking) changes while the method of this embodiment (the Bayesian-optimization-based automatic optimization method for a vehicle autonomous-driving control algorithm described in S1 to S3) automatically searches for the optimal parameter set; the result shows that the method can effectively and automatically optimize the controller's parameter set.

FIG. 3 shows the trajectory tracking control effect of the LQR controller designed with the parameters obtained by the method of this embodiment (the Bayesian-optimization-based automatic optimization method for a vehicle autonomous-driving control algorithm described in S1 to S3).

In addition, in other embodiments, the invention may also provide a Bayesian-optimization-based device for optimizing an autonomous-vehicle controller, comprising a memory and a processor;

the memory is used to store a computer program;

the processor is configured to implement, when executing the computer program, the Bayesian-optimization-based automatic optimization method for a vehicle autonomous-driving controller described in S1 to S3.

In addition, in other embodiments, the invention may further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the Bayesian-optimization-based automatic optimization method for a vehicle autonomous-driving controller described in S1 to S3.

It should be noted that the memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. The processor may be a general-purpose processor, including a central processing unit (CPU), a neural network processing unit (NPU), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The device should also have the components necessary to run the program, such as a power supply and a communication bus.

The above embodiments only illustrate, and do not limit, the technical solution of the invention; modifications and equivalents made by those skilled in the art without departing from the spirit and scope of the technical solution of the invention shall fall within the scope of the claims of the invention.

Claims

1. A Bayesian-optimization-based method for optimizing an autonomous-vehicle controller, characterized by comprising the following steps:

S1: initializing a sample data set;

S2: modeling the data set using a proxy model;

S3: if the termination condition is reached, exiting the loop of S2 and ending with the controller parameters obtained;

the Bayesian-optimization proxy function in S22 is modeled as:

$$AC(X) = \frac{1}{N} \sum_{i=1}^{N} \max_{j = 1, \dots, q} \max\left( y^{*} - \left[\mu_n(X) + L(X) z_i\right]_j,\ 0 \right)$$

where [·]_j denotes the j-th batch component, so [μ_n(X)]_j is the j-th batch of the posterior mean function μ_n(X); L(X) is obtained by Cholesky decomposition of the Gaussian-process posterior covariance Σ_n(X), i.e. L(X)L(X)^T = Σ_n(X); z_i denotes a standard normal sample; and

y* = min Y denotes the optimal value observed in the current data set;

the posterior mean function μ_n is calculated as:

$$\mu_n(\theta_n) = k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} Y_{1:n-1}$$

where I represents the identity matrix; X_{1:n-1} = {θ_1, …, θ_{n-1}} represents the parameter sets to be evaluated, θ_1, …, θ_n representing the evaluated parameter sets; k(θ_n, X_{1:n-1}) = [k(θ_1, θ_n), …, k(θ_{n-1}, θ_n)]^T represents the covariance between θ_n and X_{1:n-1};

Y_{1:n-1} = {y_1, …, y_{n-1}} represents the effect indices corresponding to the evaluated parameter sets, y_i representing a controller performance effect value;

σ_ε² represents the noise variance;

the posterior covariance Σ_n is calculated as:

$$\Sigma_n(\theta_n) = k(\theta_n, \theta_n) - k(\theta_n, X_{1:n-1})^{T} \left[K(X_{1:n-1}, X_{1:n-1}) + \sigma_\varepsilon^{2} I\right]^{-1} k(\theta_n, X_{1:n-1})$$

2. The Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to claim 1, characterized in that:

S1: modeling the performance index of the controller to be evaluated to obtain an evaluation function; selecting, within the reachable domain of the parameter set to be optimized, n feasible combinations as the parameter sets to be evaluated, performing controller performance experiments on the vehicle with these parameter sets, and collecting an effect index data set;

S2: taking X = {θ_1, …, θ_n} as input and the corresponding effect index set Y = {y_1, …, y_n} as output, and building a proxy model of the effect index data set, where

y_i represents a controller performance effect value;

establishing a Bayesian-optimization proxy function, and then looping over the following steps:

S21: obtaining the posterior mean and variance of the effect index data set by proxy-model regression;

S22: substituting the posterior mean function and variance function obtained in S21 into the Bayesian-optimization proxy function to obtain the recommended solution θ_{n+1} predicted by the proxy function;

S23: performing a controller performance experiment on the vehicle with θ_{n+1}, collecting the effect index y_{n+1}, and adding this data to the existing data set.

3. The Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to claim 2, characterized in that the evaluation function of the vehicle system control performance index in S1 is modeled as:

$$\hat{y}(\theta) = J(\theta) + \varepsilon$$

where

ε represents Gaussian-distributed noise with variance σ_ε².

4. The Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to claim 2, characterized in that the reachable domain of the parameter set to be optimized in S1 is a mixed space comprising a discrete space and a continuous space, some of the parameters to be optimized belonging to the discrete space and the others to the continuous space.

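5. The Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to claim 2, characterized in that the proxy model in S2 is a Gaussian process completely described by a mean function μ(X) and a covariance function K(X, X);

the mean function μ(X) is:

$$\mu(X) = \sum_{p} \alpha_p \psi_p(X) + C$$

the covariance function K(X, X) is:

$$K(X, X) = \left[ k(\theta_i, \theta_j) \right]_{i, j = 1}^{n}$$

where the diagonal matrix Λ contains the length-scale hyperparameters of the kernel function; the hyperparameters are obtained by maximizing the logarithm of the marginal likelihood:

$$\Theta^{*} = \arg\max_{\Theta} \log p(Y \mid X, \Theta)$$

where p(Y | X, Θ) represents the likelihood distribution and

Θ represents the set of all Gaussian-process hyperparameters.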

6. The Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to claim 1, characterized in that the Bayesian-optimization proxy function also contains an extremum operation whose type depends on whether the objective function is to be minimized or maximized.

7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to any one of claims 1 to 6.

8. A Bayesian-optimization-based device for optimizing an autonomous-vehicle controller, characterized by comprising a memory for storing a computer program and a processor which, when executing the computer program, implements the Bayesian-optimization-based method for optimizing an autonomous-vehicle controller according to any one of claims 1 to 6.