CN114442630B - Intelligent vehicle planning control method based on reinforcement learning and model prediction

Intelligent vehicle planning control method based on reinforcement learning and model prediction

Info

Publication number
CN114442630B
CN114442630B · CN202210088325.4A
Authority
CN
China
Prior art keywords
vehicle
intelligent vehicle
intelligent
potential field
planning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210088325.4A
Other languages
Chinese (zh)
Other versions
CN114442630A (en)
Inventor
陈剑
戚子恒
王通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210088325.4A
Publication of CN114442630A
Application granted
Publication of CN114442630B
Legal status: Active

Classifications

    • G PHYSICS; G05 CONTROLLING, REGULATING; G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots; G05D1/02 Control of position or course in two dimensions; G05D1/021 specially adapted to land vehicles
    • G05D1/0238 using optical position detecting means, using obstacle or wall sensors
    • G05D1/024 using optical position detecting means, in combination with a laser
    • G05D1/0214 with means for defining a desired trajectory, in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 with means for defining a desired trajectory, involving a learning process
    • G05D1/0223 with means for defining a desired trajectory, involving speed control of the vehicle
    • G05D1/0257 using a radar
    • G05D1/0276 using signals provided by a source external to the vehicle
    • G05D1/0278 using satellite positioning signals, e.g. GPS
    • Y02T10/40 Engine management systems (Y02T: climate change mitigation technologies related to transportation; Y02T10/10: internal combustion engine [ICE] based vehicles)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Optics & Photonics (AREA)
  • Electromagnetism (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an intelligent vehicle planning control method based on reinforcement learning and model prediction, comprising the following steps: acquiring road-boundary information and obstacle information in the vehicle-body coordinate system from an on-board lidar sensor; acquiring global reference waypoints in the vehicle-body coordinate system from an on-board GPS sensor; constructing a virtual scene around the intelligent vehicle; in that virtual scene, planning a path for the intelligent vehicle with a path-generation module, based on the road-boundary information, obstacle information and global reference waypoints in the body frame, to obtain the planned path; and tracking the planned path with a tracking-control module, thereby realizing planning control of the intelligent vehicle. The invention accelerates the network training of the planning part, preserves the intelligent vehicle's path-planning performance when its positioning is inaccurate, and improves the stability and comfort of the vehicle-body motion.

Description

Intelligent vehicle planning control method based on reinforcement learning and model prediction
Technical Field
The invention belongs to the field of intelligent-vehicle automatic driving and particularly relates to an intelligent vehicle planning control method based on reinforcement learning and model prediction in weak-GPS environments.
Background
With recent economic development and the rising technical level of the automobile industry, vehicle ownership keeps growing, aggravating problems such as traffic accidents, congestion, exhaust emissions and driver fatigue. Driverless cars are energy-saving, environmentally friendly, comfortable and efficient; they are an important trend in future automobile development and are highly valued worldwide.
Path planning and tracking control are key technologies of automated driving. The performance of the path-planning module depends heavily on a high-precision map and high-precision positioning equipment. Compared with a traditional meter-level electronic map, a centimeter-level high-precision map represents road details such as lane count, shape and width far more faithfully and helps the intelligent vehicle plan and decide more accurately. However, the information collection, quality inspection, operation and maintenance involved in producing such maps make them expensive to draw and to keep current. Meanwhile, because GPS signals are easily degraded or lost due to weather, tall buildings or tunnels, high-precision positioning usually has to be paired with a costly IMU for auxiliary localization, which greatly hinders the adoption of intelligent vehicles. The difficulty for the tracking-control module is how to handle the nonlinearity of the vehicle system and the constraints on the state and manipulated variables while tracking the path; and since sensors also introduce errors when measuring the body's motion state, the controller's robustness under such disturbances must be guaranteed.
In recent years, reinforcement learning has achieved great success in fields such as image recognition, speech recognition and robotics. Q-learning is one reinforcement-learning algorithm: an agent occupies some feasible state and, at each time step, transitions to a new state by performing an action, which is accompanied by a reward or a penalty; the agent's goal is to maximize the cumulative reward. Starting in an initially unknown environment, the algorithm interacts with that environment through continual trial and error, steering the vehicle toward actions that maximize its return, and eventually finds a collision-free path around the obstacles.
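For illustration, the following is a minimal tabular Q-learning sketch of the update rule described above; the discretized state/action sizes and the hyperparameters are illustrative assumptions, not part of the invention.

```python
import numpy as np

n_states, n_actions = 100, 4            # assumed discretization of the environment
Q = np.zeros((n_states, n_actions))     # action-value table
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed learning rate, discount, exploration

def choose_action(s):
    """Epsilon-greedy selection: occasionally explore, otherwise exploit Q."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s_next, a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```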
DDPG (Deep Deterministic Policy Gradient) uses an Actor-Critic network structure and adopts the experience-replay mechanism of the DQN (Deep Q-Network) algorithm, establishing a database called the experience pool that stores the agent's interaction data with the environment. During training, the agent randomly samples training data from the experience pool to train the neural networks, which prevents temporal correlation among the training data and effectively improves training efficiency and sample utilization.
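A minimal sketch of such an experience pool, assuming a fixed capacity and uniform random sampling; the class name and sizes are ours, not the patent's.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool storing (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the temporal correlation of consecutive steps
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```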
Model Predictive Control (MPC) has found wide application in industrial systems as an effective method for conveniently handling multi-variable constrained control. In recent years MPC has been extended to tracking control of moving bodies; it accomplishes the preset objective in a suboptimal manner while satisfying the system's constraints. In this control scheme, the control sequence is recomputed at every sampling instant by minimizing a cost function under the input and state constraints; after the first control input of the sequence is applied to the system, the online optimization is repeated at the next time step from the latest system state.
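The receding-horizon structure just described can be summarized by the loop below; solve_finite_horizon and plant_step are placeholders for the constrained optimization and the real system, not APIs from the patent.

```python
def mpc_loop(x0, n_steps, solve_finite_horizon, plant_step):
    """Generic receding-horizon loop: optimize, apply the first input, re-measure."""
    x, applied = x0, []
    for _ in range(n_steps):
        u_seq = solve_finite_horizon(x)   # minimize the cost under input/state constraints
        u0 = u_seq[0]                     # only the first input of the sequence is applied
        applied.append(u0)
        x = plant_step(x, u0)             # the system evolves; the next solve starts from here
    return applied
```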
Disclosure of Invention
To address the problem of inaccurate intelligent-vehicle positioning described in the background, the invention provides an intelligent vehicle planning control method based on reinforcement learning and model prediction, which improves existing planning and control algorithms so as to raise the stability and comfort of the intelligent vehicle when its positioning is inaccurate.
The technical scheme adopted by the invention is as follows:
the invention comprises the following steps:
step 1: obtaining an obstacle grid map through a vehicle-mounted laser radar sensor, determining road boundary information and obstacle information around a vehicle body under a laser radar sensor coordinate system based on the obstacle grid map, and then obtaining the road boundary information and the obstacle information under the vehicle body coordinate system after coordinate conversion;
step 2: acquiring global reference road points under a vehicle-mounted GPS sensor coordinate system by using a vehicle-mounted GPS sensor, acquiring vehicle body positioning and motion states by using the vehicle-mounted GPS sensor, and finally carrying out coordinate conversion on the global reference road points based on the vehicle body positioning and the motion states to acquire global reference road points under the vehicle body coordinate system;
step 3: constructing a virtual scene where the intelligent vehicle is located by the barrier grid map and the global reference road points;
step 4: under a virtual scene of the intelligent vehicle, carrying out path planning on the intelligent vehicle by utilizing a path generating module based on road boundary information, barrier information and global reference road points under a vehicle body coordinate system to obtain a planned path of the intelligent vehicle;
step 5: and tracking the planning path of the intelligent vehicle by using a tracking control module, thereby realizing the planning control of the intelligent vehicle.
The path-generation module in step 4 is obtained through the following training steps:
S1: the training of the DDPG-based reinforcement-learning agent is divided in sequence into an initial stage, an intermediate stage and a final stage; the first state space, input in the initial stage, consists of the distances from the intelligent vehicle to the left and right road boundaries and the position of an accurate global reference waypoint in the vehicle-body frame; the second state space, input in the intermediate stage, consists of the first state space plus the position of the nearest obstacle ahead of the intelligent vehicle in the vehicle-body frame; the third state space, input in the final stage, consists of the distances from the intelligent vehicle to the left and right road boundaries, the position of the nearest obstacle ahead in the vehicle-body frame and the position of an inaccurate reference waypoint in the vehicle-body frame;
S2: constructing the action space, which is the intelligent vehicle's front-wheel steering angle δ_f;
S3: training the DDPG-based reinforcement-learning agent on training sets formed from the action space and the different state spaces, setting the reward-and-penalty values and supervising the training process to obtain the trained agent.
The reward-and-penalty values comprise a reward R_arrive for reaching the end point, a penalty R_collision for a collision of the intelligent vehicle, and an intermediate-state penalty R_temp.
The intermediate-state penalty R_temp is calculated through the following steps:
A1: using the potential-field method, assigning corresponding potential-field functions to the road boundaries, the obstacles and the global reference waypoints of each training stage;
A2: computing from these potential-field functions the road-boundary potential field P_R, the obstacle potential field P_O, the accurate global-reference-waypoint potential field P_W and the inaccurate global-reference-waypoint potential field P_W′; superposing the potential fields that apply in the current training stage to obtain its total potential field P_U, which serves as the intermediate-state penalty R_temp;
A3: during training, setting the potential-field parameters of all the potential-field functions of each training stage in A1 according to the total potential field P_U, using a potential-field-based path-planning method; updating each stage's total potential field from the adjusted potential-field functions and taking the updated total potential field as that stage's intermediate-state penalty R_temp.
In the tracking-control module of step 5, a vehicle dynamics model is first established for the intelligent vehicle, and a prediction equation of the vehicle state is then built on the dynamics model;
next, from the prediction equation of the vehicle state and using a model-predictive-control algorithm, an objective optimization function and its constraints are established, yielding the path-tracking controller;
finally, the path-tracking controller tracks the planned path of the intelligent vehicle, thereby realizing planning control of the intelligent vehicle.
The objective optimization function is:

min_{U(t)} J = Σ_{i=0}^{N_p−1} [ ‖y(t+i|t) − r(t+i|t)‖²_Q + ‖u(t+i|t)‖²_R ] + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

The constraints of the objective optimization function are:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein min_{U(t)} J denotes taking the set of front-wheel steering-angle control quantities that minimizes the intelligent vehicle's objective value over the prediction horizon corresponding to time t; J denotes the objective value of the intelligent vehicle, and U(t) denotes the set of front-wheel steering-angle control quantities over that horizon; ‖·‖²_Q, ‖·‖²_R and ‖·‖²_P denote squared norms weighted by the first weight matrix Q, the second weight matrix R and the third weight matrix P respectively; y(t+i|t) denotes the i-th predicted value of the vehicle-state yaw angle and lateral position at time t, r(t+i|t) the corresponding expected value, and u(t+i|t) the i-th control quantity at time t; y(t+N_p|t) and r(t+N_p|t) denote the predicted and expected yaw angle and lateral position at the N_p-th step at time t; N_p is the prediction horizon; Δu_max and Δu_min are the right and left limit increments of the vehicle's front-wheel steering angle; Δu(k|t) denotes the control increment of the front-wheel steering angle at time k given the current time t, u(k|t) the corresponding control quantity, and u_max and u_min its right and left limit positions; y(k|t) denotes the vehicle-state yaw angle and lateral position at time k given the current time t, with minimum y_min and maximum y_max; β(k|t) denotes the vehicle's centroid sideslip angle at time k given the current time t, with minimum β_min and maximum β_max; and Ω denotes the terminal constraint set.
The terminal constraint set in the objective optimization function is linearized in a preprocessing step.
The beneficial effects of the invention are as follows:
Aiming at scenes in which the intelligent vehicle's positioning is inaccurate, the invention provides a planning control method comprising a path-planning method based on DDPG reinforcement learning and a path-tracking method based on model predictive control, i.e., the path-generation module and the tracking-control module.
In the path-planning method, path generation under inaccurate positioning is realized with the DDPG algorithm, ensuring the safety and smoothness of the path. The potential-field method is used to improve DDPG's reward-and-penalty values, and the training is divided into an initial, an intermediate and a final stage, which raises the algorithm's convergence speed and training efficiency.
In the tracking-control method, the path-tracking controller is realized with a model-predictive-control algorithm, and a terminal cost and a terminal constraint are added to the objective optimization function, improving the stability and control accuracy of the control system; the terminal constraint set is linearized, which preserves the real-time performance of the intelligent vehicle's control system.
The planning control algorithm combining the path-planning and tracking-control methods can smoothly avoid obstacles when the intelligent vehicle's positioning is inaccurate, safely complete navigation tasks along the designed path, and keep the trajectory smooth and stable.
Drawings
Fig. 1 is a schematic diagram of the offset of the collected reference waypoints.
Fig. 2 is a schematic diagram of vehicle-body mislocalization offsetting the reference waypoints.
Fig. 3 is a schematic diagram of the DDPG network structure.
Fig. 4 is a flow diagram of path generation in the virtual environment.
Fig. 5 is the kinematic model of the intelligent vehicle.
Fig. 6 is a schematic diagram of path generation in the virtual environment.
Fig. 7 is the vehicle dynamics model.
Fig. 8 shows the reward curves of reinforcement-learning training.
Fig. 9 is the implementation flow of the planning control of the invention.
Fig. 10 shows the motion profiles of the intelligent vehicle when positioning is inaccurate.
Fig. 11 shows the centroid-sideslip-angle variation of the three methods when positioning is inaccurate.
Fig. 12 shows the lateral-acceleration variation of the three methods when positioning is inaccurate.
Detailed Description
The invention is further illustrated and described below in connection with specific embodiments. The technical features of the embodiments of the invention can be combined with one another provided they do not conflict.
As shown in Fig. 9, the present invention includes the following steps:
Step 1: the intelligent vehicle is equipped with a lidar sensor and a GPS sensor. An obstacle grid map is obtained through the on-board lidar sensor; road-boundary information and information on the obstacles around the vehicle body are determined in the lidar frame from the obstacle grid map and then converted into the vehicle-body frame. The obstacle information is specifically the position of the nearest obstacle ahead of the intelligent vehicle.
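A sketch of how the nearest front obstacle could be read out of such an occupancy grid; the grid orientation, resolution and origin below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def nearest_front_obstacle(grid, resolution=0.1, origin=(0.0, 0.0)):
    """Return the (x, y) of the closest occupied cell ahead of the vehicle (x > 0)."""
    rows, cols = np.nonzero(grid)              # indices of occupied cells
    xs = rows * resolution + origin[0]         # assumed row -> longitudinal x mapping
    ys = cols * resolution + origin[1]         # assumed col -> lateral y mapping
    ahead = xs > 0.0
    if not np.any(ahead):
        return None                            # no obstacle in front of the vehicle
    d2 = xs[ahead] ** 2 + ys[ahead] ** 2
    i = int(np.argmin(d2))
    return float(xs[ahead][i]), float(ys[ahead][i])
```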
Step 2: global reference waypoints are acquired in the on-board GPS sensor's frame, and the same sensor provides the vehicle-body localization and motion state; the global reference waypoints are finally transformed, based on that localization and motion state, into the vehicle-body frame. Environmental interference shifts the GPS signal, so the acquired global reference waypoints drift, as shown in Fig. 1; the same interference also makes the body localization inaccurate, so the global reference waypoints expressed in the body frame are offset, as shown in Fig. 2.
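The coordinate conversion of step 2 amounts to a 2-D rigid transform from the global frame into the body frame; a minimal sketch, assuming a planar pose (x, y, yaw):

```python
import numpy as np

def global_to_body(waypoint_xy, vehicle_xy, vehicle_yaw):
    """Express a global waypoint in the vehicle-body frame (x forward, y left)."""
    dx = waypoint_xy[0] - vehicle_xy[0]
    dy = waypoint_xy[1] - vehicle_xy[1]
    c, s = np.cos(vehicle_yaw), np.sin(vehicle_yaw)
    # Inverse rotation: p_body = R(-yaw) @ (p_global - p_vehicle)
    return np.array([c * dx + s * dy, -s * dx + c * dy])
```

Any error in vehicle_xy or vehicle_yaw propagates directly into the body-frame waypoint, which is exactly the offset depicted in Figs. 1 and 2.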
Step 3: a virtual scene around the intelligent vehicle is constructed from the obstacle grid map and the global reference waypoints.
step 4: as shown in fig. 4, in a virtual scene of the intelligent vehicle, a path generation module is utilized to plan the path of the intelligent vehicle based on road boundary information, barrier information and global reference road points in a vehicle body coordinate system, so as to obtain a planned path of the intelligent vehicle; the kinematic model of the intelligent vehicle is shown in fig. 5, and the generation of the planned path in the virtual environment is shown in fig. 6.
The path-generation module in step 4 is obtained through the following training steps:
S1: the network structure of the DDPG-based reinforcement-learning agent is shown in Fig. 3; its training is divided, from simple training scenes to difficult ones, into an initial stage, an intermediate stage and a final stage. The first state space, input in the initial stage, consists of the distances d_l and d_r from the intelligent vehicle to the left and right road boundaries and the position (d_wx, d_wy) of an accurate global reference waypoint in the vehicle-body frame; the second state space, input in the intermediate stage, consists of the first state space plus the position (d_ox, d_oy) of the nearest obstacle ahead of the intelligent vehicle in the vehicle-body frame; the third state space, input in the final stage, consists of the distances to the left and right road boundaries, the position of the nearest obstacle ahead in the vehicle-body frame and the position (d_wx′, d_wy′) of an inaccurate reference waypoint in the vehicle-body frame, i.e. the third state space is s = {d_l, d_r, d_ox, d_oy, d_wx′, d_wy′}.
S2: the action space is constructed, which is the intelligent vehicle's front-wheel steering angle δ_f.
S3: the DDPG-based reinforcement-learning agent is trained on training sets formed from the action space and the different state spaces, with reward-and-penalty values set and the training process supervised, to obtain the trained agent.
The reward-and-penalty values comprise a reward R_arrive for reaching the end point, a penalty R_collision for a collision of the intelligent vehicle, and an intermediate-state penalty R_temp.
The intermediate-state penalty R_temp is calculated through the following steps:
A1: using the potential-field method, corresponding potential-field functions are assigned to the road boundaries, the obstacles and the global reference waypoints of each training stage;
A2: from these potential-field functions the road-boundary potential field P_R, the obstacle potential field P_O, the accurate global-reference-waypoint potential field P_W and the inaccurate global-reference-waypoint potential field P_W′ are computed; the potential fields that apply in the current training stage are superposed to obtain its total potential field P_U, which serves as the intermediate-state penalty R_temp; i.e. the final-stage intermediate-state penalty is R_temp = P_R + P_O + P_W′.
The potential-field function of the road boundary is:
wherein P_R(d_l, d_r) is the road-boundary potential field, a_R is the potential-field intensity parameter, and d_s is the safe distance from the intelligent vehicle to the road boundary.
The potential-field function of an obstacle is:
wherein P_O(d_ox, d_oy) is the obstacle potential field, and a_o and b_o are the intensity and shape parameters of the obstacle potential function. X_s and Y_s denote the longitudinal and lateral safety distances between the vehicle and the obstacle, the longitudinal direction being the intelligent vehicle's direction of travel and the lateral direction being perpendicular to it, both in the horizontal plane; they are defined as:
X_s = X_0 − vT_0
Y_s = Y_0 + (v·sinθ_e − v_o·sinθ_e)T_0
wherein X_0 and Y_0 denote the minimum longitudinal and lateral safety distances, T_0 is a safety time interval, v is the speed of the intelligent vehicle, v_o is the speed of the obstacle, and θ_e is the heading-angle deviation between the intelligent vehicle and the obstacle.
The accurate and the inaccurate global reference waypoints use the same form of potential-field function, which is:
wherein P_W(d_wy) is the accurate global-reference-waypoint potential field, d_a is the error range of the waypoint's lateral position, and a_w is the potential-field intensity of the global reference waypoint.
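The exact potential-field expressions appear as images in the original publication and are not reproduced in this text, so the functional forms in the sketch below (exponential boundary and obstacle fields, quadratic waypoint field) are assumptions of ours; only the superposition R_temp = P_R + P_O + P_W′ follows the text.

```python
import numpy as np

def road_boundary_field(d_l, d_r, a_R=1.0, d_s=0.5):
    # Assumed form: grows as either boundary distance approaches the safe distance d_s
    return a_R * (np.exp(-max(d_l - d_s, 0.0)) + np.exp(-max(d_r - d_s, 0.0)))

def obstacle_field(d_ox, d_oy, a_o=2.0, b_o=1.0, X_s=10.0, Y_s=2.0):
    # Assumed form: elliptic exponential bump scaled by the safety distances X_s, Y_s
    return a_o * np.exp(-b_o * ((d_ox / X_s) ** 2 + (d_oy / Y_s) ** 2))

def waypoint_field(d_wy, a_w=0.5, d_a=0.3):
    # Assumed form: quadratic penalty on lateral deviation beyond the error range d_a
    return a_w * max(abs(d_wy) - d_a, 0.0) ** 2

def r_temp(d_l, d_r, d_ox, d_oy, d_wy):
    """Total potential P_U = P_R + P_O + P_W', used as the final-stage penalty."""
    return road_boundary_field(d_l, d_r) + obstacle_field(d_ox, d_oy) + waypoint_field(d_wy)
```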
A3: during training, the potential-field parameters of all the potential-field functions of each training stage in A1 are set according to the total potential field P_U, using a potential-field-based path-planning method; each stage's total potential field is updated from the adjusted potential-field functions, and the updated total potential field is taken as that stage's intermediate-state penalty R_temp.
Step 5: the planned path of the intelligent vehicle is tracked with the tracking-control module, thereby realizing planning control of the intelligent vehicle.
In the tracking-control module of step 5, a vehicle dynamics model is first established for the intelligent vehicle, and a prediction equation of the vehicle state is then built on the dynamics model; the vehicle dynamics model is shown in Fig. 7.
Next, from the prediction equation of the vehicle state and using a model-predictive-control algorithm, an objective optimization function with terminal constraint and terminal cost, together with its constraints, is established, yielding the path-tracking controller.
Finally, the path-tracking controller tracks the planned path by controlling the vehicle's front-wheel steering angle, thereby realizing planning control of the intelligent vehicle.
The objective optimization function with terminal constraint and terminal cost is:

min_{U(t)} J = Σ_{i=0}^{N_p−1} [ ‖y(t+i|t) − r(t+i|t)‖²_Q + ‖u(t+i|t)‖²_R ] + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

The constraints of the objective optimization function are:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein ‖y(t+N_p|t) − r(t+N_p|t)‖²_P is the added terminal cost and y(t+N_p|t) − r(t+N_p|t) ∈ Ω is the added terminal constraint. min_{U(t)} J denotes taking the set of front-wheel steering-angle control quantities that minimizes the intelligent vehicle's objective value over the prediction horizon corresponding to time t. J denotes the objective value of the intelligent vehicle and reflects the requirements on the path-tracking error and on smooth variation of the control quantity over a future horizon; U(t) denotes the set of front-wheel steering-angle control quantities over the prediction horizon corresponding to time t. ‖y(t+i|t) − r(t+i|t)‖²_Q, the squared norm weighted by the first weight matrix Q, is the tracking-error weight at the i-th step and reflects the requirement on the path-tracking error; ‖u(t+i|t)‖²_R, weighted by the second weight matrix R, is the control-smoothness weight at the i-th step and reflects the requirement for smooth variation of the control quantity; ‖y(t+N_p|t) − r(t+N_p|t)‖²_P, weighted by the third weight matrix P, is the tracking-error weight at the N_p-th step. y(t+i|t) denotes the i-th predicted value of the vehicle-state yaw angle and lateral position at time t, and r(t+i|t) the corresponding expected value, obtained from the planned path of the intelligent vehicle; u(t+i|t) denotes the i-th control quantity at time t; y(t+N_p|t) and r(t+N_p|t) denote the predicted and expected yaw angle and lateral position at the N_p-th step at time t; N_p is the prediction horizon; Q, R and P are the first, second and third weight matrices; Δu_max and Δu_min are the right and left limit increments of the front-wheel steering angle; Δu(k|t) denotes the control increment of the front-wheel steering angle at time k given the current time t, u(k|t) the corresponding control quantity, and u_max and u_min its right and left limit positions; y(k|t) denotes the vehicle-state yaw angle and lateral position at time k given the current time t, with minimum y_min and maximum y_max; β(k|t) denotes the vehicle's centroid sideslip angle at time k given the current time t, with minimum β_min and maximum β_max; and Ω denotes the terminal constraint set.
The terminal constraint set in the objective optimization function is linearized in a preprocessing step, which guarantees the real-time performance of the control system.
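A sketch of the finite-horizon problem above, posed as a convex program with cvxpy; the linearized prediction model (A, B), the weights and bounds, and the box approximation of the terminal set Ω are illustrative assumptions rather than the patent's exact formulation.

```python
import cvxpy as cp
import numpy as np

def solve_mpc(x0, r, A, B, Np, Q, R, P, u_lim, du_lim, term_box):
    """One receding-horizon solve; returns the first control input of the sequence."""
    nx, nu = A.shape[0], B.shape[1]
    x = cp.Variable((nx, Np + 1))
    u = cp.Variable((nu, Np))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(Np):
        cost += cp.quad_form(x[:, k] - r[:, k], Q)     # tracking-error term (weight Q)
        cost += cp.quad_form(u[:, k], R)               # control-smoothness term (weight R)
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.abs(u[:, k]) <= u_lim]             # steering-angle limits
        if k > 0:
            cons += [cp.abs(u[:, k] - u[:, k - 1]) <= du_lim]   # increment limits
    cost += cp.quad_form(x[:, Np] - r[:, Np], P)       # terminal cost (weight P)
    cons += [cp.abs(x[:, Np] - r[:, Np]) <= term_box]  # linearized terminal set as a box
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u.value[:, 0]
```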
In this embodiment, the training environment is a joint MATLAB/Simulink and Carsim simulation: the network structure, state space, action space and reward function of the reinforcement-learning algorithm are designed in MATLAB/Simulink, while Carsim provides a high-accuracy, high-fidelity vehicle model.
After the potential fields are designed, their parameters are set with the potential-field path-planning method; if the planned path does not meet the safety requirements, the potential-field parameters are adjusted.
When setting up the reinforcement-learning training scenes, training is divided into three stages from simple to difficult: the initial stage contains only the road boundaries and accurate reference waypoints; the intermediate stage adds an obstacle; the final stage further replaces the accurate reference waypoints with inaccurate ones, as sketched below.
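A sketch of how such a three-stage curriculum could be scheduled in the training loop; the episode thresholds and flag names are illustrative assumptions.

```python
def make_scene(episode):
    """Enable scene elements stage by stage: road only, then obstacle, then noisy waypoints."""
    scene = {"road_boundaries": True, "accurate_waypoints": True,
             "obstacle": False, "inaccurate_waypoints": False}
    if episode >= 300:                      # intermediate stage: an obstacle appears
        scene["obstacle"] = True
    if episode >= 600:                      # final stage: waypoints become inaccurate
        scene["accurate_waypoints"] = False
        scene["inaccurate_waypoints"] = True
    return scene
```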
As shown in Fig. 8, the reinforcement-learning training results indicate that the method improves both the training effect and the convergence rate of the conventional DDPG network.
The proposed controller is tested under the double-lane-change maneuver with noise added to the yaw rate and the lateral velocity, and its tracking performance is compared with that of a conventional model-predictive controller. The mean absolute error (MAE) of the tracking performance is given in the following table:
Table 1: mean absolute error (MAE) of tracking performance
As Table 1 shows, the proposed tracking-control method achieves higher tracking accuracy than the conventional model-predictive-control method in all three cases: no noise, yaw-rate noise and lateral-velocity noise.
The path-planning and tracking-control methods are combined to cope with inaccurate vehicle-body localization; the implementation flow is shown in Fig. 9. Fig. 10 compares the planning-control performance in a scene where both the designed reference waypoints and the vehicle-body localization are inaccurate: Framework A is the planning control method proposed by the invention, Framework B is conventional DDPG planning with pure-pursuit tracking, and PF+MPC is potential-field planning with model-predictive tracking. Figs. 11(a)-(c) show, in order, the centroid-sideslip-angle variation of the three methods, and Figs. 12(a)-(c) their lateral-acceleration variation, characterizing the stability and comfort of the trajectory. Table 2 gives a statistical analysis of the experimental data.
Table 2: analysis of the experimental results of the invention and the other methods
As can be seen from Figs. 9-12 and Table 2, the planning control method designed by the invention gives the intelligent vehicle a more comfortable and more stable motion state when positioning is inaccurate.

Claims (4)

1. An intelligent vehicle planning control method based on reinforcement learning and model prediction, characterized by comprising the following steps:
step 1: obtaining an obstacle grid map through an on-board lidar sensor, determining road-boundary information and information on obstacles around the vehicle body in the lidar frame from the obstacle grid map, and then obtaining the road-boundary information and obstacle information in the vehicle-body frame after coordinate conversion;
step 2: acquiring global reference waypoints in the on-board GPS sensor's frame together with the vehicle-body localization and motion state, and finally transforming the global reference waypoints, based on that localization and motion state, into the vehicle-body frame;
step 3: constructing a virtual scene around the intelligent vehicle from the obstacle grid map and the global reference waypoints;
step 4: in the virtual scene, planning a path for the intelligent vehicle with a path-generation module, based on the road-boundary information, obstacle information and global reference waypoints in the vehicle-body frame, to obtain the planned path of the intelligent vehicle;
step 5: tracking the planned path of the intelligent vehicle with a tracking-control module, thereby realizing planning control of the intelligent vehicle;
wherein the path-generation module in step 4 is obtained through the following training steps:
S1: the training of the DDPG-based reinforcement-learning agent is divided in sequence into an initial stage, an intermediate stage and a final stage; the first state space, input in the initial stage, consists of the distances from the intelligent vehicle to the left and right road boundaries and the position of an accurate global reference waypoint in the vehicle-body frame; the second state space, input in the intermediate stage, consists of the first state space plus the position of the nearest obstacle ahead of the intelligent vehicle in the vehicle-body frame; the third state space, input in the final stage, consists of the distances from the intelligent vehicle to the left and right road boundaries, the position of the nearest obstacle ahead in the vehicle-body frame and the position of an inaccurate reference waypoint in the vehicle-body frame;
S2: constructing the action space, which is the front-wheel steering angle δ_f of the intelligent vehicle;
S3: training the DDPG-based reinforcement-learning agent on training sets formed from the action space and the different state spaces, setting the reward-and-penalty values and supervising the training process to obtain the trained agent;
the reward-and-penalty values comprise a reward R_arrive for reaching the end point, a penalty R_collision for a collision of the intelligent vehicle, and an intermediate-state penalty R_temp;
the intermediate-state penalty R_temp is calculated through the following steps:
A1: using the potential-field method, assigning corresponding potential-field functions to the road boundaries, the obstacles and the global reference waypoints of each training stage;
A2: computing from these potential-field functions the road-boundary potential field P_R, the obstacle potential field P_O, the accurate global-reference-waypoint potential field P_W and the inaccurate global-reference-waypoint potential field P_W′, and superposing the potential fields that apply in the current training stage to obtain its total potential field P_U, which serves as the intermediate-state penalty R_temp;
A3: during training, setting the potential-field parameters of all the potential-field functions of each training stage in A1 according to the total potential field P_U, using a potential-field-based path-planning method, updating each stage's total potential field from the adjusted potential-field functions and taking the updated total potential field as that stage's intermediate-state penalty R_temp.
2. The intelligent vehicle planning control method based on reinforcement learning and model prediction according to claim 1, wherein in the tracking-control module of step 5, a vehicle dynamics model is first established for the intelligent vehicle and a prediction equation of the vehicle state is built on the dynamics model;
then, from the prediction equation of the vehicle state and using a model-predictive-control algorithm, an objective optimization function and its constraints are established, yielding a path-tracking controller;
finally, the path-tracking controller tracks the planned path of the intelligent vehicle, thereby realizing planning control of the intelligent vehicle.
3. The intelligent vehicle planning control method based on reinforcement learning and model prediction according to claim 2, wherein the objective optimization function is:

min_{U(t)} J = Σ_{i=0}^{N_p−1} [ ‖y(t+i|t) − r(t+i|t)‖²_Q + ‖u(t+i|t)‖²_R ] + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

the constraints of the objective optimization function are:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein min_{U(t)} J denotes taking the set of front-wheel steering-angle control quantities that minimizes the intelligent vehicle's objective value over the prediction horizon corresponding to time t; J denotes the objective value of the intelligent vehicle and U(t) the set of front-wheel steering-angle control quantities over that horizon; ‖·‖²_Q, ‖·‖²_R and ‖·‖²_P denote squared norms weighted by the first weight matrix Q, the second weight matrix R and the third weight matrix P respectively; y(t+i|t) denotes the i-th predicted value of the vehicle-state yaw angle and lateral position at time t, r(t+i|t) the corresponding expected value, and u(t+i|t) the i-th control quantity at time t; y(t+N_p|t) and r(t+N_p|t) denote the predicted and expected yaw angle and lateral position at the N_p-th step at time t; N_p is the prediction horizon; Δu_max and Δu_min are the right and left limit increments of the front-wheel steering angle; Δu(k|t) denotes the control increment of the front-wheel steering angle at time k given the current time t, u(k|t) the corresponding control quantity, and u_max and u_min its right and left limit positions; y(k|t) denotes the vehicle-state yaw angle and lateral position at time k given the current time t, with minimum y_min and maximum y_max; β(k|t) denotes the vehicle's centroid sideslip angle at time k given the current time t, with minimum β_min and maximum β_max; and Ω denotes the terminal constraint set.
4. The intelligent vehicle planning control method based on reinforcement learning and model prediction according to claim 3, wherein the terminal constraint set in the objective optimization function is linearized in a preprocessing step.
CN202210088325.4A 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction Active CN114442630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210088325.4A CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210088325.4A CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Publications (2)

Publication Number Publication Date
CN114442630A CN114442630A (en) 2022-05-06
CN114442630B (en) 2023-12-05

Family

ID=81368785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210088325.4A Active CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Country Status (1)

Country Link
CN (1) CN114442630B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114578834B * 2022-05-09 2022-07-26 Peking University Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115540896B * 2022-12-06 2023-03-07 GAC Aion New Energy Automobile Co., Ltd. Path planning method and device, electronic equipment and computer readable medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799386A * 2019-10-25 2021-05-14 Shenyang Institute of Automation, Chinese Academy of Sciences Robot path planning method based on artificial potential field and reinforcement learning
CN110794842A * 2019-11-15 2020-02-14 Beijing University of Posts and Telecommunications Reinforced learning path planning algorithm based on potential field
CN112666939A * 2020-12-09 2021-04-16 Shenzhen Institutes of Advanced Technology Robot path planning algorithm based on deep reinforcement learning
CN112650237A * 2020-12-21 2021-04-13 Wuhan University of Technology Ship path planning method and device based on clustering processing and artificial potential field

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
APF-DPPO: An automatic driving policy learning method based on the artificial potential field method to optimize the reward function; Junqiang Lin; Machines; full text *
Research on AUV collision-avoidance planning based on forward-looking sonar information; Liu Hexiang, Bian Xinqian, Qin Zheng, Wang Hongjian; Journal of System Simulation, No. 24; full text *
Low-cost navigation of intelligent vehicles based on reinforcement learning; Wang Tong; China Master's Theses Full-text Database, Engineering Science and Technology II; 2021; C035-484 *
Nonlinear predictive control for trajectory tracking of constrained nonholonomic mobile robots; Han Guangxin; Journal of Jilin University (Engineering and Technology Edition); pp. 177-181 *

Also Published As

Publication number Publication date
CN114442630A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN111845774B (en) Automatic driving automobile dynamic trajectory planning and tracking method based on transverse and longitudinal coordination
CN111289008B (en) Local path planning method for unmanned vehicle
CN110262495B (en) Control system and method capable of realizing autonomous navigation and accurate positioning of mobile robot
Weiskircher et al. Predictive guidance and control framework for (semi-) autonomous vehicles in public traffic
CN114442630B (en) Intelligent vehicle planning control method based on reinforcement learning and model prediction
CN113608531B (en) Unmanned vehicle real-time global path planning method based on safety A-guidance points
CN113848914B (en) Method for planning local path by collision coefficient artificial potential field method in dynamic environment
CN113276848A (en) Intelligent driving lane changing and obstacle avoiding track planning and tracking control method and system
CN111137298B (en) Vehicle automatic driving method, device, system and storage medium
CN112947469A (en) Automobile track-changing track planning and dynamic track tracking control method
CN113255998B (en) Expressway unmanned vehicle formation method based on multi-agent reinforcement learning
CN112577506A (en) Automatic driving local path planning method and system
CN112977478B (en) Vehicle control method and system
CN114942642A (en) Unmanned automobile track planning method
CN116337045A (en) High-speed map building navigation method based on karto and teb
CN115993825A (en) Unmanned vehicle cluster control system based on air-ground cooperation
Zhang et al. Structured road-oriented motion planning and tracking framework for active collision avoidance of autonomous vehicles
Kanchwala et al. Development of an intelligent transport system for EV
López et al. Efficient local navigation approach for autonomous driving vehicles
CN114715193A (en) Real-time trajectory planning method and system
KR102618247B1 (en) Device for correcting localization heading error in autonomous car and operating methdo thereof
Farag et al. MPC track follower for self-driving cars
Shin et al. Design of a vision-based autonomous path-tracking control system and experimental validation
CN113460091B (en) Unprotected crossroad unmanned vehicle rolling optimization decision method
Wang Control system design for autonomous vehicle path following and collision avoidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant