CN115031753A - Driving condition local path planning method based on safety potential field and DQN algorithm - Google Patents
- Publication number
- CN115031753A (application number CN202210650446.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/34—Route searching; Route guidance
- G01C21/3446—Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3807—Creation or updating of map data characterised by the type of data
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/40—Engine management systems
Abstract
The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which comprises the following steps: 1. acquiring the state information of the vehicle and the surrounding environment information; 2. constructing an environment safety potential field model according to the collected environment information; 3. constructing an environment grid map by using a grid method according to the calculated potential field intensity distribution map; 4. initializing deep reinforcement learning parameters, constructing a deep neural network, training it to obtain an optimal path planning model, and performing path planning. The method constructs the grid map with the safety potential field theory and completes local path planning of the automobile in various scenes through deep reinforcement learning, so that the automobile drives more safely and passes more efficiently, guaranteeing the safe navigation of the intelligent automobile.
Description
Technical Field
The invention relates to the field of intelligent automobile safety and path planning, and in particular to a driving condition local path planning method based on a safety potential field and a DQN algorithm.
Background
Path planning is the most critical link in the autonomous navigation of an intelligent automobile. Its aim is to search for an optimal or suboptimal path from a starting point to a target point under different driving scenes while guaranteeing the safety of the path. Intelligent driving has been widely applied in relatively simple scenes such as mining areas and industrial parks, but relatively complex driving scenes such as actual roads require consideration of environmental factors such as obstacles, traffic signs, ground road conditions and other moving vehicles, which poses challenges for path planning research. With the arrival of the artificial intelligence era, the environments faced in the path planning field are becoming more and more complex, so path planning algorithms must be able to react quickly and learn flexibly in complex environments. Existing path planning algorithms still suffer from becoming trapped in local optima during planning, so that the whole path cannot be completely planned, and they cannot adapt to complex and changeable scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a driving condition local path planning method based on a safety potential field and a DQN algorithm. A grid map is constructed using the safety potential field theory, and local path planning of the automobile in various scenes is completed by deep reinforcement learning, so that the automobile drives more safely and passes more efficiently, providing a guarantee for the safe navigation of the intelligent automobile.
In order to achieve the purpose of the invention, the following technical scheme is adopted:
the invention relates to a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized by comprising the following steps:
step 1, acquiring the state information of the current vehicle j and the surrounding environment information;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b acting on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model to output the action parameter a_i of the i-th step for planning the driving path of the current vehicle j.
The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized in that the step 2 comprises the following steps:
step 2.1: calculating the potential field intensity E_R_bj of the road sign b acting on the current vehicle j by using formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj is the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) is the centroid position coordinate of the current vehicle j in the vehicle coordinate system and (x_b, y_b) is the position coordinate of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j by using formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj is the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) is the centroid position coordinate of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j by using formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj is the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) is the centroid position coordinate of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by using formula (4):
E_j = E_R_j + E_V_j + E_D_j (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
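The superposition of step 2.5 can be sketched in code. Since formulas (1) to (3) appear only as images in the published text, the inverse-distance repulsive form below, as well as the shared threshold rho and gain k, are illustrative assumptions, not the patent's actual expressions; only the vector summation of formula (4) is taken from the text.

```python
import math

def repulsive_field(dx, dy, rho, k):
    # Hypothetical repulsive potential: nonzero only within the distance
    # threshold rho, pointing away from the source (an assumed form; the
    # patent's formulas (1)-(3) are not reproduced in the text).
    r = math.hypot(dx, dy)
    if r >= rho or r == 0.0:
        return (0.0, 0.0)
    mag = k * (1.0 / r - 1.0 / rho)
    return (mag * dx / r, mag * dy / r)

def total_field(vehicle, signs, obstacles, movers, rho=10.0, k=1.0):
    # Formula (4): E_j = E_R_j + E_V_j + E_D_j, the vector sum of the
    # contributions of all road signs, static obstacles and running vehicles
    # at the current position of vehicle j.
    ex = ey = 0.0
    xj, yj = vehicle
    for (xs, ys) in signs + obstacles + movers:
        fx, fy = repulsive_field(xj - xs, yj - ys, rho, k)
        ex += fx
        ey += fy
    return (ex, ey)
```

A single sign one metre ahead yields a purely repulsive x-component; sources beyond the threshold contribute nothing.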
The step 3 comprises the following steps:
step 3.1: establishing the grid map by using the environmental information of the rectangular area where the current vehicle j is located, with the length range of the grid map recorded as [Y_MIN, Y_MAX] and the width range recorded as [X_MIN, X_MAX]; the area of each grid in the grid map is recorded as C_R;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated using formula (5):
in formula (5), M and N are the abscissa and ordinate in the grid map, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of an arbitrary point in the vehicle coordinate system; α(M, N) is the risk level at position (M, N) under the influence of the potential field, with α(M, N) ∈ [0, 1), obtained from formula (6):
in formula (6), ⌊·⌋ indicates rounding down, E(M, N) indicates the potential field strength at position (M, N), and Δ indicates the number of risk levels set.
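The discretization of steps 3.1 and 3.2 can be sketched as follows. Formula (6) is an image in the published text; quantizing a normalized field strength into Δ floor-rounded levels is one plausible reading, stated here as an assumption, while the index mapping M = x/C_R, N = y/C_R is taken from the text.

```python
import math

def risk_alpha(e_norm, delta=8):
    # Assumed reading of formula (6): the normalized potential field
    # strength e_norm in [0, 1) is quantized by floor() into delta discrete
    # risk levels, so risk_alpha stays in [0, 1).
    return math.floor(e_norm * delta) / delta

def grid_index(x, y, c_r=0.5):
    # Mapping around formula (5): M = x / C_R, N = y / C_R converts a
    # vehicle-frame point (x, y) to grid coordinates (M, N).
    return (int(x / c_r), int(y / c_r))
```

With the embodiment's Δ = 8 and C_R = 0.5, a point at (3.0, 1.0) lands in grid cell (6, 2).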
The step 4 comprises the following steps:
step 4.1: define the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map, and the i-th step action parameter a_i represents the current vehicle j moving forward one grid in the grid map according to its course angle;
define the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j, and s_i represents the state parameter after executing the (i−1)-th step action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i is the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i is the risk level of that coordinate point; I is the maximum number of steps;
step 4.2: initialize the greedy probability ε, the attenuation coefficient γ, and the capacity n of the experience replay pool;
step 4.3: construct an online network comprising an input layer, a hidden layer, and an output layer, and initialize the weight parameters and bias parameters of the online network;
step 4.4: input the state parameter s_i of the i-th step into the input layer of the online network; the output layer uses formula (7) to output, at each neuron corresponding to a different course angle, the value function Q_i of executing the i-th step action parameter a_i:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2) (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
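The forward pass of formula (7) can be written directly. The 3-input, 8-output shape follows the embodiment (grid coordinates plus risk level in, one Q value per course angle out); the hidden width of 16 is an arbitrary illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def q_values(s, W1, b1, W2, b2):
    # Formula (7): Q_i = sigmoid(W2 . ReLU(W1 s_i + b1) + b2).
    # s is the 3-element state (M, N, risk level); the output holds one
    # Q value per course-angle neuron (8 in the embodiment).
    h = np.maximum(0.0, W1 @ s + b1)   # ReLU hidden layer
    return sigmoid(W2 @ h + b2)        # sigmoid output layer
```

With all-zero weights and biases every output neuron yields sigmoid(0) = 0.5, which is a quick sanity check of the shapes.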
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1·r_sm,i + w_2·r_s,i + w_3·r_end,i (11)
in formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the course angle of vehicle j at step i and δ is the threshold parameter of the course angle variation; λ_1 is the proportional parameter and λ_2 the bias parameter of the trajectory safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) are the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position, respectively; d_1, d_2, d_3 and d_4 are all distance thresholds;
step 4.6: generate a random number τ between 0 and 1 and judge whether τ < ε holds; if so, select the course angle corresponding to the maximum among the value functions Q_i output by the neurons as the i-th step action parameter executed by the current vehicle j; otherwise, randomly select a course angle from those corresponding to the value functions Q_i as the action parameter executed by the current vehicle j;
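Step 4.6 is a greedy/random action rule. Note that the patent names ε the "greedy probability", so in this sketch ε is the probability of exploiting the maximal Q value, which inverts the more common convention where ε is the exploration probability.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # Step 4.6: with probability epsilon (the patent's "greedy probability")
    # exploit the course angle with the maximal Q value; otherwise explore
    # by picking a random course angle.
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda k: q_values[k])
    return rng.randrange(len(q_values))
```

With ε = 1 the rule always returns the arg-max index; with ε = 0 it always returns a random valid index.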
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: create an experience pool D for storing the state, action, and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at step i yields the state parameter s_{i+1} of step i+1 and the reward value r_i of step i, forming the parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the earliest added tuple is replaced by the newly generated one;
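The experience pool of step 4.7.1 is a fixed-capacity first-in-first-out buffer; a deque with maxlen implements the replace-oldest rule directly. The class name and method names below are illustrative.

```python
import random
from collections import deque

class ReplayPool:
    # Experience pool D of step 4.7.1: capacity n; once full, appending a
    # new tuple (s_i, a_i, r_i, s_{i+1}) silently evicts the oldest one.
    def __init__(self, n):
        self.buf = deque(maxlen=n)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, u, rng=random):
        # Step 4.7.3 draws u tuples uniformly at random from the pool.
        return rng.sample(list(self.buf), u)

    def __len__(self):
        return len(self.buf)
```

Pushing three tuples into a pool of capacity two leaves only the two most recent ones.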
step 4.7.2: construct a target network with the same structure as the online network, and initialize the parameters of the target network with the weight arrays W_1, W_2 and bias arrays b_1, b_2 of the online network;
step 4.7.3: randomly extract u parameter tuples from the experience pool D, input the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtain with formula (12) the value function Q′_{i+1} of each output-layer neuron of the target network executing the (i+1)-th step action a_{i+1} according to its course angle:
Q′_{i+1} = σ(W′_2 × ReLU(W′_1 s_{i+1} + b′_1) + b′_2) (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and output layer of the target network, respectively, and b′_1 and b′_2 are the corresponding bias arrays;
step 4.7.4: calculate with formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ·max(Q′_{i+1}) (13)
in formula (13), R_i is the reward value after executing the i-th step action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: construct the loss function loss with formula (14), train the online network by gradient descent, and update the parameters W_1, b_1, W_2 and b_2 of the online network by computing the loss function; when the number of training iterations reaches a fixed number, assign the parameters of the online network to the target network;
loss = E((Q_tag,i − Q_real,i)²) (14)
in formula (14), Q_real,i is the value function corresponding to the i-th step action a_i in the extracted u parameter tuples, and E denotes expectation;
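Formulas (13) and (14) together form the standard DQN update target and mean-squared-error loss, which can be computed as follows (the expectation in formula (14) is taken as the mean over the u sampled transitions):

```python
import numpy as np

def dqn_target(r, q_next, gamma):
    # Formula (13): Q_tag,i = R_i + gamma * max(Q'_{i+1}),
    # where q_next holds the target network's Q values for step i+1.
    return r + gamma * np.max(q_next)

def dqn_loss(q_tag, q_real):
    # Formula (14): loss = E[(Q_tag,i - Q_real,i)^2], realized as the mean
    # squared error over the u sampled transitions.
    q_tag = np.asarray(q_tag, dtype=float)
    q_real = np.asarray(q_real, dtype=float)
    return float(np.mean((q_tag - q_real) ** 2))
```

For example, a reward of 1.0, target-network maximum of 0.5 and γ = 0.9 give a target of 1.45.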
step 4.8: substitute the updated network parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function converges, thereby obtaining the optimal local path planning model;
step 4.9: input the state parameter of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameter, record the position coordinates of the vehicle in the grid map after executing the action parameter of each step, convert these position coordinates into actual coordinates in the vehicle coordinate system, and fit the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
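The coordinate conversion and fitting of step 4.9 can be sketched as below. Inverting M = x/C_R, N = y/C_R recovers vehicle-frame coordinates; the patent does not name the fitting method, so the polynomial fit x = f(y) here is an assumption for illustration.

```python
import numpy as np

def grid_to_vehicle(points, c_r=0.5):
    # Invert M = x/C_R, N = y/C_R to recover vehicle-frame coordinates;
    # C_R = 0.5 follows the embodiment's grid size.
    return [(m * c_r, n * c_r) for (m, n) in points]

def fit_path(points, degree=3):
    # Fit the visited coordinates into a smooth curve x = f(y).
    # A polynomial least-squares fit is an assumed choice; the patent only
    # states that the actual coordinates are fitted.
    xs = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    return np.polyfit(ys, xs, degree)
```

A straight run along the Y axis fits to (numerically) zero polynomial coefficients, as expected.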
Compared with the prior art, the invention has the beneficial effects that:
1. The grid map is constructed using the safety potential field, overcoming the limitation of traditional path planning that takes distance as the single objective; the safety, smoothness and traditional distance factors of the path are considered comprehensively, environmental elements are simplified by the grid map, which is easy to construct in actual scenes and convenient for planning short paths;
2. Compared with traditional path planning algorithms, reinforcement learning balances exploration and exploitation and can learn how to escape local stable points in situations where traditional algorithms fall into local optima; in practical applications it plans paths more promptly and reliably than traditional search and sampling algorithms;
3. The invention introduces the concept of a safety potential field, digitizing more environmental factors and coping with more numerous and more complex driving scenes.
Drawings
FIG. 1 is a flow chart of the network training of the present invention;
fig. 2 is a schematic diagram of the action parameters that can be selected by the vehicle j in each step of the grid map.
Detailed Description
In this embodiment, a driving condition local path planning method based on a safety potential field and a DQN algorithm calculates the safety potential field distribution of the current environment of the automobile according to the current driving scene, thereby constructing a grid map, and finally designs a reward function combining various factors with a deep reinforcement learning algorithm. The method specifically includes the following steps:
step 1, acquiring the state information of the current vehicle j and the surrounding environment information;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b acting on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 2.1: calculating the potential field intensity E_R_bj of the road sign b acting on the current vehicle j by using formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj is the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) is the centroid position coordinate of the current vehicle j in the vehicle coordinate system and (x_b, y_b) is the position coordinate of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j by using formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj is the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) is the centroid position coordinate of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j by using formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj is the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) is the centroid position coordinate of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by using formula (4):
E_j = E_R_j + E_V_j + E_D_j (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 3.1: establishing a grid map by using the environmental information of the rectangular area where the current vehicle j is located, with the length range of the grid map recorded as [Y_MIN, Y_MAX] and the width range recorded as [X_MIN, X_MAX]; the area of each grid in the grid map is recorded as C_R; in this example, Y_MIN, Y_MAX, X_MIN and X_MAX are set to −10, 30, −10 and 10 (in m), respectively, and C_R is set to 0.5 m²;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated using formula (5):
in formula (5), M and N are the abscissa and ordinate in the grid map, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of an arbitrary point in the vehicle coordinate system; α(M, N) is the risk level at position (M, N) under the influence of the potential field, with α(M, N) ∈ [0, 1), obtained from formula (6):
in formula (6), ⌊·⌋ indicates rounding down, E(M, N) indicates the potential field strength at position (M, N), and Δ indicates the number of risk levels set; in this example Δ is taken as 8, considering that each safety level should have a certain degree of discrimination without significant discontinuities.
Step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model to output the action parameter a_i of the i-th step for planning the driving path of the current vehicle j; Fig. 1 shows the flow chart of the deep reinforcement learning.
step 4.1: define the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map, and the i-th step action parameter a_i represents the current vehicle j moving forward one grid in the grid map according to its course angle; given the characteristics of the grid map, the action parameters that can be selected at each step are shown in Fig. 2;
define the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j, and s_i represents the state parameter after executing the (i−1)-th step action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i is the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i is the risk level of that coordinate point; I is the maximum number of steps;
step 4.2: initialize the greedy probability ε, the attenuation coefficient γ, and the capacity n of the experience replay pool;
step 4.3: construct an online network comprising an input layer, a hidden layer, and an output layer, and initialize the weight parameters and bias parameters of the online network;
in this example, the state consists of the horizontal and vertical coordinates of vehicle j in the grid map and the risk level of the corresponding position, and the action at each step takes one of 8 different course-angle values in the grid map, so the input layer has 3 neurons and the output layer has 8 neurons;
step 4.4: input the state parameter s_i of the i-th step into the input layer of the online network; the output layer uses formula (7) to output, at each neuron corresponding to a different course angle, the value function Q_i of executing the i-th step action parameter a_i:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2) (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1·r_sm,i + w_2·r_s,i + w_3·r_end,i (11)
in formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i, calculated from the safety levels: since there are many safety levels in total, a functional form is adopted so that all levels are taken into account, which improves the training effect and efficiency; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the course angle of vehicle j at step i, and δ is the threshold parameter of the course angle variation, set to 45° in this example owing to the characteristics of the grid map so that the course angle variation at each step is normally not too large, which would otherwise affect the smoothness of the finally formed path; λ_1 is the proportional parameter and λ_2 the bias parameter of the trajectory safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards, and assigning different weights to the reward components of the total reward value lets training reach the expected effect more quickly; (x_j,i, y_j,i) and (x_end, y_end) are the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position, respectively; d_1, d_2, d_3 and d_4 are all distance thresholds, which grant a reward value when the vehicle approaches the end point, considering that in early training the vehicle may rarely reach the end point;
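The weighted combination of formula (11) can be sketched as follows. The component formulas (8) to (10) appear only as images in the published text, so the threshold-based smoothness term and all numeric weights below are placeholders, not the patent's actual values.

```python
def total_reward(r_sm, r_s, r_end, w=(0.2, 0.3, 0.5)):
    # Formula (11): r_i = w1*r_sm,i + w2*r_s,i + w3*r_end,i.
    # The weights w are illustrative placeholders.
    w1, w2, w3 = w
    return w1 * r_sm + w2 * r_s + w3 * r_end

def smoothness_reward(d_heading, threshold=45.0, eta1=1.0, eta2=-1.0):
    # Assumed form of r_sm,i: reward a course-angle change |delta_i| within
    # the 45-degree threshold named in the embodiment, penalize larger ones.
    # eta1 and eta2 play the role of the smoothness reward parameters.
    return eta1 if abs(d_heading) <= threshold else eta2
```

With unit component rewards and the placeholder weights, the total reward sums to 1.0; a 90° turn is penalized while a 30° turn is rewarded.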
step 4.6: in actual training, the value function of some step may become relatively large while value functions trained later are smaller, causing the training to fall into a locally optimal solution, so a greedy strategy is adopted: generate a random number τ between 0 and 1 and judge whether τ < ε holds; if so, the course angle corresponding to the maximum among the value functions Q_i output by the neurons is selected as the i-th step action parameter executed by the current vehicle j; otherwise, a course angle is randomly selected from those corresponding to the value functions Q_i as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: create an experience pool D for storing the state, action, and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at step i yields the state parameter s_{i+1} of step i+1 and the reward value r_i of step i, forming the parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the earliest added tuple is replaced by the newly generated one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network into a weight array W of the online network 1 、W 2 And an offset array b 1 、b 2 ;
step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtaining by formula (12) the value function Q′_{i+1} corresponding to each output-layer neuron of the target network executing the step-(i+1) action a_{i+1} according to a different heading angle:
Q′_{i+1} = σ(W′_2 × Relu(W′_1 s_{i+1} + b′_1) + b′_2)   (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and the output layer of the target network, respectively, and b′_1 and b′_2 are the bias arrays of the hidden layer and the output layer of the target network, respectively;
step 4.7.4: calculating by formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ max(Q′_{i+1})   (13)
in formula (13), R_i is the reward value obtained after executing the step-i action parameter a_i, and γ is the reward attenuation factor; the calculation of Q_tag,i follows a Markov decision process, and γ ranges from 0 to 1: when γ = 0, Q_tag,i depends only on the reward of the current state, and as γ tends to 1, the value estimated by the target network for subsequent states is increasingly taken into account;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and updating the parameters W_1, b_1, W_2 and b_2 of the online network by minimizing the loss function loss; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i − Q_real,i)²)   (14)
in formula (14), Q_real,i denotes the value function corresponding to the step-i action a_i in the extracted u parameter tuples, and E denotes the expectation;
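Steps 4.7.3 to 4.7.5 can be illustrated with a minimal NumPy sketch of formulas (7) and (12) to (14): two identical two-layer networks (online and target), the target value Q_tag,i = R_i + γ max Q′_{i+1}, the squared loss, and one gradient-descent update. The network sizes, learning rate and the manual backpropagation below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, s):
    """Formulas (7)/(12): Q = sigmoid(W2 @ relu(W1 @ s + b1) + b2)."""
    W1, b1, W2, b2 = params
    h = relu(W1 @ s + b1)          # hidden-layer activations
    return sigmoid(W2 @ h + b2), h

def train_step(online, target, batch, gamma, lr):
    """One gradient-descent update of the online network on a batch of
    (s_i, a_i, r_i, s_{i+1}) tuples, using formulas (13) and (14).
    Returns the mean loss over the batch before the update."""
    W1, b1, W2, b2 = online
    grads = [np.zeros_like(p) for p in online]
    loss = 0.0
    for s, a, r, s_next in batch:
        q_next, _ = forward(target, s_next)
        q_tag = r + gamma * np.max(q_next)     # formula (13)
        q, h = forward(online, s)
        diff = q[a] - q_tag                    # Q_real,i - Q_tag,i
        loss += diff ** 2                      # formula (14), sample term
        # backpropagate through the sigmoid output and relu hidden layer
        dz2 = 2.0 * diff * q[a] * (1.0 - q[a])
        grads[2][a] += dz2 * h
        grads[3][a] += dz2
        dh = dz2 * W2[a] * (h > 0)
        grads[0] += np.outer(dh, s)
        grads[1] += dh
    for p, g in zip(online, grads):            # gradient-descent update
        p -= lr * g / len(batch)
    return loss / len(batch)
```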
step 4.8: substituting the updated parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining the optimal local path planning model;
step 4.9: inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameters, recording the position coordinates of the vehicle in the grid map after each action parameter is executed, converting these position coordinates into actual coordinates in the vehicle coordinate system, and fitting the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
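The conversion and fitting of step 4.9 can be sketched as follows. The grid-to-vehicle conversion inverts M = x/C_R, N = y/C_R from formula (5); the polynomial fit is an assumed choice of fitting method, since the document only states that the actual coordinates are fitted to a curve:

```python
import numpy as np

def grid_to_vehicle(path_MN, C_R):
    """Convert grid coordinates (M, N) back to vehicle-frame coordinates
    (x, y), inverting M = x / C_R and N = y / C_R from formula (5)."""
    return [(M * C_R, N * C_R) for M, N in path_MN]

def fit_path(points, degree=3):
    """Fit a polynomial y = f(x) through the visited points; the fitted
    curve serves as the planned local path of step 4.9."""
    x, y = zip(*points)
    return np.polynomial.Polynomial.fit(x, y, degree)
```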
Claims (4)
1. A driving condition local path planning method based on a safety potential field and a DQN algorithm is characterized by comprising the following steps:
step 1, acquiring the environment information of the current vehicle j in the Internet-of-Vehicles environment, and acquiring the state information of the current vehicle j through vehicle sensors;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as an original point, taking the advancing direction as a Y axis and taking the direction vertical to the Y axis as an X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a of the current vehicle j in driving;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at step i is input into the optimal network model and the step-i action parameter a_i is output, for planning the driving path of the current vehicle j.
2. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 2 comprises:
step 2.1: calculating the potential field intensity E_R_bj of the road sign b on the current vehicle j by formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj denotes the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) are the centroid position coordinates of the current vehicle j in the vehicle coordinate system and (x_b, y_b) are the position coordinates of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c on the current vehicle j by formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj denotes the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) are the centroid position coordinates of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters, and M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d on the current vehicle j by formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj denotes the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) are the centroid position coordinates of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j, and θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by formula (4):
E_j = E_R_j + E_V_j + E_D_j   (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
3. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 3 comprises:
step 3.1: establishing the grid map by using the environment information of the rectangular area where the current vehicle j is located; the length range of the grid map is denoted [Y_MIN, Y_MAX] and the width range [X_MIN, X_MAX]; the area of each grid in the grid map is denoted C_R;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated by formula (5):
in formula (5), M and N denote the abscissa and ordinate in the grid map, respectively, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of any point in the vehicle coordinate system; α(M, N) denotes the risk level at position (M, N) under the influence of the potential field, α(M, N) ∈ [0, 1), and is obtained by formula (6):
4. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 4 comprises:
step 4.1: defining the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial motion information of vehicle j in the grid map, and a_i is the step-i action parameter representing that the current vehicle j moves forward one grid in the grid map according to the heading angle;
defining the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j; s_i denotes the state parameter after executing the step-(i−1) action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i represents the coordinate point of the current vehicle j in the grid map after executing the step-i action parameter a_i, and G(M, N)_i represents the risk level of that coordinate point; I denotes the maximum number of steps;
step 4.2: initializing greedy probability as epsilon, attenuation coefficient as gamma, and capacity of an experience playback pool as n;
step 4.3: constructing an online network, comprising: an input layer, a hidden layer and an output layer, and initializing a weight parameter and a bias parameter of the online network;
step 4.4: inputting the step-i state parameter s_i into the input layer of the online network, each neuron of the output layer outputting by formula (7) the value function Q_i corresponding to executing the step-i action parameter a_i according to a different heading angle:
Q_i = σ(W_2 × Relu(W_1 s_i + b_1) + b_2)   (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; Relu is the activation function, and σ is the sigmoid function;
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1 r_sm,i + w_2 r_s,i + w_3 r_end,i   (11)
in formulae (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I at step i; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the heading angle of vehicle j at step i, and δ is the threshold parameter of the heading-angle variation; λ_1 is the proportional parameter and λ_2 the bias parameter of the reward value for trajectory safety; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 represent the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) denote, respectively, the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position; d_1, d_2, d_3 and d_4 are all distance thresholds;
step 4.6: generating a random number τ between 0 and 1 and judging whether τ < ε holds; if so, among the value functions Q_i corresponding to the neurons, the heading angle corresponding to the maximum function value is selected as the step-i action parameter executed by the current vehicle j; otherwise a heading angle corresponding to the value function Q_i is selected at random as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: creating an experience pool D for storing the state, action and reward information of the vehicle at each step;
the state parameter s_i of the current vehicle j at step i, the action parameter a_i executed under it, the resulting state parameter s_{i+1} of step i+1 and the reward value r_i of step i form a parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the newly generated tuple replaces the earliest added one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network to the weight arrays W_1, W_2 and bias arrays b_1, b_2 of the online network;
step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtaining by formula (12) the value function Q′_{i+1} corresponding to each output-layer neuron of the target network executing the step-(i+1) action a_{i+1} according to a different heading angle:
Q′_{i+1} = σ(W′_2 × Relu(W′_1 s_{i+1} + b′_1) + b′_2)   (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and the output layer of the target network, respectively, and b′_1 and b′_2 are the bias arrays of the hidden layer and the output layer of the target network, respectively;
step 4.7.4: calculating by formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ max(Q′_{i+1})   (13)
in formula (13), R_i is the reward value obtained after executing the step-i action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and updating the parameters W_1, b_1, W_2 and b_2 of the online network by minimizing the loss function loss; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i − Q_real,i)²)   (14)
in formula (14), Q_real,i denotes the value function corresponding to the step-i action a_i in the extracted u parameter tuples, and E denotes the expectation;
step 4.8: substituting the updated parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining the optimal local path planning model;
step 4.9: inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameters, recording the position coordinates of the vehicle in the grid map after each action parameter is executed, converting these position coordinates into actual coordinates in the vehicle coordinate system, and fitting the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210650446.3A CN115031753A (en) | 2022-06-09 | 2022-06-09 | Driving condition local path planning method based on safety potential field and DQN algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115031753A true CN115031753A (en) | 2022-09-09 |
Family
ID=83122468
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115031753A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117601904A (en) * | 2024-01-22 | 2024-02-27 | 中国第一汽车股份有限公司 | Vehicle running track planning method and device, vehicle and storage medium |
CN117601904B (en) * | 2024-01-22 | 2024-05-14 | 中国第一汽车股份有限公司 | Vehicle running track planning method and device, vehicle and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111061277B (en) | Unmanned vehicle global path planning method and device | |
Cai et al. | High-speed autonomous drifting with deep reinforcement learning | |
US11036232B2 (en) | Iterative generation of adversarial scenarios | |
CN107063280A (en) | A kind of intelligent vehicle path planning system and method based on control sampling | |
CN113044064B (en) | Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning | |
CN112784485B (en) | Automatic driving key scene generation method based on reinforcement learning | |
CN112508164B (en) | End-to-end automatic driving model pre-training method based on asynchronous supervised learning | |
CN113715842B (en) | High-speed moving vehicle control method based on imitation learning and reinforcement learning | |
CN115257745A (en) | Automatic driving lane change decision control method based on rule fusion reinforcement learning | |
CN116804879A (en) | Robot path planning framework method for improving dung beetle algorithm and fusing DWA algorithm | |
CN115344052B (en) | Vehicle path control method and control system based on improved group optimization algorithm | |
CN115031753A (en) | Driving condition local path planning method based on safety potential field and DQN algorithm | |
CN113386790A (en) | Automatic driving decision-making method for cross-sea bridge road condition | |
Sun et al. | Human-like highway trajectory modeling based on inverse reinforcement learning | |
CN113487889B (en) | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent | |
Yang et al. | Vehicle trajectory prediction based on LSTM network | |
CN116448134B (en) | Vehicle path planning method and device based on risk field and uncertain analysis | |
CN116127853A (en) | Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused | |
CN116360454A (en) | Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment | |
CN111443701A (en) | Unmanned vehicle/robot behavior planning method based on heterogeneous deep learning | |
CN113033902B (en) | Automatic driving lane change track planning method based on improved deep learning | |
CN114701517A (en) | Multi-target complex traffic scene automatic driving solution based on reinforcement learning | |
CN114527759A (en) | End-to-end driving method based on layered reinforcement learning | |
Anderson et al. | Autonomous navigation via a deep Q network with one-hot image encoding | |
CN114779764B (en) | Vehicle reinforcement learning movement planning method based on driving risk analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||