CN115031753A - Driving condition local path planning method based on safety potential field and DQN algorithm - Google Patents

Driving condition local path planning method based on safety potential field and DQN algorithm

Info

Publication number
CN115031753A
CN115031753A
Authority
CN
China
Prior art keywords
vehicle
parameter
current vehicle
potential field
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210650446.3A
Other languages
Chinese (zh)
Inventor
黄鹤
周宇
钱同林
程腾
白先旭
付梦园
张峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Intelligent Manufacturing Institute of Hefei University Technology
Original Assignee
Hefei University of Technology
Intelligent Manufacturing Institute of Hefei University Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology and Intelligent Manufacturing Institute of Hefei University of Technology
Priority to CN202210650446.3A
Publication of CN115031753A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 - Route searching; Route guidance
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 - Route searching; Route guidance
    • G01C21/3446 - Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 - Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 - Creation or updating of map data
    • G01C21/3807 - Creation or updating of map data characterised by the type of data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which comprises the following steps: 1. acquiring the surrounding environment information and the state information of the vehicle; 2. constructing an environment safety potential field model according to the collected environment information; 3. constructing an environment grid map with a grid method according to the calculated potential field intensity distribution map; 4. initializing the deep reinforcement learning parameters, constructing a deep neural network, training the deep neural network to obtain an optimal path planning model, and performing path planning. The method constructs the grid map using the safety potential field theory and completes the local path planning of the automobile in various scenes through deep reinforcement learning, so that the automobile is safer and passes more efficiently while driving, guaranteeing the safe navigation of the intelligent automobile.

Description

Driving condition local path planning method based on safety potential field and DQN algorithm
Technical Field
The invention relates to the field of intelligent automobile safety and path planning, and in particular to a driving condition local path planning method based on a safety potential field and a DQN algorithm.
Background
Path planning is the most critical link in the autonomous navigation of an intelligent automobile. Its main purpose is to find an optimal or sub-optimal path from a starting point to a target point in different driving scenes while guaranteeing the safety of the path. Intelligent driving has already been widely applied in relatively simple scenes such as mining areas and industrial parks, but in relatively complex driving scenes such as actual roads, environmental factors such as obstacles, traffic signs, road surface conditions and other moving vehicles must be considered, which poses challenges for path planning research. With the arrival of the artificial-intelligence era, the environments faced in the path planning field are becoming more and more complex, so path planning algorithms must be able to respond quickly and to learn flexibly in complex environments. Existing path planning algorithms still suffer from becoming trapped in local optima during planning, so that a complete path cannot be produced, and from an inability to adapt to complex and changeable scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a driving condition local path planning method based on a safety potential field and a DQN algorithm: a grid map is constructed using the safety potential field theory, and the local path planning of the automobile in various scenes is completed through deep reinforcement learning, so that the automobile is safer and passes more efficiently while driving, providing a guarantee for the safe navigation of the intelligent automobile.
In order to achieve the purpose of the invention, the following technical scheme is adopted:
the invention relates to a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized by comprising the following steps of:
step 1, acquiring environmental information of a current vehicle j in an environment of a vehicle network, and acquiring state information of the current vehicle j through a vehicle sensor;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j, the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j, and the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j, so as to obtain the total safety potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on a state parameter set s and an action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model, which outputs the i-th step action parameter a_i for planning the driving path of the current vehicle j.
The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized in that the step 2 comprises the following steps:
step 2.1: calculating, by formula (1), the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j:
(formula (1) appears as an image in the original document)
In formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj represents the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j - x_b, y_j - y_b), where (x_j, y_j) are the centroid position coordinates of the current vehicle j in the vehicle coordinate system and (x_b, y_b) are the position coordinates of the road sign b in the vehicle coordinate system;
step 2.3: calculating, by formula (2), the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j:
(formula (2) appears as an image in the original document)
In formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj represents the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j - x_c, y_j - y_c), where (x_c, y_c) are the centroid position coordinates of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating, by formula (3), the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j:
(formula (3) appears as an image in the original document)
In formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj represents the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j - x_d, y_j - y_d), where (x_d, y_d) are the centroid position coordinates of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; the angle parameter in formula (3) is the clockwise angle between v_d and r_dj;
step 2.5: calculating, by formula (4), the total safety potential field intensity E_j acting on the current vehicle j:
E_j = E_R_j + E_V_j + E_D_j    (4)
In formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
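Formulas (1) to (3) appear only as images in the original document, so their exact functional forms are not recoverable here. The following Python sketch therefore uses assumed inverse-distance forms purely for illustration, keeps the parameter names from the text (T_b, k_1, ρ_b, k_2, G, M_c, k_3, ρ_d), and applies the superposition of formula (4); it treats each field contribution as a scalar magnitude, whereas the patent sums them as vectors.

```python
import numpy as np

def road_sign_field(pos, sign_pos, T_b=1.0, k_1=1.0, rho_b=5.0):
    """Stand-in for formula (1): field of a road sign, zero beyond the
    distance threshold rho_b (the exact form is an assumption)."""
    r_bj = np.asarray(pos, float) - np.asarray(sign_pos, float)  # (x_j - x_b, y_j - y_b)
    d = np.linalg.norm(r_bj)
    return k_1 * T_b / d if 0.0 < d <= rho_b else 0.0

def static_obstacle_field(pos, obs_pos, M_c=1.0, k_2=1.0, G=1.0, rho_c=5.0):
    """Stand-in for formula (2): field of a static obstacle of mass M_c."""
    r_cj = np.asarray(pos, float) - np.asarray(obs_pos, float)
    d = np.linalg.norm(r_cj)
    return k_2 * G * M_c / d ** 2 if 0.0 < d <= rho_c else 0.0

def moving_vehicle_field(pos, veh_pos, v_d, k_3=1.0, rho_d=8.0):
    """Stand-in for formula (3): field of a running vehicle, scaled by its
    speed and by the angle between v_d and r_dj (assumed form)."""
    r_dj = np.asarray(pos, float) - np.asarray(veh_pos, float)
    d = np.linalg.norm(r_dj)
    if not 0.0 < d <= rho_d:
        return 0.0
    speed = np.linalg.norm(v_d)
    cos_angle = float(np.dot(v_d, r_dj)) / (speed * d + 1e-9)
    return k_3 * speed * (1.0 + cos_angle) / d

def total_field(pos, signs, obstacles, vehicles):
    """Formula (4): superpose the three field components at position pos."""
    E_R = sum(road_sign_field(pos, s) for s in signs)
    E_D = sum(static_obstacle_field(pos, o) for o in obstacles)
    E_V = sum(moving_vehicle_field(pos, p, v) for p, v in vehicles)
    return E_R + E_D + E_V

# Example: field strength at the origin of the vehicle coordinate system.
E_j = total_field((0.0, 0.0), signs=[(2.0, 3.0)], obstacles=[(1.0, -4.0)],
                  vehicles=[((5.0, 5.0), (0.0, -2.0))])
```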
The step 3 comprises the following steps:
step 3.1: establishing the grid map by using the environmental information of the rectangular area where the current vehicle j is located; the length range of the grid map is recorded as [Y_MIN, Y_MAX] and the width range as [X_MIN, X_MAX]; the area of each grid cell in the grid map is recorded as C_R;
Step 3.2: in the grid map, the obstacle grid is represented by a black grid, the travel grid is represented by a white grid, and the risk level G (M, N) of the grid at the (M, N) position in the grid map is calculated using equation (5):
Figure BDA0003685857880000031
in the formula (5), M, N represents the abscissa and ordinate in the grid map, respectively, and M is x/C R ,N=y/C R (x, y) represents the position coordinates of an arbitrary point in the vehicle coordinate system, and α (M, N) represents the position at (M, N) under the influence of the potential fieldα (M, N) e [0, 1), and is obtained from formula (6):
Figure BDA0003685857880000032
in the formula (6), the reaction mixture is,
Figure BDA0003685857880000033
indicating rounding down, E (M, N) indicating the strength of the potential field at the (M, N) position, and Δ indicating the number of levels of the set risk level.
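Formulas (5) and (6) are likewise only available as images; what the text does state is that vehicle-frame coordinates map to grid indices via M = x/C_R and N = y/C_R, that α(M, N) ∈ [0, 1) is derived from the field strength E(M, N) with a rounding-down operation, and that Δ is the number of risk levels. A minimal sketch under those assumptions (the normalisation by the maximum field strength is illustrative, not taken from the text):

```python
import numpy as np

def risk_levels(E, delta=8):
    """Discretise a potential-field intensity map E[M, N] into `delta` risk
    levels G(M, N).  alpha is the field strength normalised into [0, 1) and
    floored onto a delta-step scale (an assumed reading of formula (6))."""
    E = np.asarray(E, dtype=float)
    alpha = np.floor(delta * E / (E.max() + 1e-9)) / delta   # alpha(M, N) in [0, 1)
    return (alpha * delta).astype(int)                       # G(M, N) in {0, ..., delta-1}

# Mapping a vehicle-frame point onto its grid cell, as stated for formula (5).
C_R = 0.5
x, y = 3.2, 7.9
M, N = int(x / C_R), int(y / C_R)   # M = x / C_R, N = y / C_R

G = risk_levels(np.random.rand(40, 80) * 10.0)   # toy field over a 40 x 80 grid
print(G[M, N])                                   # risk level of the cell containing (x, y)
```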
The step 4 comprises the following steps:
step 4.1: defining the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map and a_i is the i-th step action parameter, representing that the current vehicle j moves forward one grid cell in the grid map according to the heading angle;
defining the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j and s_i represents the state parameter after executing the (i-1)-th step action a_(i-1), with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i represents the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i represents the risk level of that coordinate point; I represents the maximum number of steps;
step 4.2: initializing greedy probability as epsilon, attenuation coefficient as gamma, and capacity of an experience playback pool as n;
step 4.3: constructing an online network, comprising: an input layer, a hidden layer and an output layer, and initializing the weight parameters and the bias parameters of the online network;
step 4.4: inputting the state parameter s_i of the i-th step into the input layer of the online network, wherein each neuron of the output layer outputs, by formula (7), the value function Q_i corresponding to executing the i-th step action parameter a_i at a different heading angle:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2)    (7)
In formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 is the bias array from the input layer to the hidden layer, W_2 is the weight array from the hidden layer to the output layer, and b_2 is the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
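Formula (7) maps a state through one hidden ReLU layer and a sigmoid output. A minimal PyTorch sketch of such an online network is given below; the input size of 3 (grid coordinates and risk level) and the output size of 8 (one candidate heading angle per output neuron) follow the embodiment described later, while the hidden width of 64 is an assumption.

```python
import torch
import torch.nn as nn

class OnlineQNetwork(nn.Module):
    """Online network of formula (7): Q_i = sigmoid(W2 * ReLU(W1 * s_i + b1) + b2)."""
    def __init__(self, n_states=3, n_hidden=64, n_actions=8):
        super().__init__()
        self.hidden = nn.Linear(n_states, n_hidden)   # W1, b1
        self.out = nn.Linear(n_hidden, n_actions)     # W2, b2

    def forward(self, s):
        return torch.sigmoid(self.out(torch.relu(self.hidden(s))))

q_net = OnlineQNetwork()
s_i = torch.tensor([[12.0, 20.0, 3.0]])   # state: grid coordinates (M, N) and risk level
q_values = q_net(s_i)                     # one value function Q_i per heading angle
```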
step 4.5: the reward and punishment functions of the deep reinforcement learning are defined by formulas (8) to (11):
(formulas (8) to (10) appear as images in the original document)
r_i = w_1 r_sm,i + w_2 r_s,i + w_3 r_end,i    (11)
In formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the change of the heading angle of vehicle j at step i and δ is the threshold parameter for the heading-angle change; λ_1 is the proportional parameter and λ_2 is the bias parameter of the trajectory-safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) represent the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position respectively; d_1, d_2, d_3 and d_4 are all distance thresholds;
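Formulas (8) to (10) are only available as images, so the component rewards below are assumed forms that follow the textual description (a smoothness term gated by the heading-angle threshold δ, a safety term built from the risk level with λ_1 and λ_2, and an end-point term using a distance threshold); only the weighted sum of formula (11) is taken directly from the text, and all numeric defaults are illustrative.

```python
def smoothness_reward(d_heading_deg, delta=45.0, eta_1=1.0, eta_2=-1.0):
    """Assumed form of r_sm,i: eta_1 if the heading change stays within the
    threshold delta, eta_2 otherwise (stand-in for formula (8))."""
    return eta_1 if abs(d_heading_deg) <= delta else eta_2

def safety_reward(risk_level, lam_1=-0.5, lam_2=1.0):
    """Assumed linear form of r_s,i built from the cell risk level, with the
    proportional parameter lam_1 and bias parameter lam_2 (formula (9))."""
    return lam_1 * risk_level + lam_2

def endpoint_reward(dist_to_goal, d_1=1.0, zeta_1=10.0, zeta_2=-0.1):
    """Assumed form of r_end,i: zeta_1 once the goal is within d_1, a small
    penalty zeta_2 otherwise (stand-in for formula (10))."""
    return zeta_1 if dist_to_goal <= d_1 else zeta_2

def total_reward(d_heading_deg, risk_level, dist_to_goal, w=(0.3, 0.4, 0.3)):
    """Formula (11): r_i = w_1*r_sm,i + w_2*r_s,i + w_3*r_end,i."""
    w_1, w_2, w_3 = w
    return (w_1 * smoothness_reward(d_heading_deg)
            + w_2 * safety_reward(risk_level)
            + w_3 * endpoint_reward(dist_to_goal))

r_i = total_reward(d_heading_deg=45.0, risk_level=3, dist_to_goal=6.2)
```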
step 4.6: generating a random number τ between 0 and 1 and judging whether τ < ε holds; if so, among the value functions Q_i corresponding to the neurons, the heading angle corresponding to the maximum value function is selected as the i-th step action parameter executed by the current vehicle j; otherwise, a heading angle corresponding to one of the value functions Q_i is selected at random as the action parameter executed by the current vehicle j;
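Step 4.6 is a greedy/exploratory switch: the random number τ decides whether the heading angle with the largest value function or a random heading angle is executed. A short sketch following the convention stated in the text (τ < ε selects the greedy action):

```python
import random
import torch

def select_action(q_values, epsilon=0.9):
    """Greedy choice with probability epsilon, random heading angle otherwise,
    as described in step 4.6 (epsilon value is illustrative)."""
    tau = random.random()
    if tau < epsilon:
        return int(torch.argmax(q_values))        # index of the best heading angle
    return random.randrange(q_values.numel())     # index of a random heading angle
```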
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: creating an experience pool D for storing the state, action and reward information of the vehicle at each step;
the state parameter s of the current vehicle j at the ith step i Lower execution action parameter a i Obtaining the state parameter s of the step i +1 i+1 And the prize value r of step i i And forming a parameter form(s) i ,a i ,r i ,s i+1 );
When the number of the parameters in the experience pool is larger than n, replacing one parameter added at the earliest by one newly generated parameter;
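A minimal sketch of the experience pool D of step 4.7.1: a fixed-capacity buffer of (s_i, a_i, r_i, s_{i+1}) tuples in which the oldest entry is dropped once more than n are stored; the capacity and batch-size defaults are illustrative.

```python
import random
from collections import deque

class ReplayPool:
    """Experience pool D storing (s_i, a_i, r_i, s_{i+1}); once the number of
    stored tuples exceeds n, the earliest one is replaced automatically."""
    def __init__(self, n=10000):
        self.buffer = deque(maxlen=n)

    def push(self, s_i, a_i, r_i, s_next):
        self.buffer.append((s_i, a_i, r_i, s_next))

    def sample(self, u=32):
        return random.sample(self.buffer, u)   # u randomly drawn parameter tuples

    def __len__(self):
        return len(self.buffer)
```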
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network into a weight array W of the online network 1 、W 2 And a bias array b 1 、b 2
Step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_(i+1) of the (i+1)-th step of the current vehicle j into the target network, and obtaining, by formula (12), the value function Q'_(i+1) corresponding to each output-layer neuron of the target network executing the (i+1)-th step action a_(i+1) at a different heading angle:
Q'_(i+1) = σ(W'_2 × ReLU(W'_1 s_(i+1) + b'_1) + b'_2)    (12)
In formula (12), W'_1 and W'_2 are the weight arrays of the hidden layer and output layer of the target network respectively, and b'_1 and b'_2 are the bias arrays of the hidden layer and output layer of the target network respectively;
step 4.7.4: calculating, by formula (13), the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at the i-th step:
Q_tag,i = R_i + γ max(Q'_(i+1))    (13)
In formula (13), R_i is the reward value after executing the i-th step action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and computing the loss function loss to update the parameters W_1, b_1, W_2 and b_2 of the online network; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i - Q_real,i)^2)    (14)
In formula (14), Q_real,i represents the value function corresponding to the i-th step action a_i in the extracted u parameter tuples; E represents the expectation;
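Steps 4.7.2 to 4.7.5 follow the standard DQN update: a target network of identical structure supplies Q'_{i+1}, the target value Q_tag,i = R_i + γ·max(Q'_{i+1}) of formula (13) is compared with the online estimate through the squared loss of formula (14), and the online weights are periodically copied to the target network. The sketch below assumes the OnlineQNetwork and ReplayPool sketches above; γ, u, the learning rate and the synchronisation period are illustrative values.

```python
import copy
import torch
import torch.nn as nn

online_net = OnlineQNetwork()
target_net = copy.deepcopy(online_net)          # step 4.7.2: same structure and weights
optimizer = torch.optim.SGD(online_net.parameters(), lr=1e-3)   # gradient descent
gamma, sync_every = 0.9, 100

def train_step(pool, step, u=32):
    batch = pool.sample(u)
    s      = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a      = torch.tensor([b[1] for b in batch], dtype=torch.int64)
    r      = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s_next = torch.tensor([b[3] for b in batch], dtype=torch.float32)

    with torch.no_grad():                                   # formulas (12)-(13)
        q_tag = r + gamma * target_net(s_next).max(dim=1).values
    q_real = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    loss = nn.functional.mse_loss(q_real, q_tag)            # formula (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % sync_every == 0:                              # copy online weights to target
        target_net.load_state_dict(online_net.state_dict())
    return float(loss)
```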
step 4.8: substituting the updated network parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining an optimal local path planning model;
step 4.9: and inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain corresponding action parameters, recording the position coordinates of the vehicle in the grid map after executing the action parameters of each step, and converting the position coordinates into actual coordinates in a vehicle coordinate system so as to fit the actual coordinates, wherein the obtained fitted curve is the path planned by the current vehicle j.
Compared with the prior art, the invention has the beneficial effects that:
1. the grid map is constructed using the safety potential field, which overcomes the limitation of traditional path planning in which distance is the single objective; the safety, smoothness and traditional distance factors of the path are considered together, and the grid map simplifies the environmental elements, is easy to construct in an actual scene and makes it convenient to plan short paths;
2. compared with traditional path planning algorithms, the reinforcement learning algorithm has the advantage of balancing exploration and exploitation: it can learn how to escape from local stable points in situations where a traditional algorithm falls into a local optimum, and in practical applications it plans paths more promptly and reliably than traditional search-based and sampling-based algorithms;
3. the invention introduces the concept of a safety potential field, digitalizes more environmental factors and can deal with more and more complex driving scenes.
Drawings
FIG. 1 is a flow chart of the network training of the present invention;
fig. 2 is a schematic diagram of the action parameters that can be selected by the vehicle j in each step of the grid map.
Detailed Description
In this embodiment, a driving condition local path planning method based on a safety potential field and a DQN algorithm calculates the safety potential field distribution of the current environment of the automobile according to the current driving scene, constructs a grid map from it, and finally designs a reward function that combines multiple factors within a deep reinforcement learning algorithm. The method specifically comprises the following steps:
step 1, acquiring the environmental information of the current vehicle j in the Internet-of-Vehicles environment, and acquiring the state information of the current vehicle j through on-board sensors; the environment information of vehicle j comprises the speeds of surrounding vehicles, their relative positions with respect to vehicle j, road information, surrounding obstacle information and the like; the state information of vehicle j comprises the speed and heading angle of vehicle j;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j, the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j, and the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j, so as to obtain the total safety potential field intensity E of the current vehicle j;
step 2.1: calculating, by formula (1), the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j:
(formula (1) appears as an image in the original document)
In formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj represents the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j - x_b, y_j - y_b), where (x_j, y_j) are the centroid position coordinates of the current vehicle j in the vehicle coordinate system and (x_b, y_b) are the position coordinates of the road sign b in the vehicle coordinate system;
step 2.3: calculating, by formula (2), the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j:
(formula (2) appears as an image in the original document)
In formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj represents the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j - x_c, y_j - y_c), where (x_c, y_c) are the centroid position coordinates of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating, by formula (3), the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j:
(formula (3) appears as an image in the original document)
In formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj represents the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j - x_d, y_j - y_d), where (x_d, y_d) are the centroid position coordinates of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; the angle parameter in formula (3) is the clockwise angle between v_d and r_dj;
step 2.5: calculating, by formula (4), the total safety potential field intensity E_j acting on the current vehicle j:
E_j = E_R_j + E_V_j + E_D_j    (4)
In formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
Step 3, constructing a grid map by using a grid method according to a potential field intensity distribution diagram corresponding to the total safety potential field intensity E;
step 3.1: establishing a grid map by using the environmental information of the rectangular area where the current vehicle j is located; the length range of the grid map is recorded as [Y_MIN, Y_MAX] and the width range as [X_MIN, X_MAX]; the area of each grid cell in the grid map is recorded as C_R. In this example, Y_MIN, Y_MAX, X_MIN and X_MAX are set to -10, 30, -10 and 10 respectively, in units of m, and C_R is set to 0.5, in units of m²;
Step 3.2: in the grid map, obstacle grids are represented by black cells and drivable grids by white cells, and the risk level G(M, N) of the grid cell at the (M, N) position in the grid map is calculated by formula (5):
(formula (5) appears as an image in the original document)
In formula (5), M and N represent the abscissa and ordinate in the grid map respectively, with M = x/C_R and N = y/C_R, where (x, y) represents the position coordinates of an arbitrary point in the vehicle coordinate system; α(M, N) represents the risk level at the (M, N) position under the influence of the potential field, with α(M, N) ∈ [0, 1), and is obtained from formula (6):
(formula (6) appears as an image in the original document; it involves a rounding-down operator ⌊·⌋)
In formula (6), E(M, N) represents the potential field intensity at the (M, N) position and Δ represents the number of risk levels; in this example Δ is taken as 8, considering that each safety level should have a certain degree of discrimination without obvious gaps between levels.
Step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model, which outputs the i-th step action parameter a_i for planning the driving path of the current vehicle j; Fig. 1 shows the flow chart of the network training based on deep reinforcement learning.
Step 4.1: defining the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map and a_i is the i-th step action parameter, representing that the current vehicle j moves forward one grid cell in the grid map according to the heading angle; given the characteristics of the grid map, the action parameters that can be selected at each step are shown in Fig. 2;
defining the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j and s_i represents the state parameter after executing the (i-1)-th step action a_(i-1), with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i represents the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i represents the risk level of that coordinate point; I represents the maximum number of steps;
step 4.2: initializing greedy probability as epsilon, attenuation coefficient as gamma, and capacity of an experience playback pool as n;
step 4.3: constructing an online network, comprising: an input layer, a hidden layer and an output layer, and initializing a weight parameter and a bias parameter of an online network;
in this example, the state consists of the horizontal and vertical coordinates of vehicle j in the grid map and the risk level of the corresponding position, and the action at each step takes one of 8 heading-angle values in the grid map, so the input layer has 3 neurons and the output layer has 8 neurons;
step 4.4: inputting the state parameter s_i of the i-th step into the input layer of the online network, wherein each neuron of the output layer outputs, by formula (7), the value function Q_i corresponding to executing the i-th step action parameter a_i at a different heading angle:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2)    (7)
In formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 is the bias array from the input layer to the hidden layer, W_2 is the weight array from the hidden layer to the output layer, and b_2 is the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
step 4.5: the reward and punishment functions of the deep reinforcement learning are defined by formulas (8) to (11):
(formulas (8) to (10) appear as images in the original document)
r_i = w_1 r_sm,i + w_2 r_s,i + w_3 r_end,i    (11)
In formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for trajectory safety at step i, calculated from the risk levels; because there are many risk levels, a functional form is adopted so that all levels are taken into account, which improves the training effect and the training efficiency; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the change of the heading angle of vehicle j at step i and δ is the threshold parameter for the heading-angle change; owing to the characteristics of the grid map, δ is set to 45° in this example so that the heading-angle change at each step is normally not too large, which would otherwise affect the smoothness of the finally formed path; λ_1 is the proportional parameter and λ_2 is the bias parameter of the trajectory-safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards, and assigning different weights to the reward components of the total reward value allows the expected training effect to be reached more quickly; (x_j,i, y_j,i) and (x_end, y_end) represent the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position respectively; d_1, d_2, d_3 and d_4 are all distance thresholds, which give a reward value when the vehicle approaches the end point, considering that in the initial stage of training the vehicle rarely reaches the end point;
step 4.6: in actual training, the value function of a certain step may happen to be large while the value functions trained later are smaller, so that training falls into a locally optimal solution; a greedy strategy is therefore adopted: generate a random number τ between 0 and 1 and judge whether τ < ε holds; if so, among the value functions Q_i corresponding to the neurons, the heading angle corresponding to the maximum value function is selected as the i-th step action parameter executed by the current vehicle j; otherwise, a heading angle corresponding to one of the value functions Q_i is selected at random as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: creating an experience pool D for storing the state, action and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at the i-th step to obtain the state parameter s_(i+1) of step i+1 and the reward value r_i of step i, and forming the parameter tuple (s_i, a_i, r_i, s_(i+1));
when the number of parameter tuples in the experience pool is larger than n, the earliest added tuple is replaced by a newly generated one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network to the weight arrays W_1, W_2 and the bias arrays b_1, b_2 of the online network;
Step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_(i+1) of the (i+1)-th step of the current vehicle j into the target network, and obtaining, by formula (12), the value function Q'_(i+1) corresponding to each output-layer neuron of the target network executing the (i+1)-th step action a_(i+1) at a different heading angle:
Q'_(i+1) = σ(W'_2 × ReLU(W'_1 s_(i+1) + b'_1) + b'_2)    (12)
In formula (12), W'_1 and W'_2 are the weight arrays of the hidden layer and output layer of the target network respectively, and b'_1 and b'_2 are the bias arrays of the hidden layer and output layer of the target network respectively;
step 4.7.4: calculating, by formula (13), the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at the i-th step:
Q_tag,i = R_i + γ max(Q'_(i+1))    (13)
In formula (13), R_i is the reward value after executing the i-th step action parameter a_i, and γ is the reward attenuation factor; the calculation of Q_tag,i follows a Markov decision process, and γ ranges from 0 to 1: when γ is 0, Q_tag,i depends only on the output for the current state, and as γ tends to 1 the value function increasingly takes the output of the subsequent network into account;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and computing the loss function loss to update the parameters W_1, b_1, W_2 and b_2 of the online network; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i - Q_real,i)^2)    (14)
In formula (14), Q_real,i represents the value function corresponding to the i-th step action a_i in the extracted u parameter tuples; E represents the expectation;
step 4.8: substituting the updated network parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining an optimal local path planning model;
step 4.9: and inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain corresponding action parameters, recording the position coordinates of the vehicle in the grid map after executing the action parameters of each step, and converting the position coordinates into actual coordinates in a vehicle coordinate system so as to fit the actual coordinates, wherein the obtained fitted curve is the path planned by the current vehicle j.
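Step 4.9 converts the grid coordinates visited by the planner back into vehicle-frame coordinates and fits a curve through them. A brief sketch under the assumption that a cell index maps back to x = M·C_R, y = N·C_R and that a cubic polynomial is used for the fit (the text only states that the actual coordinates are fitted):

```python
import numpy as np

def grid_path_to_vehicle_frame(grid_points, C_R=0.5, degree=3):
    """Convert visited grid cells (M, N) back to vehicle-frame coordinates and
    fit a smooth curve y = f(x) through them, as described in step 4.9."""
    pts = np.asarray(grid_points, dtype=float)
    x, y = pts[:, 0] * C_R, pts[:, 1] * C_R      # assumed inverse of M = x/C_R, N = y/C_R
    poly = np.poly1d(np.polyfit(x, y, degree))   # fitted curve is the planned path
    return x, y, poly

# Example: five grid cells visited by the planned path.
xs, ys, path = grid_path_to_vehicle_frame([(0, 0), (1, 2), (2, 5), (3, 9), (4, 14)])
print(path(1.5))   # interpolated lateral position at x = 1.5 m
```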

Claims (4)

1. A driving condition local path planning method based on a safety potential field and a DQN algorithm is characterized by comprising the following steps:
step 1, acquiring environmental information of a current vehicle j in an environment of a vehicle network, and acquiring state information of the current vehicle j through a vehicle sensor;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j, the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j, and the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j, so as to obtain the total safety potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a of the current vehicle j in driving;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on a state parameter set s and an action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model, which outputs the i-th step action parameter a_i for planning the driving path of the current vehicle j.
2. The driving condition local path planning method based on a safety potential field and a DQN algorithm according to claim 1, wherein the step 2 comprises:
step 2.1: calculating, by formula (1), the potential field intensity E_R_bj exerted by the road sign b on the current vehicle j:
(formula (1) appears as an image in the original document)
In formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj represents the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j - x_b, y_j - y_b), where (x_j, y_j) are the centroid position coordinates of the current vehicle j in the vehicle coordinate system and (x_b, y_b) are the position coordinates of the road sign b in the vehicle coordinate system;
step 2.3: calculating, by formula (2), the potential field intensity E_D_cj exerted by the static obstacle c on the current vehicle j:
(formula (2) appears as an image in the original document)
In formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj represents the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j - x_c, y_j - y_c), where (x_c, y_c) are the centroid position coordinates of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating, by formula (3), the potential field intensity E_V_dj exerted by the running vehicle d on the current vehicle j:
(formula (3) appears as an image in the original document)
In formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj represents the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j - x_d, y_j - y_d), where (x_d, y_d) are the centroid position coordinates of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; the angle parameter in formula (3) is the clockwise angle between v_d and r_dj;
step 2.5: calculating, by formula (4), the total safety potential field intensity E_j acting on the current vehicle j:
E_j = E_R_j + E_V_j + E_D_j    (4)
In formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
3. The driving condition local path planning method based on a safety potential field and a DQN algorithm according to claim 1, wherein the step 3 comprises:
step 3.1: establishing the grid map by using the environmental information of the rectangular area where the current vehicle j is located; the length range of the grid map is recorded as [Y_MIN, Y_MAX] and the width range as [X_MIN, X_MAX]; the area of each grid cell in the grid map is recorded as C_R;
Step 3.2: in the grid map, obstacle grids are represented by black cells and drivable grids by white cells, and the risk level G(M, N) of the grid cell at the (M, N) position in the grid map is calculated by formula (5):
(formula (5) appears as an image in the original document)
In formula (5), M and N represent the abscissa and ordinate in the grid map respectively, with M = x/C_R and N = y/C_R, where (x, y) represents the position coordinates of any point in the vehicle coordinate system; α(M, N) represents the risk level at the (M, N) position under the influence of the potential field, with α(M, N) ∈ [0, 1), and is obtained from formula (6):
(formula (6) appears as an image in the original document; it involves a rounding-down operator ⌊·⌋)
In formula (6), E(M, N) represents the potential field intensity at the (M, N) position and Δ represents the number of risk levels.
4. The driving condition local path planning method based on a safety potential field and a DQN algorithm according to claim 1, wherein the step 4 comprises:
step 4.1: defining the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map and a_i is the i-th step action parameter, representing that the current vehicle j moves forward one grid cell in the grid map according to the heading angle;
defining the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j and s_i represents the state parameter after executing the (i-1)-th step action a_(i-1), with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i represents the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i represents the risk level of that coordinate point; I represents the maximum number of steps;
step 4.2: initializing greedy probability as epsilon, attenuation coefficient as gamma, and capacity of an experience playback pool as n;
step 4.3: constructing an online network, comprising: an input layer, a hidden layer and an output layer, and initializing a weight parameter and a bias parameter of the online network;
step 4.4: inputting the state parameter s_i of the i-th step into the input layer of the online network, wherein each neuron of the output layer outputs, by formula (7), the value function Q_i corresponding to executing the i-th step action parameter a_i at a different heading angle:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2)    (7)
In formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 is the bias array from the input layer to the hidden layer, W_2 is the weight array from the hidden layer to the output layer, and b_2 is the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
step 4.5: the reward and punishment functions of the deep reinforcement learning are defined by formulas (8) to (11):
(formulas (8) to (10) appear as images in the original document)
r_i = w_1 r_sm,i + w_2 r_s,i + w_3 r_end,i    (11)
In formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the change of the heading angle of vehicle j at step i and δ is the threshold parameter for the heading-angle change; λ_1 is the proportional parameter and λ_2 is the bias parameter of the trajectory-safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) represent the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position respectively; d_1, d_2, d_3 and d_4 are all distance thresholds;
step 4.6: generating a random number τ between 0 and 1 and judging whether τ < ε holds; if so, among the value functions Q_i corresponding to the neurons, the heading angle corresponding to the maximum value function is selected as the i-th step action parameter executed by the current vehicle j; otherwise, a heading angle corresponding to one of the value functions Q_i is selected at random as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: creating an experience pool D for storing the state, action and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at the i-th step to obtain the state parameter s_(i+1) of step i+1 and the reward value r_i of step i, and forming the parameter tuple (s_i, a_i, r_i, s_(i+1));
when the number of parameter tuples in the experience pool is larger than n, the earliest added tuple is replaced by a newly generated one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network to the weight arrays W_1, W_2 and the bias arrays b_1, b_2 of the online network;
Step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_(i+1) of the (i+1)-th step of the current vehicle j into the target network, and obtaining, by formula (12), the value function Q'_(i+1) corresponding to each output-layer neuron of the target network executing the (i+1)-th step action a_(i+1) at a different heading angle:
Q'_(i+1) = σ(W'_2 × ReLU(W'_1 s_(i+1) + b'_1) + b'_2)    (12)
In formula (12), W'_1 and W'_2 are the weight arrays of the hidden layer and output layer of the target network respectively, and b'_1 and b'_2 are the bias arrays of the hidden layer and output layer of the target network respectively;
step 4.7.4: calculating, by formula (13), the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at the i-th step:
Q_tag,i = R_i + γ max(Q'_(i+1))    (13)
In formula (13), R_i is the reward value after executing the i-th step action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and computing the loss function loss to update the parameters W_1, b_1, W_2 and b_2 of the online network; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i - Q_real,i)^2)    (14)
In formula (14), Q_real,i represents the value function corresponding to the i-th step action a_i in the extracted u parameter tuples; E represents the expectation;
step 4.8: substituting the updated network parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining an optimal local path planning model;
step 4.9: and inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain corresponding action parameters, recording the position coordinates of the vehicle in the grid map after executing the action parameters of each step, and converting the position coordinates into actual coordinates in a vehicle coordinate system so as to fit the actual coordinates, wherein the obtained fitted curve is the path planned by the current vehicle j.
CN202210650446.3A 2022-06-09 2022-06-09 Driving condition local path planning method based on safety potential field and DQN algorithm Pending CN115031753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210650446.3A CN115031753A (en) 2022-06-09 2022-06-09 Driving condition local path planning method based on safety potential field and DQN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210650446.3A CN115031753A (en) 2022-06-09 2022-06-09 Driving condition local path planning method based on safety potential field and DQN algorithm

Publications (1)

Publication Number Publication Date
CN115031753A true CN115031753A (en) 2022-09-09

Family

ID=83122468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210650446.3A Pending CN115031753A (en) 2022-06-09 2022-06-09 Driving condition local path planning method based on safety potential field and DQN algorithm

Country Status (1)

Country Link
CN (1) CN115031753A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117601904A (en) * 2024-01-22 2024-02-27 中国第一汽车股份有限公司 Vehicle running track planning method and device, vehicle and storage medium
CN117601904B (en) * 2024-01-22 2024-05-14 中国第一汽车股份有限公司 Vehicle running track planning method and device, vehicle and storage medium

Similar Documents

Publication Publication Date Title
CN111061277B (en) Unmanned vehicle global path planning method and device
Cai et al. High-speed autonomous drifting with deep reinforcement learning
US11036232B2 (en) Iterative generation of adversarial scenarios
CN107063280A (en) A kind of intelligent vehicle path planning system and method based on control sampling
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN112784485B (en) Automatic driving key scene generation method based on reinforcement learning
CN112508164B (en) End-to-end automatic driving model pre-training method based on asynchronous supervised learning
CN113715842B (en) High-speed moving vehicle control method based on imitation learning and reinforcement learning
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
CN116804879A (en) Robot path planning framework method for improving dung beetle algorithm and fusing DWA algorithm
CN115344052B (en) Vehicle path control method and control system based on improved group optimization algorithm
CN115031753A (en) Driving condition local path planning method based on safety potential field and DQN algorithm
CN113386790A (en) Automatic driving decision-making method for cross-sea bridge road condition
Sun et al. Human-like highway trajectory modeling based on inverse reinforcement learning
CN113487889B (en) Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent
Yang et al. Vehicle trajectory prediction based on LSTM network
CN116448134B (en) Vehicle path planning method and device based on risk field and uncertain analysis
CN116127853A (en) Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused
CN116360454A (en) Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment
CN111443701A (en) Unmanned vehicle/robot behavior planning method based on heterogeneous deep learning
CN113033902B (en) Automatic driving lane change track planning method based on improved deep learning
CN114701517A (en) Multi-target complex traffic scene automatic driving solution based on reinforcement learning
CN114527759A (en) End-to-end driving method based on layered reinforcement learning
Anderson et al. Autonomous navigation via a deep Q network with one-hot image encoding
CN114779764B (en) Vehicle reinforcement learning movement planning method based on driving risk analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination