CN115031753A - Driving condition local path planning method based on safety potential field and DQN algorithm - Google Patents
- Publication number
- CN115031753A (application number CN202210650446.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/34—Route searching; Route guidance
- G01C21/3446—Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3807—Creation or updating of map data characterised by the type of data
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/40—Engine management systems
Abstract
The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which comprises the following steps: 1. acquiring the state information of the vehicle and the surrounding environment information; 2. constructing an environment safety potential field model according to the collected environment information; 3. constructing an environment grid map by using a grid method according to the calculated potential field intensity distribution map; 4. initializing deep reinforcement learning parameters, constructing a deep neural network, training it to obtain an optimal path planning model, and performing path planning. The method constructs the grid map with the safety potential field theory and completes local path planning of the automobile in various scenes through deep reinforcement learning, so that the automobile drives more safely and passes more efficiently, guaranteeing the safe navigation of the intelligent automobile.
Description
Technical Field
The invention relates to the field of intelligent automobile safety and path planning, and in particular to a driving condition local path planning method based on a safety potential field and a DQN algorithm.
Background
Path planning is the most critical link in the autonomous navigation of an intelligent automobile. Its aim is to search for an optimal or suboptimal path from a starting point to a target point under different driving scenes while guaranteeing the safety of the path. Intelligent driving has been widely applied in relatively simple scenes such as mining areas and industrial parks, but relatively complex driving scenes such as actual roads require consideration of environmental factors such as obstacles, traffic signs, ground road conditions and other moving vehicles, which poses challenges for path planning research. With the arrival of the artificial intelligence era, the environments faced in the path planning field are becoming more and more complex, so path planning algorithms must be able to react quickly and learn flexibly in complex environments. Existing path planning algorithms still suffer from becoming trapped in local optima during planning, so that the whole path cannot be completely planned, and they cannot adapt to complex and changeable scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a driving condition local path planning method based on a safety potential field and a DQN algorithm. A grid map is constructed using the safety potential field theory, and local path planning of the automobile in various scenes is completed by deep reinforcement learning, so that the automobile drives more safely and passes more efficiently, providing a guarantee for the safe navigation of the intelligent automobile.
In order to achieve the purpose of the invention, the following technical scheme is adopted:
the invention relates to a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized by comprising the following steps:
step 1, acquiring the state information of the current vehicle j and the surrounding environment information;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b acting on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model to output the action parameter a_i of the i-th step for planning the driving path of the current vehicle j.
The invention discloses a driving condition local path planning method based on a safety potential field and a DQN algorithm, which is characterized in that the step 2 comprises the following steps:
step 2.1: calculating the potential field intensity E_R_bj of the road sign b acting on the current vehicle j by using formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj is the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) is the centroid position coordinate of the current vehicle j in the vehicle coordinate system and (x_b, y_b) is the position coordinate of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j by using formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj is the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) is the centroid position coordinate of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j by using formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj is the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) is the centroid position coordinate of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by using formula (4):
E_j = E_R_j + E_V_j + E_D_j (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
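The superposition of step 2.5 can be sketched in code. Since formulas (1) to (3) appear only as images in the published text, the inverse-distance repulsive form below, as well as the shared threshold rho and gain k, are illustrative assumptions, not the patent's actual expressions; only the vector summation of formula (4) is taken from the text.

```python
import math

def repulsive_field(dx, dy, rho, k):
    # Hypothetical repulsive potential: nonzero only within the distance
    # threshold rho, pointing away from the source (an assumed form; the
    # patent's formulas (1)-(3) are not reproduced in the text).
    r = math.hypot(dx, dy)
    if r >= rho or r == 0.0:
        return (0.0, 0.0)
    mag = k * (1.0 / r - 1.0 / rho)
    return (mag * dx / r, mag * dy / r)

def total_field(vehicle, signs, obstacles, movers, rho=10.0, k=1.0):
    # Formula (4): E_j = E_R_j + E_V_j + E_D_j, the vector sum of the
    # contributions of all road signs, static obstacles and running vehicles
    # at the current position of vehicle j.
    ex = ey = 0.0
    xj, yj = vehicle
    for (xs, ys) in signs + obstacles + movers:
        fx, fy = repulsive_field(xj - xs, yj - ys, rho, k)
        ex += fx
        ey += fy
    return (ex, ey)
```

A single sign one metre ahead yields a purely repulsive x-component; sources beyond the threshold contribute nothing.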
The step 3 comprises the following steps:
step 3.1: establishing the grid map by using the environmental information of the rectangular area where the current vehicle j is located, with the length range of the grid map recorded as [Y_MIN, Y_MAX] and the width range recorded as [X_MIN, X_MAX]; the area of each grid in the grid map is recorded as C_R;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated using formula (5):
in formula (5), M and N are the abscissa and ordinate in the grid map, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of an arbitrary point in the vehicle coordinate system; α(M, N) is the risk level at position (M, N) under the influence of the potential field, with α(M, N) ∈ [0, 1), obtained from formula (6):
in formula (6), ⌊·⌋ indicates rounding down, E(M, N) indicates the potential field strength at position (M, N), and Δ indicates the number of risk levels set.
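The discretization of steps 3.1 and 3.2 can be sketched as follows. Formula (6) is an image in the published text; quantizing a normalized field strength into Δ floor-rounded levels is one plausible reading, stated here as an assumption, while the index mapping M = x/C_R, N = y/C_R is taken from the text.

```python
import math

def risk_alpha(e_norm, delta=8):
    # Assumed reading of formula (6): the normalized potential field
    # strength e_norm in [0, 1) is quantized by floor() into delta discrete
    # risk levels, so risk_alpha stays in [0, 1).
    return math.floor(e_norm * delta) / delta

def grid_index(x, y, c_r=0.5):
    # Mapping around formula (5): M = x / C_R, N = y / C_R converts a
    # vehicle-frame point (x, y) to grid coordinates (M, N).
    return (int(x / c_r), int(y / c_r))
```

With the embodiment's Δ = 8 and C_R = 0.5, a point at (3.0, 1.0) lands in grid cell (6, 2).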
The step 4 comprises the following steps:
step 4.1: define the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map, and the i-th step action parameter a_i represents the current vehicle j moving forward one grid in the grid map according to its course angle;
define the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j, and s_i represents the state parameter after executing the (i−1)-th step action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i is the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i is the risk level of that coordinate point; I is the maximum number of steps;
step 4.2: initialize the greedy probability ε, the attenuation coefficient γ, and the capacity n of the experience replay pool;
step 4.3: construct an online network comprising an input layer, a hidden layer, and an output layer, and initialize the weight parameters and bias parameters of the online network;
step 4.4: input the state parameter s_i of the i-th step into the input layer of the online network; the output layer uses formula (7) to output, at each neuron corresponding to a different course angle, the value function Q_i of executing the i-th step action parameter a_i:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2) (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
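The forward pass of formula (7) can be written directly. The 3-input, 8-output shape follows the embodiment (grid coordinates plus risk level in, one Q value per course angle out); the hidden width of 16 is an arbitrary illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def q_values(s, W1, b1, W2, b2):
    # Formula (7): Q_i = sigmoid(W2 . ReLU(W1 s_i + b1) + b2).
    # s is the 3-element state (M, N, risk level); the output holds one
    # Q value per course-angle neuron (8 in the embodiment).
    h = np.maximum(0.0, W1 @ s + b1)   # ReLU hidden layer
    return sigmoid(W2 @ h + b2)        # sigmoid output layer
```

With all-zero weights and biases every output neuron yields sigmoid(0) = 0.5, which is a quick sanity check of the shapes.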
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1·r_sm,i + w_2·r_s,i + w_3·r_end,i (11)
in formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the course angle of vehicle j at step i and δ is the threshold parameter of the course angle variation; λ_1 is the proportional parameter and λ_2 the bias parameter of the trajectory safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) are the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position, respectively; d_1, d_2, d_3 and d_4 are all distance thresholds;
step 4.6: generate a random number τ between 0 and 1 and judge whether τ < ε holds; if so, select the course angle corresponding to the maximum among the value functions Q_i output by the neurons as the i-th step action parameter executed by the current vehicle j; otherwise, randomly select a course angle from those corresponding to the value functions Q_i as the action parameter executed by the current vehicle j;
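Step 4.6 is a greedy/random action rule. Note that the patent names ε the "greedy probability", so in this sketch ε is the probability of exploiting the maximal Q value, which inverts the more common convention where ε is the exploration probability.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # Step 4.6: with probability epsilon (the patent's "greedy probability")
    # exploit the course angle with the maximal Q value; otherwise explore
    # by picking a random course angle.
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda k: q_values[k])
    return rng.randrange(len(q_values))
```

With ε = 1 the rule always returns the arg-max index; with ε = 0 it always returns a random valid index.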
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: create an experience pool D for storing the state, action, and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at step i yields the state parameter s_{i+1} of step i+1 and the reward value r_i of step i, forming the parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the earliest added tuple is replaced by the newly generated one;
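The experience pool of step 4.7.1 is a fixed-capacity first-in-first-out buffer; a deque with maxlen implements the replace-oldest rule directly. The class name and method names below are illustrative.

```python
import random
from collections import deque

class ReplayPool:
    # Experience pool D of step 4.7.1: capacity n; once full, appending a
    # new tuple (s_i, a_i, r_i, s_{i+1}) silently evicts the oldest one.
    def __init__(self, n):
        self.buf = deque(maxlen=n)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, u, rng=random):
        # Step 4.7.3 draws u tuples uniformly at random from the pool.
        return rng.sample(list(self.buf), u)

    def __len__(self):
        return len(self.buf)
```

Pushing three tuples into a pool of capacity two leaves only the two most recent ones.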
step 4.7.2: construct a target network with the same structure as the online network, and initialize the parameters of the target network with the weight arrays W_1, W_2 and bias arrays b_1, b_2 of the online network;
step 4.7.3: randomly extract u parameter tuples from the experience pool D, input the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtain with formula (12) the value function Q′_{i+1} of each output-layer neuron of the target network executing the (i+1)-th step action a_{i+1} according to its course angle:
Q′_{i+1} = σ(W′_2 × ReLU(W′_1 s_{i+1} + b′_1) + b′_2) (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and output layer of the target network, respectively, and b′_1 and b′_2 are the corresponding bias arrays;
step 4.7.4: calculate with formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ·max(Q′_{i+1}) (13)
in formula (13), R_i is the reward value after executing the i-th step action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: construct the loss function loss with formula (14), train the online network by gradient descent, and update the parameters W_1, b_1, W_2 and b_2 of the online network by computing the loss function; when the number of training iterations reaches a fixed number, assign the parameters of the online network to the target network;
loss = E((Q_tag,i − Q_real,i)²) (14)
in formula (14), Q_real,i is the value function corresponding to the i-th step action a_i in the extracted u parameter tuples, and E denotes expectation;
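Formulas (13) and (14) together form the standard DQN update target and mean-squared-error loss, which can be computed as follows (the expectation in formula (14) is taken as the mean over the u sampled transitions):

```python
import numpy as np

def dqn_target(r, q_next, gamma):
    # Formula (13): Q_tag,i = R_i + gamma * max(Q'_{i+1}),
    # where q_next holds the target network's Q values for step i+1.
    return r + gamma * np.max(q_next)

def dqn_loss(q_tag, q_real):
    # Formula (14): loss = E[(Q_tag,i - Q_real,i)^2], realized as the mean
    # squared error over the u sampled transitions.
    q_tag = np.asarray(q_tag, dtype=float)
    q_real = np.asarray(q_real, dtype=float)
    return float(np.mean((q_tag - q_real) ** 2))
```

For example, a reward of 1.0, target-network maximum of 0.5 and γ = 0.9 give a target of 1.45.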
step 4.8: substitute the updated network parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function converges, thereby obtaining the optimal local path planning model;
step 4.9: input the state parameter of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameter, record the position coordinates of the vehicle in the grid map after executing the action parameter of each step, convert these position coordinates into actual coordinates in the vehicle coordinate system, and fit the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
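The coordinate conversion and fitting of step 4.9 can be sketched as below. Inverting M = x/C_R, N = y/C_R recovers vehicle-frame coordinates; the patent does not name the fitting method, so the polynomial fit x = f(y) here is an assumption for illustration.

```python
import numpy as np

def grid_to_vehicle(points, c_r=0.5):
    # Invert M = x/C_R, N = y/C_R to recover vehicle-frame coordinates;
    # C_R = 0.5 follows the embodiment's grid size.
    return [(m * c_r, n * c_r) for (m, n) in points]

def fit_path(points, degree=3):
    # Fit the visited coordinates into a smooth curve x = f(y).
    # A polynomial least-squares fit is an assumed choice; the patent only
    # states that the actual coordinates are fitted.
    xs = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    return np.polyfit(ys, xs, degree)
```

A straight run along the Y axis fits to (numerically) zero polynomial coefficients, as expected.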
Compared with the prior art, the invention has the beneficial effects that:
1. The grid map is constructed using the safety potential field, overcoming the limitation of traditional path planning that takes distance as the single objective; the safety, smoothness and traditional distance factors of the path are considered comprehensively, environmental elements are simplified by the grid map, which is easy to construct in actual scenes and convenient for planning short paths;
2. Compared with traditional path planning algorithms, reinforcement learning balances exploration and exploitation and can learn how to escape local stable points in situations where traditional algorithms fall into local optima; in practical applications it plans paths more promptly and reliably than traditional search and sampling algorithms;
3. The invention introduces the concept of a safety potential field, digitizing more environmental factors and coping with more numerous and more complex driving scenes.
Drawings
FIG. 1 is a flow chart of the network training of the present invention;
fig. 2 is a schematic diagram of the action parameters that can be selected by the vehicle j in each step of the grid map.
Detailed Description
In this embodiment, a driving condition local path planning method based on a safety potential field and a DQN algorithm calculates the safety potential field distribution of the current environment of the automobile according to the current driving scene, thereby constructing a grid map, and finally designs a reward function combining various factors with a deep reinforcement learning algorithm. The method specifically includes the following steps:
step 1, acquiring the state information of the current vehicle j and the surrounding environment information;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as the origin, the advancing direction as the Y axis, and the direction perpendicular to the Y axis as the X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b acting on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 2.1: calculating the potential field intensity E_R_bj of the road sign b acting on the current vehicle j by using formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj is the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) is the centroid position coordinate of the current vehicle j in the vehicle coordinate system and (x_b, y_b) is the position coordinate of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c acting on the current vehicle j by using formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj is the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) is the centroid position coordinate of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters; M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d acting on the current vehicle j by using formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj is the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) is the centroid position coordinate of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j; θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by using formula (4):
E_j = E_R_j + E_V_j + E_D_j (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 3.1: establishing a grid map by using the environmental information of the rectangular area where the current vehicle j is located, with the length range of the grid map recorded as [Y_MIN, Y_MAX] and the width range recorded as [X_MIN, X_MAX]; the area of each grid in the grid map is recorded as C_R; in this example, Y_MIN, Y_MAX, X_MIN and X_MAX are set to −10, 30, −10 and 10 (in m), respectively, and C_R is set to 0.5 m²;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated using formula (5):
in formula (5), M and N are the abscissa and ordinate in the grid map, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of an arbitrary point in the vehicle coordinate system; α(M, N) is the risk level at position (M, N) under the influence of the potential field, with α(M, N) ∈ [0, 1), obtained from formula (6):
in formula (6), ⌊·⌋ indicates rounding down, E(M, N) indicates the potential field strength at position (M, N), and Δ indicates the number of risk levels set; in this example Δ is taken as 8, considering that each safety level should have a certain degree of discrimination without significant discontinuities.
Step 4, defining a state parameter set s and an action parameter set a for the current vehicle j to run;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at the i-th step can be input into the optimal network model to output the action parameter a_i of the i-th step for planning the driving path of the current vehicle j; Fig. 1 shows the flow chart of the deep reinforcement learning.
step 4.1: define the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial action parameter of vehicle j in the grid map, and the i-th step action parameter a_i represents the current vehicle j moving forward one grid in the grid map according to its course angle; given the characteristics of the grid map, the action parameters that can be selected at each step are shown in Fig. 2;
define the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j, and s_i represents the state parameter after executing the (i−1)-th step action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i is the coordinate point of the current vehicle j in the grid map after executing the i-th step action parameter a_i and G(M, N)_i is the risk level of that coordinate point; I is the maximum number of steps;
step 4.2: initialize the greedy probability ε, the attenuation coefficient γ, and the capacity n of the experience replay pool;
step 4.3: construct an online network comprising an input layer, a hidden layer, and an output layer, and initialize the weight parameters and bias parameters of the online network;
in this example, the state consists of the horizontal and vertical coordinates of vehicle j in the grid map and the risk level of the corresponding position, and the action at each step takes one of 8 different course-angle values in the grid map, so the input layer has 3 neurons and the output layer has 8 neurons;
step 4.4: input the state parameter s_i of the i-th step into the input layer of the online network; the output layer uses formula (7) to output, at each neuron corresponding to a different course angle, the value function Q_i of executing the i-th step action parameter a_i:
Q_i = σ(W_2 × ReLU(W_1 s_i + b_1) + b_2) (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; ReLU is the activation function; σ is the sigmoid function;
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1·r_sm,i + w_2·r_s,i + w_3·r_end,i (11)
in formulas (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i, calculated from the safety levels: since there are many safety levels in total, a functional form is adopted so that all levels are taken into account, which improves the training effect and efficiency; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the course angle of vehicle j at step i, and δ is the threshold parameter of the course angle variation, set to 45° in this example owing to the characteristics of the grid map so that the course angle variation at each step is normally not too large, which would otherwise affect the smoothness of the finally formed path; λ_1 is the proportional parameter and λ_2 the bias parameter of the trajectory safety reward; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 are the weights of the different rewards, and assigning different weights to the reward components of the total reward value lets training reach the expected effect more quickly; (x_j,i, y_j,i) and (x_end, y_end) are the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position, respectively; d_1, d_2, d_3 and d_4 are all distance thresholds, which grant a reward value when the vehicle approaches the end point, considering that in early training the vehicle may rarely reach the end point;
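The weighted combination of formula (11) can be sketched as follows. The component formulas (8) to (10) appear only as images in the published text, so the threshold-based smoothness term and all numeric weights below are placeholders, not the patent's actual values.

```python
def total_reward(r_sm, r_s, r_end, w=(0.2, 0.3, 0.5)):
    # Formula (11): r_i = w1*r_sm,i + w2*r_s,i + w3*r_end,i.
    # The weights w are illustrative placeholders.
    w1, w2, w3 = w
    return w1 * r_sm + w2 * r_s + w3 * r_end

def smoothness_reward(d_heading, threshold=45.0, eta1=1.0, eta2=-1.0):
    # Assumed form of r_sm,i: reward a course-angle change |delta_i| within
    # the 45-degree threshold named in the embodiment, penalize larger ones.
    # eta1 and eta2 play the role of the smoothness reward parameters.
    return eta1 if abs(d_heading) <= threshold else eta2
```

With unit component rewards and the placeholder weights, the total reward sums to 1.0; a 90° turn is penalized while a 30° turn is rewarded.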
step 4.6: in actual training, the value function of some step may become relatively large while value functions trained later are smaller, causing the training to fall into a locally optimal solution, so a greedy strategy is adopted: generate a random number τ between 0 and 1 and judge whether τ < ε holds; if so, the course angle corresponding to the maximum among the value functions Q_i output by the neurons is selected as the i-th step action parameter executed by the current vehicle j; otherwise, a course angle is randomly selected from those corresponding to the value functions Q_i as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: create an experience pool D for storing the state, action, and reward information of the vehicle at each step;
executing the action parameter a_i under the state parameter s_i of the current vehicle j at step i yields the state parameter s_{i+1} of step i+1 and the reward value r_i of step i, forming the parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the earliest added tuple is replaced by the newly generated one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network into a weight array W of the online network 1 、W 2 And an offset array b 1 、b 2 ;
step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtaining by formula (12) the value function Q′_{i+1} corresponding to each output-layer neuron of the target network executing the step-(i+1) action a_{i+1} according to a different heading angle:
Q′_{i+1} = σ(W′_2 × Relu(W′_1 s_{i+1} + b′_1) + b′_2)   (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and the output layer of the target network, respectively, and b′_1 and b′_2 are the bias arrays of the hidden layer and the output layer of the target network, respectively;
step 4.7.4: calculating by formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ max(Q′_{i+1})   (13)
in formula (13), R_i is the reward value obtained after executing the step-i action parameter a_i, and γ is the reward attenuation factor; the calculation of Q_tag,i follows a Markov decision process, and γ ranges from 0 to 1: when γ = 0, Q_tag,i depends only on the reward of the current state, and as γ tends to 1, the value estimated by the target network for subsequent states is increasingly taken into account;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and updating the parameters W_1, b_1, W_2 and b_2 of the online network by minimizing the loss function loss; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i − Q_real,i)²)   (14)
in formula (14), Q_real,i denotes the value function corresponding to the step-i action a_i in the extracted u parameter tuples, and E denotes the expectation;
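Steps 4.7.3 to 4.7.5 can be illustrated with a minimal NumPy sketch of formulas (7) and (12) to (14): two identical two-layer networks (online and target), the target value Q_tag,i = R_i + γ max Q′_{i+1}, the squared loss, and one gradient-descent update. The network sizes, learning rate and the manual backpropagation below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, s):
    """Formulas (7)/(12): Q = sigmoid(W2 @ relu(W1 @ s + b1) + b2)."""
    W1, b1, W2, b2 = params
    h = relu(W1 @ s + b1)          # hidden-layer activations
    return sigmoid(W2 @ h + b2), h

def train_step(online, target, batch, gamma, lr):
    """One gradient-descent update of the online network on a batch of
    (s_i, a_i, r_i, s_{i+1}) tuples, using formulas (13) and (14).
    Returns the mean loss over the batch before the update."""
    W1, b1, W2, b2 = online
    grads = [np.zeros_like(p) for p in online]
    loss = 0.0
    for s, a, r, s_next in batch:
        q_next, _ = forward(target, s_next)
        q_tag = r + gamma * np.max(q_next)     # formula (13)
        q, h = forward(online, s)
        diff = q[a] - q_tag                    # Q_real,i - Q_tag,i
        loss += diff ** 2                      # formula (14), sample term
        # backpropagate through the sigmoid output and relu hidden layer
        dz2 = 2.0 * diff * q[a] * (1.0 - q[a])
        grads[2][a] += dz2 * h
        grads[3][a] += dz2
        dh = dz2 * W2[a] * (h > 0)
        grads[0] += np.outer(dh, s)
        grads[1] += dh
    for p, g in zip(online, grads):            # gradient-descent update
        p -= lr * g / len(batch)
    return loss / len(batch)
```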
step 4.8: substituting the updated parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining the optimal local path planning model;
step 4.9: inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameters, recording the position coordinates of the vehicle in the grid map after each action parameter is executed, converting these position coordinates into actual coordinates in the vehicle coordinate system, and fitting the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
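The conversion and fitting of step 4.9 can be sketched as follows. The grid-to-vehicle conversion inverts M = x/C_R, N = y/C_R from formula (5); the polynomial fit is an assumed choice of fitting method, since the document only states that the actual coordinates are fitted to a curve:

```python
import numpy as np

def grid_to_vehicle(path_MN, C_R):
    """Convert grid coordinates (M, N) back to vehicle-frame coordinates
    (x, y), inverting M = x / C_R and N = y / C_R from formula (5)."""
    return [(M * C_R, N * C_R) for M, N in path_MN]

def fit_path(points, degree=3):
    """Fit a polynomial y = f(x) through the visited points; the fitted
    curve serves as the planned local path of step 4.9."""
    x, y = zip(*points)
    return np.polynomial.Polynomial.fit(x, y, degree)
```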
Claims (4)
1. A driving condition local path planning method based on a safety potential field and a DQN algorithm is characterized by comprising the following steps:
step 1, acquiring the environment information of the current vehicle j in the Internet-of-Vehicles environment, and acquiring the state information of the current vehicle j through vehicle sensors;
step 2, establishing a vehicle coordinate system by taking the initial position of the current vehicle j as an original point, taking the advancing direction as a Y axis and taking the direction vertical to the Y axis as an X axis;
under the vehicle coordinate system, calculating, according to the environment information and the state information, the potential field intensity E_R_bj of the road sign b on the current vehicle j, the potential field intensity E_D_cj of the static obstacle c on the current vehicle j, and the potential field intensity E_V_dj of the running vehicle d on the current vehicle j, so as to obtain the total safe potential field intensity E of the current vehicle j;
step 3, constructing a grid map by using a grid method according to the potential field intensity distribution map corresponding to the total safety potential field intensity E;
step 4, defining a state parameter set s and an action parameter set a of the current vehicle j in driving;
constructing a deep neural network and initializing network parameters;
training the deep neural network based on the state parameter set s and the action parameter set a to obtain an optimal network model, so that the state parameter s_i of the current vehicle j at step i is input into the optimal network model and the step-i action parameter a_i is output, for planning the driving path of the current vehicle j.
2. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 2 comprises:
step 2.1: calculating the potential field intensity E_R_bj of the road sign b on the current vehicle j by formula (1):
in formula (1), T_b is a parameter determined by the type of the road sign; k_1 is a parameter; ρ_b is the distance threshold between the road sign b and the current vehicle j; r_bj denotes the distance vector between the road sign b and the current vehicle j, with r_bj = (x_j − x_b, y_j − y_b), where (x_j, y_j) are the centroid position coordinates of the current vehicle j in the vehicle coordinate system and (x_b, y_b) are the position coordinates of the road sign b in the vehicle coordinate system;
step 2.3: calculating the potential field intensity E_D_cj of the static obstacle c on the current vehicle j by formula (2):
in formula (2), ρ_c is the distance threshold between the static obstacle c and the current vehicle j; r_cj denotes the distance vector between the static obstacle c and the current vehicle j, with r_cj = (x_j − x_c, y_j − y_c), where (x_c, y_c) are the centroid position coordinates of the static obstacle c in the vehicle coordinate system; k_2 and G are parameters, and M_c is the mass of the static obstacle c;
step 2.4: calculating the potential field intensity E_V_dj of the running vehicle d on the current vehicle j by formula (3):
in formula (3), k_3 is a parameter; v_d is the speed of the running vehicle d; r_dj denotes the distance vector between the running vehicle d and the current vehicle j, with r_dj = (x_j − x_d, y_j − y_d), where (x_d, y_d) are the centroid position coordinates of the running vehicle d in the vehicle coordinate system; ρ_d is the distance threshold between the running vehicle d and the current vehicle j, and θ_d is the clockwise angle between v_d and r_dj;
step 2.5: calculating the total safe potential field intensity E_j experienced by the current vehicle j by formula (4):
E_j = E_R_j + E_V_j + E_D_j   (4)
in formula (4), E_R_j is the vector sum of the potential fields of all road signs at the current position of vehicle j, E_D_j is the vector sum of the potential fields of all static obstacles at the current position of vehicle j, and E_V_j is the vector sum of the potential fields of all running vehicles at the current position of vehicle j.
3. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 3 comprises:
step 3.1: establishing the grid map by using the environment information of the rectangular area where the current vehicle j is located; the length range of the grid map is denoted [Y_MIN, Y_MAX] and the width range [X_MIN, X_MAX]; the area of each grid in the grid map is denoted C_R;
step 3.2: in the grid map, obstacle grids are represented by black grids and travel grids by white grids, and the risk level G(M, N) of the grid at position (M, N) in the grid map is calculated by formula (5):
in formula (5), M and N denote the abscissa and ordinate in the grid map, respectively, with M = x/C_R and N = y/C_R, where (x, y) are the position coordinates of any point in the vehicle coordinate system; α(M, N) denotes the risk level at position (M, N) under the influence of the potential field, α(M, N) ∈ [0, 1), and is obtained by formula (6):
4. The method for planning the local path of the driving condition based on the safety potential field and the DQN algorithm according to claim 1, wherein the step 4 comprises:
step 4.1: defining the action parameter set a = {a_0, a_1, a_2, …, a_i, …, a_I}, where a_0 represents the initial motion information of vehicle j in the grid map, and a_i is the step-i action parameter representing that the current vehicle j moves forward one grid in the grid map according to the heading angle;
defining the state parameter set s = {s_0, s_1, s_2, …, s_i, …, s_I}, where s_0 represents the initial state parameter of the current vehicle j; s_i denotes the state parameter after executing the step-(i−1) action a_{i−1}, with s_i = {(M, N)_i, G(M, N)_i}, where (M, N)_i represents the coordinate point of the current vehicle j in the grid map after executing the step-i action parameter a_i, and G(M, N)_i represents the risk level of that coordinate point; I denotes the maximum number of steps;
step 4.2: initializing greedy probability as epsilon, attenuation coefficient as gamma, and capacity of an experience playback pool as n;
step 4.3: constructing an online network, comprising: an input layer, a hidden layer and an output layer, and initializing a weight parameter and a bias parameter of the online network;
step 4.4: inputting the step-i state parameter s_i into the input layer of the online network, each neuron of the output layer outputting by formula (7) the value function Q_i corresponding to executing the step-i action parameter a_i according to a different heading angle:
Q_i = σ(W_2 × Relu(W_1 s_i + b_1) + b_2)   (7)
in formula (7), W_1 is the weight array from the input layer to the hidden layer, b_1 the bias array from the input layer to the hidden layer, W_2 the weight array from the hidden layer to the output layer, and b_2 the bias array from the hidden layer to the output layer; Relu is the activation function, and σ is the sigmoid function;
step 4.5: the reward and punishment function of the depth reinforcement learning is defined by the following equations (8) to (11):
r_i = w_1 r_sm,i + w_2 r_s,i + w_3 r_end,i   (11)
in formulae (8) to (11), r_sm,i is the reward value for the smoothness of the trajectory formed by the coordinate points traversed by the current vehicle j at step i; r_s,i is the reward value for the trajectory safety at step i; r_end,i is the reward value for whether vehicle j reaches the end point within the maximum number of steps I at step i; r_i is the total reward value of vehicle j at step i; Δδ_i is the variation of the heading angle of vehicle j at step i, and δ is the threshold parameter of the heading-angle variation; λ_1 is the proportional parameter and λ_2 the bias parameter of the reward value for trajectory safety; η_1 and η_2 are the reward parameters for trajectory smoothness under different conditions; ζ_1 and ζ_2 are the reward parameters for whether the trajectory reaches the end point under different conditions; w_1, w_2 and w_3 represent the weights of the different rewards; (x_j,i, y_j,i) and (x_end, y_end) denote, respectively, the coordinate point of the current position of vehicle j at step i and the coordinate point of the target position; d_1, d_2, d_3 and d_4 are all distance thresholds;
step 4.6: generating a random number τ between 0 and 1 and judging whether τ < ε holds; if so, among the value functions Q_i corresponding to the neurons, the heading angle corresponding to the maximum function value is selected as the step-i action parameter executed by the current vehicle j; otherwise a heading angle corresponding to the value function Q_i is selected at random as the action parameter executed by the current vehicle j;
step 4.7: training a deep neural network consisting of an online network and a target network;
step 4.7.1: creating an experience pool D for storing the state, action and reward information of the vehicle at each step;
the state parameter s_i of the current vehicle j at step i, the action parameter a_i executed under it, the resulting state parameter s_{i+1} of step i+1 and the reward value r_i of step i form a parameter tuple (s_i, a_i, r_i, s_{i+1});
when the number of parameter tuples in the experience pool exceeds n, the newly generated tuple replaces the earliest added one;
step 4.7.2: constructing a target network with the same structure as the online network, and initializing the parameters of the target network to the weight arrays W_1, W_2 and bias arrays b_1, b_2 of the online network;
step 4.7.3: randomly extracting u parameter tuples from the experience pool D, inputting the state parameter s_{i+1} of step i+1 of the current vehicle j into the target network, and obtaining by formula (12) the value function Q′_{i+1} corresponding to each output-layer neuron of the target network executing the step-(i+1) action a_{i+1} according to a different heading angle:
Q′_{i+1} = σ(W′_2 × Relu(W′_1 s_{i+1} + b′_1) + b′_2)   (12)
in formula (12), W′_1 and W′_2 are the weight arrays of the hidden layer and the output layer of the target network, respectively, and b′_1 and b′_2 are the bias arrays of the hidden layer and the output layer of the target network, respectively;
step 4.7.4: calculating by formula (13) the value function Q_tag,i corresponding to the state parameter s_i of the current vehicle j at step i:
Q_tag,i = R_i + γ max(Q′_{i+1})   (13)
in formula (13), R_i is the reward value obtained after executing the step-i action parameter a_i, and γ is the reward attenuation factor;
step 4.7.5: constructing the loss function loss by formula (14), training the online network by gradient descent, and updating the parameters W_1, b_1, W_2 and b_2 of the online network by minimizing the loss function loss; when the number of training iterations reaches a fixed number, the parameters of the online network are assigned to the target network;
loss = E((Q_tag,i − Q_real,i)²)   (14)
in formula (14), Q_real,i denotes the value function corresponding to the step-i action a_i in the extracted u parameter tuples, and E denotes the expectation;
step 4.8: substituting the updated parameters of the online network into steps 4.4 to 4.7 for iterative training until the loss function loss converges, thereby obtaining the optimal local path planning model;
step 4.9: inputting the state parameters of the current vehicle j into the optimal local path planning model to obtain the corresponding action parameters, recording the position coordinates of the vehicle in the grid map after each action parameter is executed, converting these position coordinates into actual coordinates in the vehicle coordinate system, and fitting the actual coordinates; the obtained fitted curve is the path planned for the current vehicle j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210650446.3A CN115031753A (en) | 2022-06-09 | 2022-06-09 | Driving condition local path planning method based on safety potential field and DQN algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115031753A true CN115031753A (en) | 2022-09-09 |
Family
ID=83122468
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115031753A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117601904A (en) * | 2024-01-22 | 2024-02-27 | 中国第一汽车股份有限公司 | Vehicle running track planning method and device, vehicle and storage medium |
CN117601904B (en) * | 2024-01-22 | 2024-05-14 | 中国第一汽车股份有限公司 | Vehicle running track planning method and device, vehicle and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111061277B (en) | Unmanned vehicle global path planning method and device | |
Cai et al. | High-speed autonomous drifting with deep reinforcement learning | |
US11036232B2 (en) | Iterative generation of adversarial scenarios | |
CN107063280A (en) | A kind of intelligent vehicle path planning system and method based on control sampling | |
CN113044064B (en) | Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning | |
CN112784485B (en) | Automatic driving key scene generation method based on reinforcement learning | |
CN112508164B (en) | End-to-end automatic driving model pre-training method based on asynchronous supervised learning | |
CN113715842B (en) | High-speed moving vehicle control method based on imitation learning and reinforcement learning | |
CN115257745A (en) | Automatic driving lane change decision control method based on rule fusion reinforcement learning | |
CN116804879A (en) | Robot path planning framework method for improving dung beetle algorithm and fusing DWA algorithm | |
CN115344052B (en) | Vehicle path control method and control system based on improved group optimization algorithm | |
CN115031753A (en) | Driving condition local path planning method based on safety potential field and DQN algorithm | |
CN113386790A (en) | Automatic driving decision-making method for cross-sea bridge road condition | |
Sun et al. | Human-like highway trajectory modeling based on inverse reinforcement learning | |
CN113487889B (en) | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent | |
Yang et al. | Vehicle trajectory prediction based on LSTM network | |
CN116448134B (en) | Vehicle path planning method and device based on risk field and uncertain analysis | |
CN116127853A (en) | Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused | |
CN116360454A (en) | Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment | |
CN111443701A (en) | Unmanned vehicle/robot behavior planning method based on heterogeneous deep learning | |
CN113033902B (en) | Automatic driving lane change track planning method based on improved deep learning | |
CN114701517A (en) | Multi-target complex traffic scene automatic driving solution based on reinforcement learning | |
CN114527759A (en) | End-to-end driving method based on layered reinforcement learning | |
Anderson et al. | Autonomous navigation via a deep Q network with one-hot image encoding | |
CN114779764B (en) | Vehicle reinforcement learning movement planning method based on driving risk analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||