CN114013443B - Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning - Google Patents

Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Info

Publication number
CN114013443B
CN114013443B (application CN202111339265.0A; also published as CN114013443A)
Authority
CN
China
Prior art keywords
lane
vehicle
target
automatic driving
speed
Prior art date
Legal status
Active
Application number
CN202111339265.0A
Other languages
Chinese (zh)
Other versions
CN114013443A (en
Inventor
崔建勋
慈玉生
要甲
姜慧夫
曲明成
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111339265.0A priority Critical patent/CN114013443B/en
Publication of CN114013443A publication Critical patent/CN114013443A/en
Application granted granted Critical
Publication of CN114013443B publication Critical patent/CN114013443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B: Performing operations; transporting
    • B60: Vehicles in general
    • B60W: Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit
    • B60W30/18163: Lane change; Overtaking manoeuvres (under B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit; B60W30/18 Propelling the vehicle; B60W30/18009 related to particular drive situations)
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/10: Such estimation or calculation related to vehicle motion
    • B60W60/0015: Planning or execution of driving tasks specially adapted for safety (under B60W60/00 Drive control systems specially adapted for autonomous road vehicles; B60W60/001 Planning or execution of driving tasks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

A lane-change decision control method for autonomous vehicles based on hierarchical reinforcement learning, belonging to the technical field of autonomous driving control. It addresses the poor safety and low efficiency of existing autonomous driving processes. A decision neural network with 3 hidden layers is built from the vehicle speed in the actual driving scene and the relative position and relative speed information with respect to surrounding vehicles, and is trained with a lane-change safety reward function to fit a Q-value function, yielding the action with the maximum Q value. An acceleration decision model based on deep Q-learning is then built from the vehicle speed, the relative position information of surrounding vehicles and the reward function corresponding to the following or lane-change action, producing the following or lane-change acceleration; when changing lanes, a reference lane-change trajectory is generated with a degree-5 polynomial curve. The method is suitable for autonomous-driving lane-change decision and control.

Description

Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Technical Field
The invention belongs to the technical field of automatic driving control.
Background
In general, the driving strategy of an autonomous vehicle is organized as a set of modules, roughly divided into 4 levels: (1) strategic planning layer: responsible for planning the global route from the starting point to the destination; this part draws on related knowledge such as shortest paths, weighted shortest paths and GIS, and current research and implementation methods are relatively mature; (2) tactical decision layer: responsible for behavior decisions within a local range during actual driving, such as car following, lane changing, overtaking, accelerating and decelerating; (3) local planning layer: responsible for generating a trajectory that is safe and complies with traffic regulations according to the action intention from the tactical decision layer; (4) vehicle control layer: mainly applies optimal control methods to track the generated trajectory with minimum deviation by controlling the throttle, brake and steering wheel of the vehicle.
The lane-change decision and the lane-change trajectory are key elements of the tactical decision layer and the local planning layer respectively; lane changing is a basic behavior in many driving scenes, and its performance largely determines the safety, efficiency and quality of autonomous decision, planning and control. Traditional methods mainly fall into two categories: (1) the lane-change decision is made with a rule-based approach (such as a finite-state machine) and the lane-change trajectory is generated with optimal control theory; (2) decision and execution are bound together and learned end-to-end, mapping the state directly to the lane-change control action. Approach (1), being rule-based in nature, generalizes poorly to driving scenes that were not explicitly defined, and the rule set for complex scenes is difficult or even impossible to specify. Approach (2) is very efficient at decision time and generalizes well to unseen scenes, but as a purely learning-based method it cannot fully guarantee the safety of the lane change. In addition, the autonomous driving strategy is hierarchical in nature: the driving intention is generated first, and the trajectory is then generated and the vehicle controlled according to that intention; if decision and control are tied directly together, it is difficult to build an efficient decision and control method.
Disclosure of Invention
The invention aims to solve the problems of poor safety and low efficiency in the existing automatic driving process, and provides a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method.
The invention relates to a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method, which comprises the following steps:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of autonomous driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the autonomous vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of autonomous lane-change decision and control.
Further, in the present invention, the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, described in the first step, the second step and the third step, are:
The relative position of the target autonomous vehicle and the leading vehicle in the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target autonomous vehicle along the lane direction and x_leader is the position coordinate of the leading vehicle in the current lane along the lane direction;
The relative position of the target autonomous vehicle and the leading vehicle in the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the leading vehicle in the target lane along the lane direction;
The relative position of the target autonomous vehicle and the following vehicle in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the following vehicle in the target lane along the lane direction;
The relative speed of the target autonomous vehicle and the leading vehicle in the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target autonomous vehicle and v_leader is the speed of the leading vehicle in the current lane;
The relative speed of the target autonomous vehicle and the leading vehicle in the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the leading vehicle in the target lane along the lane direction;
Target autonomous vehicle speed: v_ego;
Target autonomous vehicle acceleration: a_ego.
Further, in the present invention, in the first step, the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
where w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target autonomous vehicle and the leading vehicle in the current lane, the relative speed between the target autonomous vehicle and the leading vehicle in the current lane, the relative position between the target autonomous vehicle and the leading vehicle in the target lane, and the relative speed between the target autonomous vehicle and the leading vehicle in the target lane.
Further, in the present invention, in step one, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
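For illustration, a minimal PyTorch sketch of such a decision network is given below. The 7-dimensional state layout and the two discrete actions (keep following / change lane) follow the definitions above; the class and function names and the choice of ReLU activations are assumptions rather than details disclosed by the patent.

```python
import torch
import torch.nn as nn

class LaneChangeDecisionNet(nn.Module):
    """Q-network for the lane-change decision: 3 hidden layers of 100 neurons each.

    Input : 7-dim state (dx_leader, dx_target, dx_follow, dv_ego, dv_target, v_ego, a_ego)
    Output: Q-values for the two discrete actions {0: keep following, 1: change lane}.
    """
    def __init__(self, state_dim: int = 7, n_actions: int = 2, hidden: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: LaneChangeDecisionNet, state: torch.Tensor) -> int:
    """Greedy selection: return the action with the maximum Q estimate (step one)."""
    with torch.no_grad():
        return int(q_net(state).argmax(dim=-1).item())
```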
Further, in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
Final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
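The following PyTorch sketch illustrates one possible reading of this structure: three sub fully-connected networks A(s), B(s), C(s) share the 7-dimensional state and are combined according to formula five. Layer sizes, activations and names are assumptions (the embodiment later mentions 200 neurons per sub-network).

```python
import torch
import torch.nn as nn

def sub_net(in_dim: int, hidden: int = 200) -> nn.Sequential:
    """One sub fully-connected network producing a scalar output."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

class AccelerationDecisionNet(nn.Module):
    """Three sub-networks A(s), B(s), C(s) combined per formula five:
    Q(s, a) = A(s) * (B(s) - R|a)^2 + C(s), where R|a is the composite
    (following or lane-change) reward obtained under candidate acceleration a."""
    def __init__(self, state_dim: int = 7, hidden: int = 200):
        super().__init__()
        self.A = sub_net(state_dim, hidden)
        self.B = sub_net(state_dim, hidden)
        self.C = sub_net(state_dim, hidden)

    def q_value(self, state: torch.Tensor, reward_given_a: torch.Tensor) -> torch.Tensor:
        a_s, b_s, c_s = self.A(state), self.B(state), self.C(state)
        return a_s * (b_s - reward_given_a) ** 2 + c_s
```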
Further, in the third step, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the autonomous vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed;
Final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
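A plain-Python sketch of the two composite rewards (formulas two to four for following, six to eight for lane changing) is shown below; the default weight values and argument names are illustrative assumptions.

```python
def following_reward(x_leader, x_ego, v_leader, v_ego, w_dis=1.0, w_v=1.0):
    """Composite following reward R_c = R_dis + R_v (formulas two to four)."""
    r_dis = -w_dis * abs(x_leader - x_ego)   # distance-related reward
    r_v = -w_v * abs(v_leader - v_ego)       # speed-related reward
    return r_dis + r_v

def lane_change_reward(dx_leader, dx_target, dx_follow,
                       v_leader, v_target, v_ego, w_dis=1.0, w_v=1.0):
    """Composite lane-change reward R_A = r_dis + r_v (formulas six to eight)."""
    r_dis = -w_dis * abs(min(dx_leader, dx_target) - dx_follow)
    r_v = -w_v * abs(min(v_leader, v_target) - v_ego)
    return r_dis + r_v
```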
Further, in the fourth step of the present invention, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time. The parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning. In the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
Further, in the present invention, the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
Further, in the present invention, the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the autonomous vehicle in the longitudinal x and lateral y directions.
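The sketch below illustrates, under stated assumptions, how a candidate degree-5 reference trajectory (formulas ten and eleven) can be sampled and checked against the lane-boundary and speed-limit constraints. The expectation function itself is rendered as an image in the original, so only the polynomial evaluation and box-style constraint checks are shown; all function names are illustrative.

```python
import numpy as np

def quintic(coeffs, t):
    """Evaluate a degree-5 polynomial c5*t^5 + ... + c1*t + c0 (formulas ten / eleven)."""
    c5, c4, c3, c2, c1, c0 = coeffs
    return ((((c5 * t + c4) * t + c3) * t + c2) * t + c1) * t + c0

def reference_trajectory(ax, by, horizon, n=50):
    """Sample the reference lane-change trajectory over the planning window [0, T]."""
    ts = np.linspace(0.0, horizon, n)
    xs = np.array([quintic(ax, t) for t in ts])
    ys = np.array([quintic(by, t) for t in ts])
    return ts, xs, ys

def satisfies_constraints(ts, xs, ys, x_lim, y_lim, vx_lim, vy_lim):
    """Check the lane-boundary and speed-limit constraints along the sampled trajectory."""
    vx = np.gradient(xs, ts)   # numerical velocity along the lane (longitudinal)
    vy = np.gradient(ys, ts)   # numerical velocity across the lane (lateral)
    inside_lane = np.all((xs >= x_lim[0]) & (xs <= x_lim[1]) &
                         (ys >= y_lim[0]) & (ys <= y_lim[1]))
    within_speed = np.all((vx >= vx_lim[0]) & (vx <= vx_lim[1]) &
                          (vy >= vy_lim[0]) & (vy <= vy_lim[1]))
    return bool(inside_lane and within_speed)
```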
Further, in the invention, in the fifth step, the specific method of controlling the autonomous vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is as follows:
According to the generated reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change action of the autonomous vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
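The patent's pure-tracking steering expressions are rendered as images, so the sketch below uses the textbook pure-pursuit relation δ = arctan(2·L·sin(α)/L_d), which involves the same quantities named in the text (look-ahead distance L_d, wheelbase L, angle α); treating this as the intended formula is an assumption.

```python
import math

def pure_pursuit_steer(ego_x, ego_y, ego_yaw, target_x, target_y, wheelbase):
    """Textbook pure-pursuit steering toward a look-ahead point on the reference trajectory.

    alpha : angle between the vehicle heading and the line to the look-ahead point
    l_d   : distance to the look-ahead point
    delta : commanded steering angle, delta = atan(2 * L * sin(alpha) / l_d)
    """
    alpha = math.atan2(target_y - ego_y, target_x - ego_x) - ego_yaw
    l_d = math.hypot(target_x - ego_x, target_y - ego_y)
    return math.atan2(2.0 * wheelbase * math.sin(alpha), l_d)
```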
The method of the invention combines the generalization advantage of learning-based approaches with the advantages of optimal control. Because the lane-change decision and the acceleration decision are handled hierarchically by two models, each using a Q-estimation neural network, the processing is more efficient and more accurate, and the method is closer in nature to the human lane-change behavior of "generate lane-change intention → generate lane-change trajectory → execute lane-change action", so it can produce safer, more robust and more efficient decision and control outputs.
Drawings
FIG. 1 is a schematic diagram of the autonomous-driving lane-change decision and control method of the present invention;
FIG. 2 is a schematic view of lane change scene parameters; in the figure, ego is the target autonomous vehicle, leader is the vehicle ahead of the target autonomous vehicle in the current lane, target is the vehicle ahead of the target autonomous vehicle in the target lane, and follow is the vehicle behind the target autonomous vehicle in the target lane;
FIG. 3 is a network architecture diagram of an acceleration decision model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The first embodiment is as follows: the present embodiment is described below with reference to fig. 1, and the method for controlling a lane change decision of an autonomous vehicle based on hierarchical reinforcement learning in the present embodiment includes:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of autonomous driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the autonomous vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of autonomous lane-change decision and control.
In the present embodiment, the input is a lane-change request/command together with environment state information. The need to change lanes may come from a higher-level behavior decision; for example, when the vehicle ahead in the lane of the target autonomous vehicle is travelling too slowly, a lane-change request or instruction is triggered so that the autonomous vehicle can obtain a higher driving-efficiency benefit. Meanwhile, environment information around the target autonomous vehicle (mainly the relative positions and speeds of surrounding vehicles) must be input synchronously, since this information is the basis of the autonomous vehicle's lane-change decision.
The method adopts a framework of two decision models: a lane-change decision model and an acceleration decision model. After receiving the lane-change requirement and the environment state information, the lane-change decision model determines whether to change lanes; the acceleration decision model then adjusts (decides) the longitudinal acceleration of the autonomous vehicle, and the following or lane-change behavior is executed accordingly.
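A schematic Python sketch of this hierarchical flow is given below; the helper callables stand in for the models and planners described elsewhere in the document and are illustrative assumptions, not an implementation disclosed by the patent.

```python
def lane_change_control_step(state, decide_lane_change, follow_acceleration,
                             lane_change_acceleration, plan_reference_trajectory,
                             pure_pursuit_steering):
    """One hierarchical decision-and-control step, mirroring the two-model framework.

    All arguments except `state` are callables standing in for the trained models
    and planners sketched elsewhere in this document (illustrative placeholders).
    """
    if not decide_lane_change(state):                        # step one/two: max-Q discrete decision
        accel = follow_acceleration(state)                   # keep following in the current lane
        return {"maneuver": "follow", "acceleration": accel}
    accel = lane_change_acceleration(state)                  # step three: lane-change acceleration
    trajectory = plan_reference_trajectory(state, accel)     # step four: quintic reference trajectory
    steering = pure_pursuit_steering(state, trajectory)      # step five: track the trajectory
    return {"maneuver": "lane_change", "acceleration": accel, "steering": steering}
```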
Further, the present embodiment is described with reference to fig. 2. In the present embodiment, the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment in the first step, the second step and the third step are:
The relative position of the target autonomous vehicle and the leading vehicle in the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target autonomous vehicle along the lane direction and x_leader is the position coordinate of the leading vehicle in the current lane along the lane direction;
The relative position of the target autonomous vehicle and the leading vehicle in the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the leading vehicle in the target lane along the lane direction;
The relative position of the target autonomous vehicle and the following vehicle in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the following vehicle in the target lane along the lane direction;
The relative speed of the target autonomous vehicle and the leading vehicle in the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target autonomous vehicle and v_leader is the speed of the leading vehicle in the current lane;
The relative speed of the target autonomous vehicle and the leading vehicle in the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the leading vehicle in the target lane along the lane direction;
Target autonomous vehicle speed: v_ego;
Target autonomous vehicle acceleration: a_ego.
In this embodiment, a schematic diagram of the lane-change environment state definition is shown in fig. 2, where Ego is the autonomous vehicle and the other vehicles are background vehicles. Each vehicle has its own state consisting of 4 pieces of information: position abscissa, position ordinate, speed and acceleration. The environment state is s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego).
Further, in the present embodiment, in step one, the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
where w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target autonomous vehicle and the leading vehicle in the current lane, the relative speed between the target autonomous vehicle and the leading vehicle in the current lane, the relative position between the target autonomous vehicle and the leading vehicle in the target lane, and the relative speed between the target autonomous vehicle and the leading vehicle in the target lane;
in this embodiment, w 1 =0.4,w 2 =0.6,w 3 =0.4,w 4 =0.6。
Further, as described with reference to fig. 2, in the first step of the present embodiment, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
Further, in the present embodiment, in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
Final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
Further, in the present embodiment, in step three, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the autonomous vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed.
Final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
In this embodiment, the acceleration decision model receives the decision output of the lane-change decision model, i.e. whether to change lanes. If the lane is not changed, the following behavior is triggered; if the lane is changed, the lane-change behavior is triggered. As shown in fig. 1, the acceleration decision model is responsible for deciding a longitudinal acceleration (a continuous value along the road direction); a safe trajectory is then generated and the vehicle is controlled to track the generated trajectory. In this embodiment, the environment state consists of the actual driving-scene information of the autonomous vehicle, its speed and the relative position information of surrounding vehicles; the acceleration decision model comprises three sub fully-connected neural networks, each containing 200 neurons.
Further, in the fourth step of the present embodiment, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time. The parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning. In the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
In the present embodiment, the trajectory for following or lane changing is planned next based on the acceleration a output by the acceleration decision model. The trajectory planning is based on two indexes: safety and comfort. First, a reference lane-change trajectory is generated with a degree-5 polynomial curve, and safety is reflected by the distance and risk of the reference trajectory.
Further, in this embodiment, the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
Further, in this embodiment, the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the autonomous vehicle in the longitudinal x and lateral y directions of the road.
Further, in the present embodiment, in the fifth step, the specific method of controlling the autonomous vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is as follows:
According to the generated reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change process of the autonomous vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that various dependent claims and the features described herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (9)

1. A lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning, characterized by comprising the following steps:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
wherein w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target automatic driving vehicle and the vehicle in front of the current lane, the relative speed between the target automatic driving vehicle and the vehicle in front of the current lane, the relative position between the target automatic driving vehicle and the vehicle in front of the target lane, and the relative speed between the target automatic driving vehicle and the vehicle in front of the target lane;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of automatic driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the automatic driving vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of automatic driving lane-change decision and control.
2. The method as claimed in claim 1, wherein the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment in the first step, the second step and the third step are:
the relative position of the target automatic driving vehicle and the vehicle in front of the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target automatic driving vehicle along the lane direction and x_leader is the position coordinate of the vehicle in front of the current lane along the lane direction;
the relative position of the target automatic driving vehicle and the vehicle in front of the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the vehicle in front of the target lane along the lane direction;
the relative position of the target automatic driving vehicle and the vehicle behind in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the vehicle behind in the target lane along the lane direction;
the relative speed of the target automatic driving vehicle and the vehicle in front of the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target automatic driving vehicle and v_leader is the speed of the vehicle in front of the current lane;
the relative speed of the target automatic driving vehicle and the vehicle in front of the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the vehicle in front of the target lane along the lane direction;
target automatic driving vehicle speed: v_ego;
target automatic driving vehicle acceleration: a_ego.
3. The method for controlling the lane change decision of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 1 or 2, wherein in the step one, in the decision neural network with 3 hidden layers, each hidden layer comprises 100 neurons.
4. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 1 or 2, characterized in that in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided;
following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
5. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 1 or 2, characterized in that in the third step, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the automatic driving vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided;
lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed;
final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
6. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that in step four, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time; the parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning; in the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
7. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
8. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the automatic driving vehicle in the longitudinal x and lateral y directions of the road.
9. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 2, characterized in that in the fifth step, the specific method of controlling the automatic driving vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is:
according to the reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change action of the automatic driving vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle of the automatic driving vehicle at time t, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
CN202111339265.0A 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning Active CN114013443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Publications (2)

Publication Number Publication Date
CN114013443A CN114013443A (en) 2022-02-08
CN114013443B true CN114013443B (en) 2022-09-23

Family

ID=80063836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339265.0A Active CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Country Status (1)

Country Link
CN (1) CN114013443B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023141940A1 (en) * 2022-01-28 2023-08-03 华为技术有限公司 Intelligent driving method and device, and vehicle
CN114880938B (en) * 2022-05-16 2023-04-18 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN114802307B (en) * 2022-05-23 2023-05-05 哈尔滨工业大学 Intelligent vehicle transverse control method under automatic and manual mixed driving scene
CN115116249B (en) * 2022-06-06 2023-08-01 苏州科技大学 Method for estimating different permeability and road traffic capacity of automatic driving vehicle
CN115082900B (en) * 2022-07-19 2023-06-16 湖南大学无锡智能控制研究院 Intelligent vehicle driving decision system and method in parking lot scene
CN117275240B (en) * 2023-11-21 2024-02-20 之江实验室 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10703370B2 (en) * 2018-08-24 2020-07-07 Ford Global Technologies, Llc Vehicle action control
JP7048456B2 (en) * 2018-08-30 2022-04-05 本田技研工業株式会社 Learning devices, learning methods, and programs
CN111413957B (en) * 2018-12-18 2021-11-02 北京航迹科技有限公司 System and method for determining driving actions in autonomous driving
CN109901574B (en) * 2019-01-28 2021-08-13 华为技术有限公司 Automatic driving method and device
WO2020159247A1 (en) * 2019-01-31 2020-08-06 엘지전자 주식회사 Image output device
CN115578711A (en) * 2019-05-21 2023-01-06 华为技术有限公司 Automatic channel changing method, device and storage medium
CN110716562A (en) * 2019-09-25 2020-01-21 南京航空航天大学 Decision-making method for multi-lane driving of unmanned vehicle based on reinforcement learning
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN111273668B (en) * 2020-02-18 2021-09-03 福州大学 Unmanned vehicle motion track planning system and method for structured road
CN112498354B (en) * 2020-12-25 2021-11-12 郑州轻工业大学 Multi-time scale self-learning lane changing method considering personalized driving experience

Also Published As

Publication number Publication date
CN114013443A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
Zhang et al. Adaptive decision-making for automated vehicles under roundabout scenarios using optimization embedded reinforcement learning
Chen et al. Conditional DQN-based motion planning with fuzzy logic for autonomous driving
CN114407931B (en) Safe driving decision method for automatic driving operation vehicle of high class person
CN110187639A (en) A kind of trajectory planning control method based on Parameter Decision Making frame
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
Godoy et al. A driverless vehicle demonstration on motorways and in urban environments
Chiang et al. Embedded driver-assistance system using multiple sensors for safe overtaking maneuver
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
Xu et al. A nash Q-learning based motion decision algorithm with considering interaction to traffic participants
CN113581182B (en) Automatic driving vehicle lane change track planning method and system based on reinforcement learning
Fehér et al. Hierarchical evasive path planning using reinforcement learning and model predictive control
CN113264043A (en) Unmanned driving layered motion decision control method based on deep reinforcement learning
Yan et al. A multi-vehicle game-theoretic framework for decision making and planning of autonomous vehicles in mixed traffic
Zhang et al. Structured road-oriented motion planning and tracking framework for active collision avoidance of autonomous vehicles
Lattarulo et al. Real-time trajectory planning method based on n-order curve optimization
Siboo et al. An empirical study of ddpg and ppo-based reinforcement learning algorithms for autonomous driving
Guo et al. Self-defensive coordinated maneuvering of an intelligent vehicle platoon in mixed traffic
Ruan et al. Longitudinal planning and control method for autonomous vehicles based on a new potential field model
CN113353102B (en) Unprotected left-turn driving control method based on deep reinforcement learning
Bellingard et al. Adaptive and Reliable Multi-Risk Assessment and Management Control Strategy for Autonomous Navigation in Dense Roundabouts
Cardamone et al. Advanced overtaking behaviors for blocking opponents in racing games using a fuzzy architecture
Duan et al. Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-Lane Scenarios [Research Frontier]

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant