CN114013443B - Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning - Google Patents

Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Info

Publication number
CN114013443B
CN114013443B (application CN202111339265.0A; also published as CN114013443A)
Authority
CN
China
Prior art keywords
lane
vehicle
target
automatic driving
speed
Prior art date
Legal status
Active
Application number
CN202111339265.0A
Other languages
Chinese (zh)
Other versions
CN114013443A (en
Inventor
崔建勋
慈玉生
要甲
姜慧夫
曲明成
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111339265.0A priority Critical patent/CN114013443B/en
Publication of CN114013443A publication Critical patent/CN114013443A/en
Application granted granted Critical
Publication of CN114013443B publication Critical patent/CN114013443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B: Performing operations; transporting
    • B60: Vehicles in general
    • B60W: Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit
    • B60W30/18163: Lane change; Overtaking manoeuvres (under B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit; B60W30/18 Propelling the vehicle; B60W30/18009 related to particular drive situations)
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/10: Such estimation or calculation related to vehicle motion
    • B60W60/0015: Planning or execution of driving tasks specially adapted for safety (under B60W60/00 Drive control systems specially adapted for autonomous road vehicles; B60W60/001 Planning or execution of driving tasks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

A lane-change decision control method for autonomous vehicles based on hierarchical reinforcement learning, belonging to the technical field of autonomous driving control. It addresses the poor safety and low efficiency of existing autonomous driving processes. A decision neural network with 3 hidden layers is built from the vehicle speed in the actual driving scene and the relative position and relative speed information with respect to surrounding vehicles, and is trained with a lane-change safety reward function to fit a Q-value function, yielding the action with the maximum Q value. An acceleration decision model based on deep Q-learning is then built from the vehicle speed, the relative position information of surrounding vehicles and the reward function corresponding to the following or lane-change action, producing the following or lane-change acceleration; when changing lanes, a reference lane-change trajectory is generated with a degree-5 polynomial curve. The method is suitable for autonomous-driving lane-change decision and control.

Description

Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Technical Field
The invention belongs to the technical field of automatic driving control.
Background
In general, the driving strategy of an autonomous vehicle is organized as a set of modules, roughly divided into 4 levels: (1) strategic planning layer: responsible for planning the global route from the starting point to the destination; this part draws on related knowledge such as shortest paths, weighted shortest paths and GIS, and current research and implementation methods are relatively mature; (2) tactical decision layer: responsible for behavior decisions within a local range during actual driving, such as car following, lane changing, overtaking, accelerating and decelerating; (3) local planning layer: responsible for generating a trajectory that is safe and complies with traffic regulations according to the action intention from the tactical decision layer; (4) vehicle control layer: mainly applies optimal control methods to track the generated trajectory with minimum deviation by controlling the throttle, brake and steering wheel of the vehicle.
The lane-change decision and the lane-change trajectory are key elements of the tactical decision layer and the local planning layer respectively; lane changing is a basic behavior in many driving scenes, and its performance largely determines the safety, efficiency and quality of autonomous decision, planning and control. Traditional methods mainly fall into two categories: (1) the lane-change decision is made with a rule-based approach (such as a finite-state machine) and the lane-change trajectory is generated with optimal control theory; (2) decision and execution are bound together and learned end-to-end, mapping the state directly to the lane-change control action. Approach (1), being rule-based in nature, generalizes poorly to driving scenes that were not explicitly defined, and the rule set for complex scenes is difficult or even impossible to specify. Approach (2) is very efficient at decision time and generalizes well to unseen scenes, but as a purely learning-based method it cannot fully guarantee the safety of the lane change. In addition, the autonomous driving strategy is hierarchical in nature: the driving intention is generated first, and the trajectory is then generated and the vehicle controlled according to that intention; if decision and control are tied directly together, it is difficult to build an efficient decision and control method.
Disclosure of Invention
The invention aims to solve the problems of poor safety and low efficiency in the existing automatic driving process, and provides a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method.
The invention relates to a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method, which comprises the following steps:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of autonomous driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the autonomous vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of autonomous lane-change decision and control.
Further, in the present invention, the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, described in the first step, the second step and the third step, are:
The relative position of the target autonomous vehicle and the leading vehicle in the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target autonomous vehicle along the lane direction and x_leader is the position coordinate of the leading vehicle in the current lane along the lane direction;
The relative position of the target autonomous vehicle and the leading vehicle in the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the leading vehicle in the target lane along the lane direction;
The relative position of the target autonomous vehicle and the following vehicle in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the following vehicle in the target lane along the lane direction;
The relative speed of the target autonomous vehicle and the leading vehicle in the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target autonomous vehicle and v_leader is the speed of the leading vehicle in the current lane;
The relative speed of the target autonomous vehicle and the leading vehicle in the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the leading vehicle in the target lane along the lane direction;
Target autonomous vehicle speed: v_ego;
Target autonomous vehicle acceleration: a_ego.
Further, in the present invention, in the first step, the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
where w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target autonomous vehicle and the leading vehicle in the current lane, the relative speed between the target autonomous vehicle and the leading vehicle in the current lane, the relative position between the target autonomous vehicle and the leading vehicle in the target lane, and the relative speed between the target autonomous vehicle and the leading vehicle in the target lane.
Further, in the present invention, in step one, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
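For illustration, a minimal PyTorch sketch of such a decision network is given below. The 7-dimensional state layout and the two discrete actions (keep following / change lane) follow the definitions above; the class and function names and the choice of ReLU activations are assumptions rather than details disclosed by the patent.

```python
import torch
import torch.nn as nn

class LaneChangeDecisionNet(nn.Module):
    """Q-network for the lane-change decision: 3 hidden layers of 100 neurons each.

    Input : 7-dim state (dx_leader, dx_target, dx_follow, dv_ego, dv_target, v_ego, a_ego)
    Output: Q-values for the two discrete actions {0: keep following, 1: change lane}.
    """
    def __init__(self, state_dim: int = 7, n_actions: int = 2, hidden: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: LaneChangeDecisionNet, state: torch.Tensor) -> int:
    """Greedy selection: return the action with the maximum Q estimate (step one)."""
    with torch.no_grad():
        return int(q_net(state).argmax(dim=-1).item())
```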
Further, in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
Final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
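The following PyTorch sketch illustrates one possible reading of this structure: three sub fully-connected networks A(s), B(s), C(s) share the 7-dimensional state and are combined according to formula five. Layer sizes, activations and names are assumptions (the embodiment later mentions 200 neurons per sub-network).

```python
import torch
import torch.nn as nn

def sub_net(in_dim: int, hidden: int = 200) -> nn.Sequential:
    """One sub fully-connected network producing a scalar output."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

class AccelerationDecisionNet(nn.Module):
    """Three sub-networks A(s), B(s), C(s) combined per formula five:
    Q(s, a) = A(s) * (B(s) - R|a)^2 + C(s), where R|a is the composite
    (following or lane-change) reward obtained under candidate acceleration a."""
    def __init__(self, state_dim: int = 7, hidden: int = 200):
        super().__init__()
        self.A = sub_net(state_dim, hidden)
        self.B = sub_net(state_dim, hidden)
        self.C = sub_net(state_dim, hidden)

    def q_value(self, state: torch.Tensor, reward_given_a: torch.Tensor) -> torch.Tensor:
        a_s, b_s, c_s = self.A(state), self.B(state), self.C(state)
        return a_s * (b_s - reward_given_a) ** 2 + c_s
```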
Further, in the third step, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the autonomous vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed;
Final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
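A plain-Python sketch of the two composite rewards (formulas two to four for following, six to eight for lane changing) is shown below; the default weight values and argument names are illustrative assumptions.

```python
def following_reward(x_leader, x_ego, v_leader, v_ego, w_dis=1.0, w_v=1.0):
    """Composite following reward R_c = R_dis + R_v (formulas two to four)."""
    r_dis = -w_dis * abs(x_leader - x_ego)   # distance-related reward
    r_v = -w_v * abs(v_leader - v_ego)       # speed-related reward
    return r_dis + r_v

def lane_change_reward(dx_leader, dx_target, dx_follow,
                       v_leader, v_target, v_ego, w_dis=1.0, w_v=1.0):
    """Composite lane-change reward R_A = r_dis + r_v (formulas six to eight)."""
    r_dis = -w_dis * abs(min(dx_leader, dx_target) - dx_follow)
    r_v = -w_v * abs(min(v_leader, v_target) - v_ego)
    return r_dis + r_v
```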
Further, in the fourth step of the present invention, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time. The parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning. In the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
Further, in the present invention, the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
Further, in the present invention, the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the autonomous vehicle in the longitudinal x and lateral y directions.
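The sketch below illustrates, under stated assumptions, how a candidate degree-5 reference trajectory (formulas ten and eleven) can be sampled and checked against the lane-boundary and speed-limit constraints. The expectation function itself is rendered as an image in the original, so only the polynomial evaluation and box-style constraint checks are shown; all function names are illustrative.

```python
import numpy as np

def quintic(coeffs, t):
    """Evaluate a degree-5 polynomial c5*t^5 + ... + c1*t + c0 (formulas ten / eleven)."""
    c5, c4, c3, c2, c1, c0 = coeffs
    return ((((c5 * t + c4) * t + c3) * t + c2) * t + c1) * t + c0

def reference_trajectory(ax, by, horizon, n=50):
    """Sample the reference lane-change trajectory over the planning window [0, T]."""
    ts = np.linspace(0.0, horizon, n)
    xs = np.array([quintic(ax, t) for t in ts])
    ys = np.array([quintic(by, t) for t in ts])
    return ts, xs, ys

def satisfies_constraints(ts, xs, ys, x_lim, y_lim, vx_lim, vy_lim):
    """Check the lane-boundary and speed-limit constraints along the sampled trajectory."""
    vx = np.gradient(xs, ts)   # numerical velocity along the lane (longitudinal)
    vy = np.gradient(ys, ts)   # numerical velocity across the lane (lateral)
    inside_lane = np.all((xs >= x_lim[0]) & (xs <= x_lim[1]) &
                         (ys >= y_lim[0]) & (ys <= y_lim[1]))
    within_speed = np.all((vx >= vx_lim[0]) & (vx <= vx_lim[1]) &
                          (vy >= vy_lim[0]) & (vy <= vy_lim[1]))
    return bool(inside_lane and within_speed)
```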
Further, in the invention, in the fifth step, the specific method of controlling the autonomous vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is as follows:
According to the generated reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change action of the autonomous vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
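The patent's pure-tracking steering expressions are rendered as images, so the sketch below uses the textbook pure-pursuit relation δ = arctan(2·L·sin(α)/L_d), which involves the same quantities named in the text (look-ahead distance L_d, wheelbase L, angle α); treating this as the intended formula is an assumption.

```python
import math

def pure_pursuit_steer(ego_x, ego_y, ego_yaw, target_x, target_y, wheelbase):
    """Textbook pure-pursuit steering toward a look-ahead point on the reference trajectory.

    alpha : angle between the vehicle heading and the line to the look-ahead point
    l_d   : distance to the look-ahead point
    delta : commanded steering angle, delta = atan(2 * L * sin(alpha) / l_d)
    """
    alpha = math.atan2(target_y - ego_y, target_x - ego_x) - ego_yaw
    l_d = math.hypot(target_x - ego_x, target_y - ego_y)
    return math.atan2(2.0 * wheelbase * math.sin(alpha), l_d)
```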
The method of the invention combines the generalization advantage of learning-based approaches with the advantages of optimal control. Because the lane-change decision and the acceleration decision are handled hierarchically by two models, each using a Q-estimation neural network, the processing is more efficient and more accurate, and the method is closer in nature to the human lane-change behavior of "generate lane-change intention → generate lane-change trajectory → execute lane-change action", so it can produce safer, more robust and more efficient decision and control outputs.
Drawings
FIG. 1 is a schematic diagram of the autonomous-driving lane-change decision and control method of the present invention;
FIG. 2 is a schematic view of lane change scene parameters; in the figure, ego is the target autonomous vehicle, leader is the vehicle ahead of the target autonomous vehicle in the current lane, target is the vehicle ahead of the target autonomous vehicle in the target lane, and follow is the vehicle behind the target autonomous vehicle in the target lane;
FIG. 3 is a network architecture diagram of an acceleration decision model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The first embodiment is as follows: the present embodiment is described below with reference to fig. 1, and the method for controlling a lane change decision of an autonomous vehicle based on hierarchical reinforcement learning in the present embodiment includes:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of autonomous driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the autonomous vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the autonomous vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of autonomous lane-change decision and control.
In the present embodiment, the input is a lane-change request/command together with environment state information. The need to change lanes may come from a higher-level behavior decision; for example, when the vehicle ahead in the lane of the target autonomous vehicle is travelling too slowly, a lane-change request or instruction is triggered so that the autonomous vehicle can obtain a higher driving-efficiency benefit. Meanwhile, environment information around the target autonomous vehicle (mainly the relative positions and speeds of surrounding vehicles) must be input synchronously, since this information is the basis of the autonomous vehicle's lane-change decision.
The method adopts a framework of two decision models: a lane-change decision model and an acceleration decision model. After receiving the lane-change requirement and the environment state information, the lane-change decision model determines whether to change lanes; the acceleration decision model then adjusts (decides) the longitudinal acceleration of the autonomous vehicle, and the following or lane-change behavior is executed accordingly.
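A schematic Python sketch of this hierarchical flow is given below; the helper callables stand in for the models and planners described elsewhere in the document and are illustrative assumptions, not an implementation disclosed by the patent.

```python
def lane_change_control_step(state, decide_lane_change, follow_acceleration,
                             lane_change_acceleration, plan_reference_trajectory,
                             pure_pursuit_steering):
    """One hierarchical decision-and-control step, mirroring the two-model framework.

    All arguments except `state` are callables standing in for the trained models
    and planners sketched elsewhere in this document (illustrative placeholders).
    """
    if not decide_lane_change(state):                        # step one/two: max-Q discrete decision
        accel = follow_acceleration(state)                   # keep following in the current lane
        return {"maneuver": "follow", "acceleration": accel}
    accel = lane_change_acceleration(state)                  # step three: lane-change acceleration
    trajectory = plan_reference_trajectory(state, accel)     # step four: quintic reference trajectory
    steering = pure_pursuit_steering(state, trajectory)      # step five: track the trajectory
    return {"maneuver": "lane_change", "acceleration": accel, "steering": steering}
```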
Further, the present embodiment is described with reference to fig. 2. In the present embodiment, the speed of the autonomous vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment in the first step, the second step and the third step are:
The relative position of the target autonomous vehicle and the leading vehicle in the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target autonomous vehicle along the lane direction and x_leader is the position coordinate of the leading vehicle in the current lane along the lane direction;
The relative position of the target autonomous vehicle and the leading vehicle in the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the leading vehicle in the target lane along the lane direction;
The relative position of the target autonomous vehicle and the following vehicle in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the following vehicle in the target lane along the lane direction;
The relative speed of the target autonomous vehicle and the leading vehicle in the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target autonomous vehicle and v_leader is the speed of the leading vehicle in the current lane;
The relative speed of the target autonomous vehicle and the leading vehicle in the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the leading vehicle in the target lane along the lane direction;
Target autonomous vehicle speed: v_ego;
Target autonomous vehicle acceleration: a_ego.
In this embodiment, a schematic diagram of the lane-change environment state definition is shown in fig. 2, where Ego is the autonomous vehicle and the other vehicles are background vehicles. Each vehicle has its own state consisting of 4 pieces of information: position abscissa, position ordinate, speed and acceleration. The environment state is s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego).
Further, in the present embodiment, in step one, the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
where w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target autonomous vehicle and the leading vehicle in the current lane, the relative speed between the target autonomous vehicle and the leading vehicle in the current lane, the relative position between the target autonomous vehicle and the leading vehicle in the target lane, and the relative speed between the target autonomous vehicle and the leading vehicle in the target lane;
in this embodiment, w 1 =0.4,w 2 =0.6,w 3 =0.4,w 4 =0.6。
Further, as described with reference to fig. 2, in the first step of the present embodiment, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
Further, in the present embodiment, in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
Final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
Further, in the present embodiment, in step three, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the autonomous vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
Taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
Environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided (the action);
Lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed.
Final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
In this embodiment, the acceleration decision model receives the decision output of the lane-change decision model, i.e. whether to change lanes. If the lane is not changed, the following behavior is triggered; if the lane is changed, the lane-change behavior is triggered. As shown in fig. 1, the acceleration decision model is responsible for deciding a longitudinal acceleration (a continuous value along the road direction); a safe trajectory is then generated and the vehicle is controlled to track the generated trajectory. In this embodiment, the environment state consists of the actual driving-scene information of the autonomous vehicle, its speed and the relative position information of surrounding vehicles; the acceleration decision model comprises three sub fully-connected neural networks, each containing 200 neurons.
Further, in the fourth step of the present embodiment, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time. The parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning. In the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
In the present embodiment, the trajectory for following or lane changing is planned next based on the acceleration a output by the acceleration decision model. The trajectory planning is based on two indexes: safety and comfort. First, a reference lane-change trajectory is generated with a degree-5 polynomial curve, and safety is reflected by the distance and risk of the reference trajectory.
Further, in this embodiment, the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
Further, in this embodiment, the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the autonomous vehicle in the longitudinal x and lateral y directions of the road.
Further, in the present embodiment, in the fifth step, the specific method of controlling the autonomous vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is as follows:
According to the generated reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change process of the autonomous vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that various dependent claims and the features described herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (9)

1. A lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning, characterized by comprising the following steps:
step one, establishing a decision neural network with 3 hidden layers by using the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment, and training the decision neural network with a lane-change safety reward function to fit the Q-value function, so as to obtain the action with the maximum Q value;
the lane-change safety reward function is:
(Formula one: the lane-change safety reward function is rendered as an image in the original patent.)
wherein w_1, w_2, w_3, w_4 are the weight coefficients of, respectively, the relative position between the target automatic driving vehicle and the vehicle in front of the current lane, the relative speed between the target automatic driving vehicle and the vehicle in front of the current lane, the relative position between the target automatic driving vehicle and the vehicle in front of the target lane, and the relative speed between the target automatic driving vehicle and the vehicle in front of the target lane;
step two, when the action with the maximum Q value is a lane-change action, executing step three; when the action with the maximum Q value is a following action, establishing a deep Q-learning acceleration decision model by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and finishing one round of automatic driving decision and control;
step three, establishing a deep Q-learning acceleration decision model by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action, and obtaining the acceleration information of the lane-change action;
step four, generating a reference lane-change trajectory from the acceleration information of the lane-change action using a degree-5 polynomial curve;
and step five, controlling the automatic driving vehicle to execute the lane-change action with a pure tracking (pure pursuit) control method, finishing one round of automatic driving lane-change decision and control.
2. The method as claimed in claim 1, wherein the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to vehicles in the surrounding environment in the first step, the second step and the third step are:
the relative position of the target automatic driving vehicle and the vehicle in front of the current lane: Δx_leader = |x_ego - x_leader|, where x_ego is the position coordinate of the target automatic driving vehicle along the lane direction and x_leader is the position coordinate of the vehicle in front of the current lane along the lane direction;
the relative position of the target automatic driving vehicle and the vehicle in front of the target lane: Δx_target = |x_ego - x_target|, where x_target is the position coordinate of the vehicle in front of the target lane along the lane direction;
the relative position of the target automatic driving vehicle and the vehicle behind in the target lane: Δx_follow = |x_ego - x_follow|, where x_follow is the position coordinate of the vehicle behind in the target lane along the lane direction;
the relative speed of the target automatic driving vehicle and the vehicle in front of the current lane: Δv_ego = |v_ego - v_leader|, where v_ego is the speed of the target automatic driving vehicle and v_leader is the speed of the vehicle in front of the current lane;
the relative speed of the target automatic driving vehicle and the vehicle in front of the target lane: Δv_target = |v_ego - v_target|, where v_target is the speed of the vehicle in front of the target lane along the lane direction;
target automatic driving vehicle speed: v_ego;
target automatic driving vehicle acceleration: a_ego.
3. The method for controlling the lane change decision of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 1 or 2, wherein in the step one, in the decision neural network with 3 hidden layers, each hidden layer comprises 100 neurons.
4. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 1 or 2, characterized in that in the second step, the specific method for establishing the deep Q-learning acceleration decision model is as follows:
taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided;
following reward function:
R_dis = -w_dis · |x_leader - x_ego|    (formula two)
R_v = -w_v · |v_leader - v_ego|    (formula three)
R_c = R_dis + R_v    (formula four)
where R_dis and R_v denote the distance-related and speed-related reward functions of the following state; w_dis and w_v are the weights of the distance reward and the speed reward in the following state; R_c is the composite reward of the following state related to distance and speed;
final Q estimate of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_c|a)^2 + C(s)    (formula five)
where R_c|a denotes the composite reward obtained in the following state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
5. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 1 or 2, characterized in that in the third step, the specific method for establishing the deep Q-learning acceleration decision model from the actual driving scene information and speed of the automatic driving vehicle, the relative position information of vehicles in the surrounding environment and the reward function corresponding to the lane-change action is as follows:
taking the environment state as input, the final Q estimate of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
where a denotes the longitudinal acceleration to be decided;
lane-change reward function:
r_dis = -w_dis · |min(Δx_leader, Δx_target) - Δx_follow|    (formula six)
r_v = -w_v · |min(v_leader, v_target) - v_ego|    (formula seven)
R_A = r_dis + r_v    (formula eight)
where r_dis and r_v denote the distance-related and speed-related rewards when changing lanes; w_dis and w_v are the weights of the distance reward and the speed reward in the lane-change state; R_A is the composite reward of the lane-change state related to distance and speed;
final Q value of the acceleration decision model:
Q(s, a) = A(s) · (B(s) - R_A|a)^2 + C(s)    (formula nine)
where R_A|a denotes the immediate reward obtained in the lane-change state when the acceleration is a, and A(s), B(s), C(s) are the outputs of the 3 sub fully-connected neural networks in the current state s.
6. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that in step four, the reference lane-change trajectory generated from the acceleration information of the lane-change action with a degree-5 polynomial curve is:
x(t) = a_5·t^5 + a_4·t^4 + a_3·t^3 + a_2·t^2 + a_1·t + a_0    (formula ten)
y(t) = b_5·t^5 + b_4·t^4 + b_3·t^3 + b_2·t^2 + b_1·t + b_0    (formula eleven)
where x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, and t is time; the parameters a_1, ..., a_5, b_1, ..., b_5 are determined through an expectation function (rendered as an image in the original patent): their values are chosen to optimize the expectation function so that, under the trajectory-planning boundary constraints and the traffic speed-limit constraint, the distance term and the risk of the reference trajectory corresponding to acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane-change planning; in the expectation function, the first term (also rendered as an image) represents the travel-distance term of the reference lane-change trajectory, w_d·P(dangerous|a, t) represents the safety-risk term of the reference lane-change trajectory, and w_c·P(comfort|a, t) denotes the comfort term of the reference lane-change trajectory; w_d and w_c are the weight of the risk term and the weight of the comfort term of the reference trajectory respectively, with w_c < 0; P(dangerous|a, t) is the probability of a safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
7. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that the trajectory-planning boundary constraint is specifically that the reference trajectory must lie within the lane lines, i.e. x_min ≤ x(t) ≤ x_max and y_min ≤ y(t) ≤ y_max for all t in the planning window, where x_min, y_min, x_max and y_max respectively denote the lane-line boundary coordinates corresponding to the current vehicle.
8. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 5, characterized in that the traffic speed-limit constraint is specifically that the speed at any point of the reference trajectory must not exceed the traffic speed limit, i.e. v_x,min ≤ ẋ(t) ≤ v_x,max and v_y,min ≤ ẏ(t) ≤ v_y,max, where v_x,min, v_x,max, v_y,min and v_y,max respectively denote the allowable speed ranges of the automatic driving vehicle in the longitudinal x and lateral y directions of the road.
9. The lane-change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning as claimed in claim 2, characterized in that in the fifth step, the specific method of controlling the automatic driving vehicle to execute the lane-change action with the pure tracking (pure pursuit) control method is:
according to the reference lane-change trajectory, a pure pursuit control algorithm is used to control the steering-wheel angle during the lane-change action of the automatic driving vehicle:
(The two pure-pursuit steering-angle expressions are rendered as images in the original patent.)
where δ(t) is the steering-wheel angle calculated by the pure pursuit control algorithm at time t, α(t) is the actual steering-wheel angle of the automatic driving vehicle at time t, L_d is the look-ahead distance, and L is the wheelbase of the vehicle.
CN202111339265.0A 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning Active CN114013443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Publications (2)

Publication Number Publication Date
CN114013443A CN114013443A (en) 2022-02-08
CN114013443B true CN114013443B (en) 2022-09-23

Family

ID=80063836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339265.0A Active CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Country Status (1)

Country Link
CN (1) CN114013443B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023141940A1 (en) * 2022-01-28 2023-08-03 华为技术有限公司 Intelligent driving method and device, and vehicle
CN114880938B (en) * 2022-05-16 2023-04-18 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN114802307B (en) * 2022-05-23 2023-05-05 哈尔滨工业大学 Intelligent vehicle transverse control method under automatic and manual mixed driving scene
CN115116249B (en) * 2022-06-06 2023-08-01 苏州科技大学 Method for estimating different permeability and road traffic capacity of automatic driving vehicle
CN115082900B (en) * 2022-07-19 2023-06-16 湖南大学无锡智能控制研究院 Intelligent vehicle driving decision system and method in parking lot scene
CN117275240B (en) * 2023-11-21 2024-02-20 之江实验室 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10703370B2 (en) * 2018-08-24 2020-07-07 Ford Global Technologies, Llc Vehicle action control
JP7048456B2 (en) * 2018-08-30 2022-04-05 本田技研工業株式会社 Learning devices, learning methods, and programs
CN111413957B (en) * 2018-12-18 2021-11-02 北京航迹科技有限公司 System and method for determining driving actions in autonomous driving
CN109901574B (en) * 2019-01-28 2021-08-13 华为技术有限公司 Automatic driving method and device
WO2020159247A1 (en) * 2019-01-31 2020-08-06 엘지전자 주식회사 Image output device
CN115578711A (en) * 2019-05-21 2023-01-06 华为技术有限公司 Automatic channel changing method, device and storage medium
CN110716562A (en) * 2019-09-25 2020-01-21 南京航空航天大学 Decision-making method for multi-lane driving of unmanned vehicle based on reinforcement learning
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN111273668B (en) * 2020-02-18 2021-09-03 福州大学 Unmanned vehicle motion track planning system and method for structured road
CN112498354B (en) * 2020-12-25 2021-11-12 郑州轻工业大学 Multi-time scale self-learning lane changing method considering personalized driving experience

Also Published As

Publication number Publication date
CN114013443A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
Zhang et al. Adaptive decision-making for automated vehicles under roundabout scenarios using optimization embedded reinforcement learning
Chen et al. Conditional DQN-based motion planning with fuzzy logic for autonomous driving
CN114407931B (en) Safe driving decision method for automatic driving operation vehicle of high class person
CN110187639A (en) A kind of trajectory planning control method based on Parameter Decision Making frame
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
Godoy et al. A driverless vehicle demonstration on motorways and in urban environments
Chiang et al. Embedded driver-assistance system using multiple sensors for safe overtaking maneuver
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
Xu et al. A nash Q-learning based motion decision algorithm with considering interaction to traffic participants
CN113581182B (en) Automatic driving vehicle lane change track planning method and system based on reinforcement learning
Fehér et al. Hierarchical evasive path planning using reinforcement learning and model predictive control
CN113264043A (en) Unmanned driving layered motion decision control method based on deep reinforcement learning
Yan et al. A multi-vehicle game-theoretic framework for decision making and planning of autonomous vehicles in mixed traffic
Zhang et al. Structured road-oriented motion planning and tracking framework for active collision avoidance of autonomous vehicles
Lattarulo et al. Real-time trajectory planning method based on n-order curve optimization
Siboo et al. An empirical study of ddpg and ppo-based reinforcement learning algorithms for autonomous driving
Guo et al. Self-defensive coordinated maneuvering of an intelligent vehicle platoon in mixed traffic
Ruan et al. Longitudinal planning and control method for autonomous vehicles based on a new potential field model
CN113353102B (en) Unprotected left-turn driving control method based on deep reinforcement learning
Bellingard et al. Adaptive and Reliable Multi-Risk Assessment and Management Control Strategy for Autonomous Navigation in Dense Roundabouts
Cardamone et al. Advanced overtaking behaviors for blocking opponents in racing games using a fuzzy architecture
Duan et al. Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-Lane Scenarios [Research Frontier]

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant