CN113264059B - Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Info

Publication number
CN113264059B
Authority
CN
China
Prior art keywords
vehicle
unmanned vehicle
driving
speed
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110533119.5A
Other languages
Chinese (zh)
Other versions
CN113264059A (en)
Inventor
黄志清
曲志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110533119.5A
Publication of CN113264059A
Application granted
Publication of CN113264059B
Legal status: Active (current)
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/10 Path keeping
    • B60W30/12 Lane keeping
    • B60W30/14 Adaptive cruise control
    • B60W30/16 Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B60W30/165 Automatically following the path of a preceding lead vehicle, e.g. "electronic tow-bar"
    • B60W30/18 Propelling the vehicle
    • B60W30/18009 Propelling the vehicle related to particular drive situations
    • B60W30/18163 Lane change; Overtaking manoeuvres
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 Estimation or calculation of non-directly measurable driving parameters related to drivers or passengers
    • B60W40/09 Driving style or behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Steering Control In Accordance With Driving Conditions (AREA)

Abstract

The invention discloses a deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors. Each driving behavior is divided into "decision of key actions" and "execution of key actions", which realizes decision control supporting multiple driving behaviors, avoids repeated modeling of the same task, and greatly reduces model complexity. Meanwhile, a deep reinforcement learning algorithm is introduced in which the agent learns by interacting with the environment, avoiding both the difficulty of obtaining data and tedious data preprocessing. The scheme solves the problem of motion decision control oriented to multiple driving behaviors and enables a learning-based end-to-end motion decision control model to support multiple driving behaviors, so that the unmanned vehicle completes different driving behaviors according to the corresponding upper-layer behavior instructions and achieves autonomous driving.

Description

Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning
Technical Field
The invention relates to the field of artificial intelligence unmanned driving and end-to-end vehicle decision control technology, and in particular to an unmanned vehicle motion decision control scheme for multiple driving behaviors, realized with a deep reinforcement learning algorithm in scenarios, such as highways, where multiple driving behaviors must be executed in alternation.
Background
Currently, research and application of unmanned vehicles are actively pursued in the United States, Japan, and Europe and promoted to the level of national strategy; China has likewise placed unmanned vehicle technology in its national innovation strategic plan. Unmanned driving is expected to reduce traffic accidents by 90%, effectively cut commuting time and energy consumption, and eliminate hundreds of millions of tons of automotive carbon dioxide emissions each year, so the research and development of unmanned driving technology is significant both for advancing the national innovation strategy and for meeting national needs in transportation, energy, and related fields. The unmanned vehicle is a product of the cross-domain fusion of automobiles, artificial intelligence, and communications, and a key component of the future smart city. It can comprehensively use the road, vehicle position, obstacle, and other environmental information sensed by the onboard perception system to automatically plan a driving route and control the vehicle toward a preset target, so that the vehicle drives safely and reliably on the road. Within the unmanned system, the motion decision control link is responsible for executing driving behaviors: it outputs motion control information according to the road traffic environment and the vehicle's own condition to complete behavior instructions (such as following, lane changing, and overtaking), and is therefore an important component of unmanned driving research. As unmanned technology is gradually applied to open road scenarios in private cars, buses, and other vehicles, the unmanned vehicle must alternately execute various driving behaviors such as lane changing, overtaking, and following to complete its driving tasks; this diversification and coordination of driving behaviors poses a challenge to motion decision control schemes.
With the development of artificial intelligence technology, end-to-end unmanned vehicle motion decision control models built with deep learning and reinforcement learning methods have become a research hotspot in the field of unmanned vehicle motion decision control. Learning-based end-to-end decision control trains a neural network to output vehicle control signals such as throttle, brake, and steering directly from vehicle and driving environment information, without splitting the motion decision control task, and exhibits strong intelligence. NVIDIA's early PilotNet is a typical representative: a convolutional neural network (CNN) was trained to establish a direct mapping from environmental image information to the steering wheel control applied by a human driver, realizing end-to-end lateral steering control of a human-like unmanned vehicle. It achieved good results in both simulated scenarios and on real roads, but its training process depends on large labeled datasets. This problem was later alleviated by introducing reinforcement learning, which can exploit unlabeled data: an artificially designed reward function serves as the evaluation mechanism for the policy, from which correct action sequences are learned, relieving the data dependence of deep learning methods and increasing the interpretability and adjustability of the decision process. However, existing end-to-end models generally target only a single driving behavior, whereas an unmanned vehicle at the highest automation level must be guided by diversified driving behaviors in environments with complex road structures and dynamic traffic flow to realize safe, human-like autonomous driving. An end-to-end unmanned vehicle motion decision control model oriented to multiple driving behaviors has therefore become an urgent problem in the unmanned driving field.
In view of the above, the invention realizes a multi-driving-behavior-oriented unmanned vehicle motion decision control scheme based on a deep reinforcement learning algorithm under a hierarchical architecture. Based on a branch selection structure under this hierarchical architecture, each driving behavior is divided into "decision of key actions" and "execution of key actions", realizing decision control that supports multiple driving behaviors, avoiding repeated modeling of the same task, and greatly reducing model complexity. Meanwhile, a deep reinforcement learning algorithm is introduced in which the agent learns by interacting with the environment, avoiding the difficulty of obtaining data and tedious data preprocessing. The scheme thus solves the problem of motion decision control oriented to multiple driving behaviors.
Disclosure of Invention
The main purpose of the invention is to provide a multi-driving-behavior-oriented unmanned vehicle motion decision control scheme that enables a learning-based end-to-end motion decision control model to support multiple driving behaviors, so that the unmanned vehicle completes different driving behaviors according to the corresponding upper-layer behavior instructions and achieves autonomous driving. The overall scheme is shown in FIG. 1.
The technical scheme adopted by the invention is a structural improvement of an end-to-end unmanned vehicle decision control model based on a deep reinforcement learning algorithm, realizing unmanned vehicle motion decision control oriented to multiple driving behaviors. Driving behaviors are decomposed into "decision of key actions" and "execution of key actions", and the implementation of each driving behavior is generalized as a change in the vehicle's lateral and longitudinal desired state while driving; decision control supporting multiple driving behaviors is realized by introducing action instructions. The upper layer of the model is a branch selection structure composed of an independent action decision layer for each driving behavior, ensuring that the different dynamic decision rules of each behavior, and the environmental information each one emphasizes, can be fully considered. The action instructions output by the decision layers are of the same nature and are all executed by a unified vehicle control layer. The discrete action decision problem of the action decision layers is solved with the DQN (Deep Q-Network) deep reinforcement learning algorithm, and the continuous control signal output problem of the vehicle control layer is solved with the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm. The neural networks are built with the PyTorch deep learning framework, and the selected development language is Python 3.5. The model can receive multiple kinds of driving behavior instructions and, combined with environmental state information, outputs the vehicle control signals corresponding to each driving behavior.
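To make this branch selection structure concrete, the following minimal PyTorch sketch shows one decision head per driving behavior feeding a shared control layer. All dimensions, layer sizes, and names (STATE_DIM, DecisionHead, ControlActor) are illustrative assumptions for exposition, not the patented network configuration.

```python
# Illustrative sketch of the branch-selection structure: one DQN decision
# head per driving behavior, one shared DDPG-style control actor.
# Dimensions and layer sizes are assumptions, not the patented design.
import torch
import torch.nn as nn

STATE_DIM = 29       # assumed low-dimensional sensor state (TORCS-style)
N_INSTRUCTIONS = 3   # assumed discrete action instructions per head

class DecisionHead(nn.Module):
    """Per-behavior action decision layer (DQN): maps the state to
    Q-values over discrete action instructions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_INSTRUCTIONS))

    def forward(self, state):
        return self.net(state)

class ControlActor(nn.Module):
    """Unified vehicle control layer (DDPG actor): maps the state plus the
    current target driving state (lateral offset target, speed level) to
    continuous steering/throttle/brake signals."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + 2, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Tanh())   # steer, throttle, brake

    def forward(self, state, target):
        return self.net(torch.cat([state, target], dim=-1))

# Branch selection: independent decision heads, one unified control layer.
decision_heads = {"lane_change": DecisionHead(),
                  "overtake": DecisionHead(),
                  "follow": DecisionHead()}
control_layer = ControlActor()
```

Under this sketch, branch selection reduces to a dictionary lookup on the received behavior instruction, while the output of every decision head is consumed by the same control layer.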
The model structure is shown in FIG. 2 and mainly comprises the following parts:
1 lane change driving behavior
In a multi-lane road scenario, the unmanned vehicle drives in the current lane in the desired state; when a lane change driving behavior instruction is received, the vehicle is controlled to enter the adjacent lane along a smooth lane change trajectory and drive there stably.
2 overtaking driving behavior
In a straight-line overtaking scenario, if a target vehicle traveling at the same speed as the unmanned vehicle is present in an adjacent lane, the unmanned vehicle increases its target desired speed after receiving an overtaking driving behavior instruction until the target vehicle is completely overtaken, and then returns to its original speed.
3 following a car driving behavior
When the unmanned vehicle detects a low-speed vehicle ahead while driving, it actively reduces its longitudinal desired speed and drives stably behind the vehicle in front while keeping a safe distance; after the front vehicle changes lanes or increases its speed, the unmanned vehicle restores its desired driving speed.
The scheme is realized as follows:
First, the decomposition granularity of the unmanned vehicle's lateral and longitudinal states is divided according to the driving scenario and a target driving state is established for the vehicle; then the vehicle control layer is trained on the basis of this target driving state; finally, corresponding reward functions are created for the different driving behaviors such as following, lane changing, and overtaking, and the action decision network of each driving behavior is trained in combination with the vehicle control layer.
A VirtualEnv virtual environment is configured on a Ubuntu server, a TORCS simulation platform is set up, and the simulation platform configuration file is modified to initialize the starting position of the unmanned vehicle and the driving states of the surrounding dynamic vehicles. The unmanned vehicle is trained interactively in this simulation environment, yielding the neural network model that constitutes motion decision control.
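As a sketch of this interactive training mode, the loop below shows an agent learning directly from simulator transitions instead of from a labeled dataset. TorcsEnv is a hypothetical stand-in for the TORCS wrapper (its reset/step interface, state dimension, and placeholder dynamics are assumptions), as is the agent's act/remember/learn interface.

```python
# Minimal interactive-training sketch; TorcsEnv is a hypothetical wrapper
# around the simulation platform, not the patent's actual interface.
import numpy as np

class TorcsEnv:
    """Assumed reset/step interface over the TORCS simulation."""
    def reset(self):
        return np.zeros(29)                  # assumed state dimension

    def step(self, control):
        next_state = np.random.randn(29)     # placeholder dynamics
        reward, done = 0.0, False
        return next_state, reward, done

def train_interactively(env, agent, episodes=1000):
    """The agent (e.g., the DDPG control layer) improves by interacting
    with the environment, so no labeled dataset needs to be collected."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            control = agent.act(state)                      # steer/throttle/brake
            next_state, reward, done = env.step(control)
            agent.remember(state, control, reward, next_state, done)
            agent.learn()                                   # off-policy update
            state = next_state
```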
S1) The unmanned vehicle learns lane keeping based on the target driving state;
1) Determining the lateral offset position;
2) Determining the longitudinal desired speed;
S2) Learning the basic lane change behavior by changing the lateral offset position;
1) Dividing the minimum unit of lateral offset;
2) Changing the unmanned vehicle's own lateral offset position in the neural network;
S3) Learning the basic overtaking behavior by changing the longitudinal desired speed;
1) Dividing the minimum change amplitude of the longitudinal speed;
2) Adding a condition control instruction representing the longitudinal desired speed to the neural network input state space;
S4) Learning the dynamic decision process of the lane change and overtaking driving behaviors in combination with surrounding vehicles;
1) Deciding action instructions based on the dynamic environment;
2) Achieving the target driving behavior;
S5) Learning the following driving behavior based on a front target vehicle;
1) The unmanned vehicle learns a deceleration avoidance strategy;
2) Adjusting vehicle speed based on the state of the front vehicle;
in the above steps, the specific implementation method is as follows:
the contents of S1) are that a vehicle control layer realizes the stable running of the unmanned vehicle in a target driving state, an initial transverse offset position and a longitudinal expected speed are firstly established for the unmanned vehicle as the target driving state, then a reward function is defined around the target driving state, the unmanned vehicle is guided to learn an output strategy according with the target driving state, so that a reasonable vehicle control signal is decided, and the unmanned vehicle can stably run in a current lane at the target speed, as shown in FIG. 3.
The content of S2) is that the unmanned vehicle learns to drive stably in each lane at the target speed. First, several lateral offset points are determined according to the driving scenario; then the point offset is subtracted from the "lateral position" in the input state space of the vehicle control layer's neural network. After a period of stability training, the unmanned vehicle can drive stably along each lateral point at the target speed. This part and the part in S3) are both completed by the vehicle control layer, by changing the target driving state in a static road scenario containing no other traffic participants, as shown in FIG. 4.
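The offset-subtraction trick can be sketched as a one-line state transformation; the index of the lateral entry in the state vector is an assumption.

```python
import numpy as np

def shift_lateral_observation(state, point_offset, lat_index=0):
    """Subtract a lateral point offset from the observed lateral position,
    so a trained lane-keeping controller converges on the shifted point;
    lat_index (location of the lateral entry) is an assumption."""
    shifted = np.array(state, dtype=float)
    shifted[lat_index] -= point_offset
    return shifted
```

Under this sketch, passing a point_offset of roughly one lane width makes the controller treat the adjacent lane's point as its target line, which is the basis of the lane change behavior.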
The content of S3) is that the unmanned vehicle learns to drive in each lane at different target speeds. A condition control instruction is added to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving; speed levels are divided according to the minimum and maximum speed limits of the driving scenario, and the unmanned vehicle is trained under each speed state to strengthen its robustness and adapt it to different driving speeds. The overtaking driving behavior is then realized by adjusting the vehicle speed, as shown in FIG. 5.
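The condition control instruction can be sketched as one extra normalized entry appended to the control layer's input state; the number of speed levels and the encoding are assumptions.

```python
import numpy as np

def with_speed_command(state, speed_level, n_levels=5):
    """Append a normalized speed-level command so a single control network
    can track several longitudinal desired speeds; the level count and
    the [0, 1] encoding are assumptions."""
    command = speed_level / (n_levels - 1)
    return np.append(np.asarray(state, dtype=float), command)
```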
The content of S4) is to treat the lane change and overtaking driving behaviors as independent action decision layers, with the desired changes in lateral offset and longitudinal speed serving as action instructions that are transmitted to the lower vehicle control layer to change the target driving state while driving. During the action instruction decision process, the dynamic information of other surrounding vehicles is consulted so that action instructions are issued selectively and adjusted at appropriate times, ensuring the dynamic safety of the driving behavior execution process, as shown in FIG. 6.
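The decision side of S4) can be sketched as epsilon-greedy instruction selection over a small discrete set; the instruction set and exploration rate are assumptions.

```python
import random
import torch

def decide_action_instruction(q_net, state, n_instructions=3, eps=0.05):
    """Epsilon-greedy selection of a discrete action instruction (e.g.,
    hold / shift lateral target / change speed level) from a DQN decision
    head that observes surrounding-vehicle dynamics through the state."""
    if random.random() < eps:                         # exploration
        return random.randrange(n_instructions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax())                     # greedy instruction
```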
The content of S5) is the independent action decision layer of the following driving behavior. According to the detected front vehicle information, it dynamically adjusts the longitudinal desired speed of the unmanned vehicle by outputting acceleration and deceleration action instructions to the vehicle control layer, so that the vehicle follows stably behind the front vehicle while keeping a safe distance, realizing the following driving behavior; when there is no vehicle ahead, the unmanned vehicle resumes normal driving at its original desired speed.
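The policy this following head converges to can be summarized by a rule-of-thumb sketch; the gap threshold and speeds are illustrative assumptions, and in the patented scheme the mapping is learned rather than hand-coded.

```python
def follow_instruction(gap, own_speed, lead_speed,
                       safe_gap=25.0, desired_speed=20.0):
    """Decelerate when closing inside the safe gap; accelerate back toward
    the original desired speed once the road ahead allows it."""
    if gap < safe_gap and own_speed >= lead_speed:
        return "decelerate"      # lower the longitudinal desired speed
    if gap > safe_gap and own_speed < desired_speed:
        return "accelerate"      # recover the original desired speed
    return "hold"
```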
The key point of the invention is that the motion decision control model receives driving behavior instructions, outputs vehicle control signals, and completes the decision control process oriented to multiple driving behaviors. First, the model attempts to receive a driving behavior instruction: if none is received, the vehicle control layer keeps the vehicle in its lane based on the target driving state; if one is received, the corresponding action decision layer is selected through the branch architecture. The action decision layer then outputs a sequence of action instructions and transmits it to the vehicle control layer in real time until the target driving behavior is finished. Finally, the vehicle control layer receives each action instruction, updates the target driving state, and cyclically controls the vehicle to complete it. The specific decision control flow is as follows.
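The flow table itself was rendered as images in the original document and is not recoverable; the sketch below reconstructs the loop from the textual description above. Every callable (get_behavior_cmd, behavior_finished, and the head and control-layer interfaces) is an assumed interface, not the patented code.

```python
def decision_control_loop(env, decision_heads, control_layer,
                          get_behavior_cmd, behavior_finished,
                          initial_target):
    """Top-level flow: with no behavior instruction, the control layer
    keeps the lane on the current target state; with one, the branch
    structure selects a decision head that streams action instructions
    until the behavior is finished."""
    target = initial_target                  # lateral offset, desired speed
    state = env.reset()
    while True:
        behavior = get_behavior_cmd()        # upper-layer instruction or None
        if behavior is None:
            control = control_layer.act(state, target)   # lane keeping
            state, _, _ = env.step(control)
            continue
        head = decision_heads[behavior]      # branch selection
        while not behavior_finished(state, behavior):
            instruction = head.decide(state)         # action instruction
            target = instruction.apply(target)       # update target state
            control = control_layer.act(state, target)
            state, _, _ = env.step(control)
```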
Compared with existing unmanned vehicle motion decision control schemes, the multi-driving-behavior-oriented motion decision control scheme provided by the invention has the following benefits:
1. The scheme of the invention is based on a branch selection structure under a hierarchical architecture and uses a deep reinforcement learning algorithm to decompose and model the motion decision control process; it is therefore more flexible and more extensible than traditional end-to-end motion decision control schemes, and better suited to driving scenarios involving multiple driving behaviors.
2. The invention divides driving behaviors into "decision of key actions" and "execution of key actions" and generalizes the implementation principle of each driving behavior as a change in the vehicle's lateral and longitudinal desired state while driving; by introducing action instructions, decision control supporting multiple driving behaviors is realized, repeated modeling of the same task is avoided, and model complexity is greatly reduced.
3. The invention introduces a deep reinforcement learning algorithm into unmanned vehicle motion decision control and adopts a mode in which the agent learns by interacting with the environment, thereby avoiding the difficulty of data acquisition and tedious data preprocessing.
Drawings
FIG. 1 is a system architecture diagram.
FIG. 2 is a model structure diagram.
FIG. 3 is a diagram of the vehicle control scheme.
FIG. 4 is a diagram of the lateral control scheme.
FIG. 5 is a diagram of the longitudinal control scheme.
FIG. 6 is a diagram of the action decision scheme.
Detailed Description
For the purpose of promoting a better understanding of the objects, technical schemes, and advantages of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings. The specific steps of the embodiment are as follows:
step 001: the driving behavior involved is selected and a motion decision control model framework is created, as shown in fig. 2.
Step 002: an initial lateral offset position and a longitudinal desired speed are established as a target driving state, so that the unmanned vehicle can stably drive in the current lane at the target speed, as shown in fig. 3.
Step 003: changing the lateral offset in the vehicle control layer neural network input allows the unmanned vehicle to learn to drive stably in each lane at the target speed, as shown in fig. 4.
Step 004: adding a condition control command into an input state space of a neural network of a vehicle control layer to adjust a longitudinal expected speed when the vehicle runs, and enabling the unmanned vehicle to learn to run in each lane at different target speeds, as shown in fig. 5.
Step 005: independent action decision layers are respectively established based on lane changing and overtaking driving behaviors, and the expected changes of transverse deviation and longitudinal speed obtained by decomposing the execution process of the driving behaviors are used as action commands and transmitted to a lower vehicle control layer, as shown in fig. 6.
Step 006: and the independent action decision layer of the following driving behavior outputs acceleration and deceleration action instructions to the vehicle control layer according to the detected vehicle information in front, and stably drives along the rear part of the vehicle while keeping a safe distance with the vehicle in front.
Step 007: and extracting the trained action decision network model of each driving behavior, and combining the action decision network model and the vehicle control layer into a complete unmanned vehicle motion decision control model facing multiple driving behaviors.
In this embodiment of the invention, the driving behaviors are selected from those that may arise across several driving scenarios, including following, lane changing, and straight-line overtaking. A VirtualEnv virtual environment is built on a Ubuntu server, and the human-machine interaction during training is based on the TORCS simulation platform. The model continuously attempts to receive driving behavior instructions. A received instruction is transmitted to the meta-action decision layer in the model framework established in step 001, the corresponding action decision network from step 007 is selected, and the decision task for that driving behavior is established. The action decision layer outputs a sequence of action instructions and transmits it in real time to the vehicle control layer through steps 005/006; finally the vehicle control layer receives the action instructions, updates the target driving state through steps 003/004, and outputs vehicle control signals, completing the decision control process oriented to multiple driving behaviors.

Claims (8)

1. The unmanned vehicle motion decision control method based on deep reinforcement learning and supporting multiple driving behaviors is characterized in that: driving behaviors are decomposed into "decision of key actions" and "execution of key actions", and the implementation of each driving behavior is generalized as a change in the vehicle's lateral and longitudinal desired state while driving; decision control supporting multiple driving behaviors is realized by introducing action instructions; the upper layer of the model is a branch selection structure composed of an independent action decision layer for each driving behavior; the action instructions of the same nature output by the decision layers are all submitted to a unified vehicle control layer for execution; the discrete action decision problem of the action decision layers is solved with the DQN (Deep Q-Network) deep reinforcement learning algorithm, and the continuous control signal output of the vehicle control layer is solved with the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm; the neural networks are built with the PyTorch deep learning framework, and the selected development language is Python 3.5; the model receives multiple kinds of driving behavior instructions and, combined with environmental state information, outputs the corresponding vehicle control signals;
the method comprises the following steps:
1) Lane change driving behavior;
in a multi-lane highway scenario, the unmanned vehicle drives in the current lane in the desired state; when a lane change driving behavior instruction is received, the vehicle is controlled to enter the adjacent lane along a smooth lane change trajectory and drive there stably;
2) Overtaking driving behavior;
in a straight-line overtaking scenario, if a target vehicle traveling at the same speed as the unmanned vehicle is present in an adjacent lane, the unmanned vehicle increases its target desired speed after receiving an overtaking driving behavior instruction until the target vehicle is completely overtaken, and then returns to its original speed;
3) Following driving behavior;
when the unmanned vehicle detects a low-speed vehicle ahead while driving, it actively reduces its longitudinal desired speed and drives stably behind the front vehicle while keeping a safe distance; after the front vehicle changes lanes or increases its speed, the unmanned vehicle restores its desired driving speed;
first, the decomposition granularity of the unmanned vehicle's lateral and longitudinal states is divided according to the driving scenario and the target driving state of the unmanned vehicle is determined; then the vehicle control layer is trained based on the target driving state; finally, corresponding reward functions are created for the different driving behaviors of following, lane changing, and overtaking, and the action decision network of each driving behavior is trained in combination with the vehicle control layer;
a VirtualEnv virtual environment is configured on a Ubuntu server, a TORCS simulation platform is constructed, and the simulation platform configuration file is modified to initialize the starting position of the unmanned vehicle and the driving states of the surrounding dynamic vehicles; the unmanned vehicle is trained interactively in the constructed simulation environment, yielding the neural network model that constitutes motion decision control;
S1) The unmanned vehicle learns lane keeping based on the target driving state;
1) Determining the lateral offset position;
2) Determining the longitudinal desired speed;
S2) Learning the basic lane change behavior by changing the lateral offset position;
1) Dividing the minimum unit of lateral offset;
2) Changing the unmanned vehicle's own lateral offset position in the neural network;
S3) Learning the basic overtaking behavior by changing the longitudinal desired speed;
1) Dividing the minimum change amplitude of the longitudinal speed;
2) Adding a condition control instruction representing the longitudinal desired speed to the neural network input state space;
S4) Learning the dynamic decision process of the lane change and overtaking driving behaviors in combination with surrounding vehicles;
1) Deciding action instructions based on the dynamic environment;
2) Achieving the target driving behavior;
S5) Learning the following driving behavior based on a front target vehicle;
1) The unmanned vehicle learns a deceleration avoidance strategy;
2) Adjusting vehicle speed based on the state of the front vehicle.
2. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that the specific implementation of the above steps is as follows:
S1) the unmanned vehicle drives stably in the target driving state through the vehicle control layer; an initial lateral offset position and longitudinal desired speed are first established for the unmanned vehicle as the target driving state, then a reward function is defined around the target driving state to guide the unmanned vehicle to learn an output strategy conforming to it, so that vehicle control signals are decided that keep the vehicle driving stably in the current lane at the target speed.
3. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S2) is that the unmanned vehicle learns to drive stably in each lane at the target speed; several lateral offset points are determined according to the driving scenario, then the point offset is subtracted from the "lateral position" in the input state space of the vehicle control layer's neural network; after a period of stability training, the unmanned vehicle can drive stably along each lateral point at the target speed; this part and the part in S3) are completed by the vehicle control layer by changing the target driving state in a static road scenario containing no other traffic participants.
4. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S3) is that the unmanned vehicle learns to drive in each lane at different target speeds; a condition control instruction is added to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving; speed levels are divided according to the minimum and maximum speed limits of the driving scenario, and the unmanned vehicle is trained under each speed state to strengthen its robustness and adapt it to different driving speeds; the overtaking driving behavior is then realized by adjusting the vehicle speed.
5. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S4) is to treat the lane change and overtaking driving behaviors as independent action decision layers, with the desired changes in lateral offset and longitudinal speed serving as action instructions transmitted to the lower vehicle control layer to change the target driving state while driving; during the action instruction decision process, the dynamic information of other surrounding vehicles is consulted so that action instructions are issued selectively and adjusted at appropriate times, ensuring the dynamic safety of the driving behavior execution process.
6. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S5) is the independent action decision layer of the following driving behavior; according to the detected front vehicle information, it dynamically adjusts the longitudinal desired speed of the unmanned vehicle by outputting acceleration and deceleration action instructions to the vehicle control layer, so that the unmanned vehicle follows stably behind the front vehicle while keeping a safe distance, realizing the following driving behavior; when there is no vehicle ahead, the unmanned vehicle resumes normal driving at its original desired speed.
7. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the motion decision control model receives driving behavior instructions, outputs vehicle control signals, and completes the decision control process oriented to multiple driving behaviors; first, the model attempts to receive a driving behavior instruction; if none is received, the vehicle control layer keeps the vehicle in its lane based on the target driving state, and if one is received, the corresponding action decision layer is selected through the branch architecture; then the action decision layer outputs a sequence of action instructions and transmits it to the vehicle control layer in real time until the target driving behavior is finished; finally, the vehicle control layer receives each action instruction, updates the target driving state, and cyclically controls the vehicle to complete it.
8. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized by comprising the following steps: step 001: selecting the driving behaviors involved and creating a motion decision control model framework;
step 002: establishing an initial lateral offset position and longitudinal desired speed as the target driving state, so that the unmanned vehicle drives stably in the current lane at the target speed;
step 003: changing the lateral offset in the vehicle control layer's neural network input so that the unmanned vehicle learns to drive stably in each lane at the target speed;
step 004: adding a condition control instruction to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving, so that the unmanned vehicle learns to drive in each lane at different target speeds;
step 005: establishing independent action decision layers for the lane change and overtaking driving behaviors, and transmitting the desired changes in lateral offset and longitudinal speed, obtained by decomposing the execution process of each driving behavior, to the lower vehicle control layer as action instructions;
step 006: the independent action decision layer of the following driving behavior outputting acceleration and deceleration action instructions to the vehicle control layer according to the detected front vehicle information, so that the vehicle drives stably behind the front vehicle while keeping a safe distance;
step 007: extracting the trained action decision network model of each driving behavior and combining them with the vehicle control layer into a complete multi-driving-behavior-oriented unmanned vehicle motion decision control model.
CN202110533119.5A 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning Active CN113264059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110533119.5A CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110533119.5A CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113264059A CN113264059A (en) 2021-08-17
CN113264059B (en) 2022-10-11

Family

ID=77231122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110533119.5A Active CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113264059B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113561986B (en) * 2021-08-18 2024-03-15 武汉理工大学 Automatic driving automobile decision making method and device
CN113885491A (en) * 2021-08-29 2022-01-04 北京工业大学 Unmanned decision-making and control method based on federal deep reinforcement learning
CN113844448A (en) * 2021-09-18 2021-12-28 广东松科智能科技有限公司 Deep reinforcement learning-based lane keeping method
CN113879323B (en) * 2021-10-26 2023-03-14 清华大学 Reliable learning type automatic driving decision-making method, system, storage medium and equipment
CN114179835B (en) * 2021-12-30 2024-01-05 清华大学苏州汽车研究院(吴江) Automatic driving vehicle decision training method based on reinforcement learning in real scene

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897313A (en) * 2018-05-23 2018-11-27 清华大学 A kind of end-to-end Vehicular automatic driving system construction method of layer-stepping
US10940863B2 (en) * 2018-11-01 2021-03-09 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
CN111923919B (en) * 2019-05-13 2021-11-23 广州汽车集团股份有限公司 Vehicle control method, vehicle control device, computer equipment and storage medium
CN112580795A (en) * 2019-09-29 2021-03-30 华为技术有限公司 Neural network acquisition method and related equipment
CN110568760B (en) * 2019-10-08 2021-07-02 吉林大学 Parameterized learning decision control system and method suitable for lane changing and lane keeping
CN110969848B (en) * 2019-11-26 2022-06-17 武汉理工大学 Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN111413973A (en) * 2020-03-26 2020-07-14 北京汽车集团有限公司 Lane change decision method and device for vehicle, electronic equipment and storage medium
CN111845741B (en) * 2020-06-28 2021-08-03 江苏大学 Automatic driving decision control method and system based on hierarchical reinforcement learning
CN112201069B (en) * 2020-09-25 2021-10-29 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
End-to-end driverless decision-making based on deep reinforcement learning; Huang Zhiqing et al.; Acta Electronica Sinica (电子学报); 2020-09-15 (No. 09); 1711-1718 *

Also Published As

Publication number Publication date
CN113264059A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN113264059B (en) Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning
CN114407931B (en) Safe driving decision method for automatic driving operation vehicle of high class person
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN113291308B (en) Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics
CN110187639A (en) A kind of trajectory planning control method based on Parameter Decision Making frame
CN108932840A (en) Automatic driving vehicle urban intersection passing method based on intensified learning
CN103956045B (en) Utilize semi-true object emulation technology means to realize method that fleet works in coordination with driving
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN113264043A (en) Unmanned driving layered motion decision control method based on deep reinforcement learning
Unsal Intelligent navigation of autonomous vehicles in an automated highway system: Learning methods and interacting vehicles approach
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
CN111645673B (en) Automatic parking method based on deep reinforcement learning
CN111899509B (en) Intelligent networking automobile state vector calculation method based on vehicle-road information coupling
CN114035575A (en) Unmanned vehicle motion planning method and system based on semantic segmentation
CN110320916A (en) Consider the autonomous driving vehicle method for planning track and system of occupant's impression
CN116486356A (en) Narrow scene track generation method based on self-adaptive learning technology
CN115257789A (en) Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment
Wang et al. An intelligent self-driving truck system for highway transportation
Orgován et al. Autonomous drifting using reinforcement learning
Guan et al. Learn collision-free self-driving skills at urban intersections with model-based reinforcement learning
Ren et al. Self-learned intelligence for integrated decision and control of automated vehicles at signalized intersections
CN114802306A (en) Intelligent vehicle integrated decision-making system based on man-machine co-driving concept
Guo et al. Self-defensive coordinated maneuvering of an intelligent vehicle platoon in mixed traffic
CN114355897A (en) Vehicle path tracking control method based on model and reinforcement learning hybrid switching
CN116909131A (en) Vehicle formation track planning modeling method for signalless intersection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant