CN113264059B - Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Info

Publication number
CN113264059B
Authority
CN
China
Prior art keywords
vehicle
unmanned vehicle
driving
speed
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110533119.5A
Other languages
Chinese (zh)
Other versions
CN113264059A (en)
Inventor
黄志清
曲志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110533119.5A
Publication of CN113264059A
Application granted
Publication of CN113264059B
Legal status: Active (current)
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/10 Path keeping
    • B60W30/12 Lane keeping
    • B60W30/14 Adaptive cruise control
    • B60W30/16 Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B60W30/165 Automatically following the path of a preceding lead vehicle, e.g. "electronic tow-bar"
    • B60W30/18 Propelling the vehicle
    • B60W30/18009 Propelling the vehicle related to particular drive situations
    • B60W30/18163 Lane change; Overtaking manoeuvres
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 Estimation or calculation of non-directly measurable driving parameters related to drivers or passengers
    • B60W40/09 Driving style or behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Steering Control In Accordance With Driving Conditions (AREA)

Abstract

The invention discloses a deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors. Each driving behavior is divided into "decision of key actions" and "execution of key actions", which realizes decision control supporting multiple driving behaviors, avoids repeated modeling of the same task, and greatly reduces model complexity. Meanwhile, a deep reinforcement learning algorithm is introduced in which the agent learns by interacting with the environment, avoiding both the difficulty of obtaining data and tedious data preprocessing. The scheme solves the problem of motion decision control oriented to multiple driving behaviors and enables a learning-based end-to-end motion decision control model to support multiple driving behaviors, so that the unmanned vehicle completes different driving behaviors according to the corresponding upper-layer behavior instructions and achieves autonomous driving.

Description

Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning
Technical Field
The invention relates to the field of artificial intelligence unmanned driving and end-to-end vehicle decision control technology, and in particular to an unmanned vehicle motion decision control scheme for multiple driving behaviors, realized with a deep reinforcement learning algorithm in scenarios, such as highways, where multiple driving behaviors must be executed in alternation.
Background
Currently, research and application of unmanned vehicles are actively pursued in the United States, Japan, and Europe and promoted to the level of national strategy; China has likewise placed unmanned vehicle technology in its national innovation strategic plan. Unmanned driving is expected to reduce traffic accidents by 90%, effectively cut commuting time and energy consumption, and eliminate hundreds of millions of tons of automotive carbon dioxide emissions each year, so the research and development of unmanned driving technology is significant both for advancing the national innovation strategy and for meeting national needs in transportation, energy, and related fields. The unmanned vehicle is a product of the cross-domain fusion of automobiles, artificial intelligence, and communications, and a key component of the future smart city. It can comprehensively use the road, vehicle position, obstacle, and other environmental information sensed by the onboard perception system to automatically plan a driving route and control the vehicle toward a preset target, so that the vehicle drives safely and reliably on the road. Within the unmanned system, the motion decision control link is responsible for executing driving behaviors: it outputs motion control information according to the road traffic environment and the vehicle's own condition to complete behavior instructions (such as following, lane changing, and overtaking), and is therefore an important component of unmanned driving research. As unmanned technology is gradually applied to open road scenarios in private cars, buses, and other vehicles, the unmanned vehicle must alternately execute various driving behaviors such as lane changing, overtaking, and following to complete its driving tasks; this diversification and coordination of driving behaviors poses a challenge to motion decision control schemes.
With the development of artificial intelligence technology, end-to-end unmanned vehicle motion decision control models built with deep learning and reinforcement learning methods have become a research hotspot in the field of unmanned vehicle motion decision control. Learning-based end-to-end decision control trains a neural network to output vehicle control signals such as throttle, brake, and steering directly from vehicle and driving environment information, without splitting the motion decision control task, and exhibits strong intelligence. NVIDIA's early PilotNet is a typical representative: a convolutional neural network (CNN) was trained to establish a direct mapping from environmental image information to the steering wheel control applied by a human driver, realizing end-to-end lateral steering control of a human-like unmanned vehicle. It achieved good results in both simulated scenarios and on real roads, but its training process depends on large labeled datasets. This problem was later alleviated by introducing reinforcement learning, which can exploit unlabeled data: an artificially designed reward function serves as the evaluation mechanism for the policy, from which correct action sequences are learned, relieving the data dependence of deep learning methods and increasing the interpretability and adjustability of the decision process. However, existing end-to-end models generally target only a single driving behavior, whereas an unmanned vehicle at the highest automation level must be guided by diversified driving behaviors in environments with complex road structures and dynamic traffic flow to realize safe, human-like autonomous driving. An end-to-end unmanned vehicle motion decision control model oriented to multiple driving behaviors has therefore become an urgent problem in the unmanned driving field.
In view of the above, the invention realizes a multi-driving-behavior-oriented unmanned vehicle motion decision control scheme based on a deep reinforcement learning algorithm under a hierarchical architecture. Based on a branch selection structure under this hierarchical architecture, each driving behavior is divided into "decision of key actions" and "execution of key actions", realizing decision control that supports multiple driving behaviors, avoiding repeated modeling of the same task, and greatly reducing model complexity. Meanwhile, a deep reinforcement learning algorithm is introduced in which the agent learns by interacting with the environment, avoiding the difficulty of obtaining data and tedious data preprocessing. The scheme thus solves the problem of motion decision control oriented to multiple driving behaviors.
Disclosure of Invention
The main purpose of the invention is to provide a multi-driving-behavior-oriented unmanned vehicle motion decision control scheme that enables a learning-based end-to-end motion decision control model to support multiple driving behaviors, so that the unmanned vehicle completes different driving behaviors according to the corresponding upper-layer behavior instructions and achieves autonomous driving. The overall scheme is shown in FIG. 1.
The technical scheme adopted by the invention is a structural improvement of an end-to-end unmanned vehicle decision control model based on a deep reinforcement learning algorithm, realizing unmanned vehicle motion decision control oriented to multiple driving behaviors. Driving behaviors are decomposed into "decision of key actions" and "execution of key actions", and the implementation of each driving behavior is generalized as a change in the vehicle's lateral and longitudinal desired state while driving; decision control supporting multiple driving behaviors is realized by introducing action instructions. The upper layer of the model is a branch selection structure composed of an independent action decision layer for each driving behavior, ensuring that the different dynamic decision rules of each behavior, and the environmental information each one emphasizes, can be fully considered. The action instructions output by the decision layers are of the same nature and are all executed by a unified vehicle control layer. The discrete action decision problem of the action decision layers is solved with the DQN (Deep Q-Network) deep reinforcement learning algorithm, and the continuous control signal output problem of the vehicle control layer is solved with the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm. The neural networks are built with the PyTorch deep learning framework, and the selected development language is Python 3.5. The model can receive multiple kinds of driving behavior instructions and, combined with environmental state information, outputs the vehicle control signals corresponding to each driving behavior.
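To make this branch selection structure concrete, the following minimal PyTorch sketch shows one decision head per driving behavior feeding a shared control layer. All dimensions, layer sizes, and names (STATE_DIM, DecisionHead, ControlActor) are illustrative assumptions for exposition, not the patented network configuration.

```python
# Illustrative sketch of the branch-selection structure: one DQN decision
# head per driving behavior, one shared DDPG-style control actor.
# Dimensions and layer sizes are assumptions, not the patented design.
import torch
import torch.nn as nn

STATE_DIM = 29       # assumed low-dimensional sensor state (TORCS-style)
N_INSTRUCTIONS = 3   # assumed discrete action instructions per head

class DecisionHead(nn.Module):
    """Per-behavior action decision layer (DQN): maps the state to
    Q-values over discrete action instructions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_INSTRUCTIONS))

    def forward(self, state):
        return self.net(state)

class ControlActor(nn.Module):
    """Unified vehicle control layer (DDPG actor): maps the state plus the
    current target driving state (lateral offset target, speed level) to
    continuous steering/throttle/brake signals."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + 2, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Tanh())   # steer, throttle, brake

    def forward(self, state, target):
        return self.net(torch.cat([state, target], dim=-1))

# Branch selection: independent decision heads, one unified control layer.
decision_heads = {"lane_change": DecisionHead(),
                  "overtake": DecisionHead(),
                  "follow": DecisionHead()}
control_layer = ControlActor()
```

Under this sketch, branch selection reduces to a dictionary lookup on the received behavior instruction, while the output of every decision head is consumed by the same control layer.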
The model structure is shown in FIG. 2 and mainly comprises the following parts:
1 lane change driving behavior
In a multi-lane road scenario, the unmanned vehicle drives in the current lane in the desired state; when a lane change driving behavior instruction is received, the vehicle is controlled to enter the adjacent lane along a smooth lane change trajectory and drive there stably.
2 overtaking driving behavior
In a straight-line overtaking scenario, if a target vehicle traveling at the same speed as the unmanned vehicle is present in an adjacent lane, the unmanned vehicle increases its target desired speed after receiving an overtaking driving behavior instruction until the target vehicle is completely overtaken, and then returns to its original speed.
3 following a car driving behavior
When the unmanned vehicle detects a low-speed vehicle ahead while driving, it actively reduces its longitudinal desired speed and drives stably behind the vehicle in front while keeping a safe distance; after the front vehicle changes lanes or increases its speed, the unmanned vehicle restores its desired driving speed.
The scheme is realized as follows:
First, the decomposition granularity of the unmanned vehicle's lateral and longitudinal states is divided according to the driving scenario and a target driving state is established for the vehicle; then the vehicle control layer is trained on the basis of this target driving state; finally, corresponding reward functions are created for the different driving behaviors such as following, lane changing, and overtaking, and the action decision network of each driving behavior is trained in combination with the vehicle control layer.
A VirtualEnv virtual environment is configured on a Ubuntu server, a TORCS simulation platform is set up, and the simulation platform configuration file is modified to initialize the starting position of the unmanned vehicle and the driving states of the surrounding dynamic vehicles. The unmanned vehicle is trained interactively in this simulation environment, yielding the neural network model that constitutes motion decision control.
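As a sketch of this interactive training mode, the loop below shows an agent learning directly from simulator transitions instead of from a labeled dataset. TorcsEnv is a hypothetical stand-in for the TORCS wrapper (its reset/step interface, state dimension, and placeholder dynamics are assumptions), as is the agent's act/remember/learn interface.

```python
# Minimal interactive-training sketch; TorcsEnv is a hypothetical wrapper
# around the simulation platform, not the patent's actual interface.
import numpy as np

class TorcsEnv:
    """Assumed reset/step interface over the TORCS simulation."""
    def reset(self):
        return np.zeros(29)                  # assumed state dimension

    def step(self, control):
        next_state = np.random.randn(29)     # placeholder dynamics
        reward, done = 0.0, False
        return next_state, reward, done

def train_interactively(env, agent, episodes=1000):
    """The agent (e.g., the DDPG control layer) improves by interacting
    with the environment, so no labeled dataset needs to be collected."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            control = agent.act(state)                      # steer/throttle/brake
            next_state, reward, done = env.step(control)
            agent.remember(state, control, reward, next_state, done)
            agent.learn()                                   # off-policy update
            state = next_state
```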
S1) The unmanned vehicle learns lane keeping based on the target driving state;
1) Determining the lateral offset position;
2) Determining the longitudinal desired speed;
S2) Learning the basic lane change behavior by changing the lateral offset position;
1) Dividing the minimum unit of lateral offset;
2) Changing the unmanned vehicle's own lateral offset position in the neural network;
S3) Learning the basic overtaking behavior by changing the longitudinal desired speed;
1) Dividing the minimum change amplitude of the longitudinal speed;
2) Adding a condition control instruction representing the longitudinal desired speed to the neural network input state space;
S4) Learning the dynamic decision process of the lane change and overtaking driving behaviors in combination with surrounding vehicles;
1) Deciding action instructions based on the dynamic environment;
2) Achieving the target driving behavior;
S5) Learning the following driving behavior based on a front target vehicle;
1) The unmanned vehicle learns a deceleration avoidance strategy;
2) Adjusting vehicle speed based on the state of the front vehicle;
in the above steps, the specific implementation method is as follows:
the contents of S1) are that a vehicle control layer realizes the stable running of the unmanned vehicle in a target driving state, an initial transverse offset position and a longitudinal expected speed are firstly established for the unmanned vehicle as the target driving state, then a reward function is defined around the target driving state, the unmanned vehicle is guided to learn an output strategy according with the target driving state, so that a reasonable vehicle control signal is decided, and the unmanned vehicle can stably run in a current lane at the target speed, as shown in FIG. 3.
The content of S2) is that the unmanned vehicle learns to drive stably in each lane at the target speed. First, several lateral offset points are determined according to the driving scenario; then the point offset is subtracted from the "lateral position" in the input state space of the vehicle control layer's neural network. After a period of stability training, the unmanned vehicle can drive stably along each lateral point at the target speed. This part and the part in S3) are both completed by the vehicle control layer, by changing the target driving state in a static road scenario containing no other traffic participants, as shown in FIG. 4.
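The offset-subtraction trick can be sketched as a one-line state transformation; the index of the lateral entry in the state vector is an assumption.

```python
import numpy as np

def shift_lateral_observation(state, point_offset, lat_index=0):
    """Subtract a lateral point offset from the observed lateral position,
    so a trained lane-keeping controller converges on the shifted point;
    lat_index (location of the lateral entry) is an assumption."""
    shifted = np.array(state, dtype=float)
    shifted[lat_index] -= point_offset
    return shifted
```

Under this sketch, passing a point_offset of roughly one lane width makes the controller treat the adjacent lane's point as its target line, which is the basis of the lane change behavior.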
The content of S3) is that the unmanned vehicle learns to drive in each lane at different target speeds. A condition control instruction is added to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving; speed levels are divided according to the minimum and maximum speed limits of the driving scenario, and the unmanned vehicle is trained under each speed state to strengthen its robustness and adapt it to different driving speeds. The overtaking driving behavior is then realized by adjusting the vehicle speed, as shown in FIG. 5.
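The condition control instruction can be sketched as one extra normalized entry appended to the control layer's input state; the number of speed levels and the encoding are assumptions.

```python
import numpy as np

def with_speed_command(state, speed_level, n_levels=5):
    """Append a normalized speed-level command so a single control network
    can track several longitudinal desired speeds; the level count and
    the [0, 1] encoding are assumptions."""
    command = speed_level / (n_levels - 1)
    return np.append(np.asarray(state, dtype=float), command)
```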
The content of S4) is to treat the lane change and overtaking driving behaviors as independent action decision layers, with the desired changes in lateral offset and longitudinal speed serving as action instructions that are transmitted to the lower vehicle control layer to change the target driving state while driving. During the action instruction decision process, the dynamic information of other surrounding vehicles is consulted so that action instructions are issued selectively and adjusted at appropriate times, ensuring the dynamic safety of the driving behavior execution process, as shown in FIG. 6.
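The decision side of S4) can be sketched as epsilon-greedy instruction selection over a small discrete set; the instruction set and exploration rate are assumptions.

```python
import random
import torch

def decide_action_instruction(q_net, state, n_instructions=3, eps=0.05):
    """Epsilon-greedy selection of a discrete action instruction (e.g.,
    hold / shift lateral target / change speed level) from a DQN decision
    head that observes surrounding-vehicle dynamics through the state."""
    if random.random() < eps:                         # exploration
        return random.randrange(n_instructions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax())                     # greedy instruction
```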
The content of S5) is the independent action decision layer of the following driving behavior. According to the detected front vehicle information, it dynamically adjusts the longitudinal desired speed of the unmanned vehicle by outputting acceleration and deceleration action instructions to the vehicle control layer, so that the vehicle follows stably behind the front vehicle while keeping a safe distance, realizing the following driving behavior; when there is no vehicle ahead, the unmanned vehicle resumes normal driving at its original desired speed.
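The policy this following head converges to can be summarized by a rule-of-thumb sketch; the gap threshold and speeds are illustrative assumptions, and in the patented scheme the mapping is learned rather than hand-coded.

```python
def follow_instruction(gap, own_speed, lead_speed,
                       safe_gap=25.0, desired_speed=20.0):
    """Decelerate when closing inside the safe gap; accelerate back toward
    the original desired speed once the road ahead allows it."""
    if gap < safe_gap and own_speed >= lead_speed:
        return "decelerate"      # lower the longitudinal desired speed
    if gap > safe_gap and own_speed < desired_speed:
        return "accelerate"      # recover the original desired speed
    return "hold"
```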
The key point of the invention is that the motion decision control model receives driving behavior instructions, outputs vehicle control signals, and completes the decision control process oriented to multiple driving behaviors. First, the model attempts to receive a driving behavior instruction: if none is received, the vehicle control layer keeps the vehicle in its lane based on the target driving state; if one is received, the corresponding action decision layer is selected through the branch architecture. The action decision layer then outputs a sequence of action instructions and transmits it to the vehicle control layer in real time until the target driving behavior is finished. Finally, the vehicle control layer receives each action instruction, updates the target driving state, and cyclically controls the vehicle to complete it. The specific decision control flow is as follows.
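The flow table itself was rendered as images in the original document and is not recoverable; the sketch below reconstructs the loop from the textual description above. Every callable (get_behavior_cmd, behavior_finished, and the head and control-layer interfaces) is an assumed interface, not the patented code.

```python
def decision_control_loop(env, decision_heads, control_layer,
                          get_behavior_cmd, behavior_finished,
                          initial_target):
    """Top-level flow: with no behavior instruction, the control layer
    keeps the lane on the current target state; with one, the branch
    structure selects a decision head that streams action instructions
    until the behavior is finished."""
    target = initial_target                  # lateral offset, desired speed
    state = env.reset()
    while True:
        behavior = get_behavior_cmd()        # upper-layer instruction or None
        if behavior is None:
            control = control_layer.act(state, target)   # lane keeping
            state, _, _ = env.step(control)
            continue
        head = decision_heads[behavior]      # branch selection
        while not behavior_finished(state, behavior):
            instruction = head.decide(state)         # action instruction
            target = instruction.apply(target)       # update target state
            control = control_layer.act(state, target)
            state, _, _ = env.step(control)
```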
Compared with existing unmanned vehicle motion decision control schemes, the multi-driving-behavior-oriented motion decision control scheme provided by the invention has the following benefits:
1. The scheme of the invention is based on a branch selection structure under a hierarchical architecture and uses a deep reinforcement learning algorithm to decompose and model the motion decision control process; it is therefore more flexible and more extensible than traditional end-to-end motion decision control schemes, and better suited to driving scenarios involving multiple driving behaviors.
2. The invention divides driving behaviors into "decision of key actions" and "execution of key actions" and generalizes the implementation principle of each driving behavior as a change in the vehicle's lateral and longitudinal desired state while driving; by introducing action instructions, decision control supporting multiple driving behaviors is realized, repeated modeling of the same task is avoided, and model complexity is greatly reduced.
3. The invention introduces a deep reinforcement learning algorithm into unmanned vehicle motion decision control and adopts a mode in which the agent learns by interacting with the environment, thereby avoiding the difficulty of data acquisition and tedious data preprocessing.
Drawings
FIG. 1 is a system architecture diagram.
FIG. 2 is a model structure diagram.
FIG. 3 is a diagram of the vehicle control scheme.
FIG. 4 is a diagram of the lateral control scheme.
FIG. 5 is a diagram of the longitudinal control scheme.
FIG. 6 is a diagram of the action decision scheme.
Detailed Description
For the purpose of promoting a better understanding of the objects, technical schemes, and advantages of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings. The specific steps of the embodiment are as follows:
step 001: the driving behavior involved is selected and a motion decision control model framework is created, as shown in fig. 2.
Step 002: an initial lateral offset position and a longitudinal desired speed are established as a target driving state, so that the unmanned vehicle can stably drive in the current lane at the target speed, as shown in fig. 3.
Step 003: changing the lateral offset in the vehicle control layer neural network input allows the unmanned vehicle to learn to drive stably in each lane at the target speed, as shown in fig. 4.
Step 004: adding a condition control command into an input state space of a neural network of a vehicle control layer to adjust a longitudinal expected speed when the vehicle runs, and enabling the unmanned vehicle to learn to run in each lane at different target speeds, as shown in fig. 5.
Step 005: independent action decision layers are respectively established based on lane changing and overtaking driving behaviors, and the expected changes of transverse deviation and longitudinal speed obtained by decomposing the execution process of the driving behaviors are used as action commands and transmitted to a lower vehicle control layer, as shown in fig. 6.
Step 006: and the independent action decision layer of the following driving behavior outputs acceleration and deceleration action instructions to the vehicle control layer according to the detected vehicle information in front, and stably drives along the rear part of the vehicle while keeping a safe distance with the vehicle in front.
Step 007: and extracting the trained action decision network model of each driving behavior, and combining the action decision network model and the vehicle control layer into a complete unmanned vehicle motion decision control model facing multiple driving behaviors.
In this embodiment of the invention, the driving behaviors are selected from those that may arise across several driving scenarios, including following, lane changing, and straight-line overtaking. A VirtualEnv virtual environment is built on a Ubuntu server, and the human-machine interaction during training is based on the TORCS simulation platform. The model continuously attempts to receive driving behavior instructions. A received instruction is transmitted to the meta-action decision layer in the model framework established in step 001, the corresponding action decision network from step 007 is selected, and the decision task for that driving behavior is established. The action decision layer outputs a sequence of action instructions and transmits it in real time to the vehicle control layer through steps 005/006; finally the vehicle control layer receives the action instructions, updates the target driving state through steps 003/004, and outputs vehicle control signals, completing the decision control process oriented to multiple driving behaviors.

Claims (8)

1. The unmanned vehicle motion decision control method based on deep reinforcement learning and supporting multiple driving behaviors is characterized in that: driving behaviors are decomposed into "decision of key actions" and "execution of key actions", and the implementation of each driving behavior is generalized as a change in the vehicle's lateral and longitudinal desired state while driving; decision control supporting multiple driving behaviors is realized by introducing action instructions; the upper layer of the model is a branch selection structure composed of an independent action decision layer for each driving behavior; the action instructions of the same nature output by the decision layers are all submitted to a unified vehicle control layer for execution; the discrete action decision problem of the action decision layers is solved with the DQN (Deep Q-Network) deep reinforcement learning algorithm, and the continuous control signal output of the vehicle control layer is solved with the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm; the neural networks are built with the PyTorch deep learning framework, and the selected development language is Python 3.5; the model receives multiple kinds of driving behavior instructions and, combined with environmental state information, outputs the corresponding vehicle control signals;
the method comprises the following steps:
1) Lane change driving behavior;
in a multi-lane highway scenario, the unmanned vehicle drives in the current lane in the desired state; when a lane change driving behavior instruction is received, the vehicle is controlled to enter the adjacent lane along a smooth lane change trajectory and drive there stably;
2) Overtaking driving behavior;
in a straight-line overtaking scenario, if a target vehicle traveling at the same speed as the unmanned vehicle is present in an adjacent lane, the unmanned vehicle increases its target desired speed after receiving an overtaking driving behavior instruction until the target vehicle is completely overtaken, and then returns to its original speed;
3) Following driving behavior;
when the unmanned vehicle detects a low-speed vehicle ahead while driving, it actively reduces its longitudinal desired speed and drives stably behind the front vehicle while keeping a safe distance; after the front vehicle changes lanes or increases its speed, the unmanned vehicle restores its desired driving speed;
first, the decomposition granularity of the unmanned vehicle's lateral and longitudinal states is divided according to the driving scenario and the target driving state of the unmanned vehicle is determined; then the vehicle control layer is trained based on the target driving state; finally, corresponding reward functions are created for the different driving behaviors of following, lane changing, and overtaking, and the action decision network of each driving behavior is trained in combination with the vehicle control layer;
a VirtualEnv virtual environment is configured on a Ubuntu server, a TORCS simulation platform is constructed, and the simulation platform configuration file is modified to initialize the starting position of the unmanned vehicle and the driving states of the surrounding dynamic vehicles; the unmanned vehicle is trained interactively in the constructed simulation environment, yielding the neural network model that constitutes motion decision control;
S1) The unmanned vehicle learns lane keeping based on the target driving state;
1) Determining the lateral offset position;
2) Determining the longitudinal desired speed;
S2) Learning the basic lane change behavior by changing the lateral offset position;
1) Dividing the minimum unit of lateral offset;
2) Changing the unmanned vehicle's own lateral offset position in the neural network;
S3) Learning the basic overtaking behavior by changing the longitudinal desired speed;
1) Dividing the minimum change amplitude of the longitudinal speed;
2) Adding a condition control instruction representing the longitudinal desired speed to the neural network input state space;
S4) Learning the dynamic decision process of the lane change and overtaking driving behaviors in combination with surrounding vehicles;
1) Deciding action instructions based on the dynamic environment;
2) Achieving the target driving behavior;
S5) Learning the following driving behavior based on a front target vehicle;
1) The unmanned vehicle learns a deceleration avoidance strategy;
2) Adjusting vehicle speed based on the state of the front vehicle.
2. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that the specific implementation of the above steps is as follows:
S1) the unmanned vehicle drives stably in the target driving state through the vehicle control layer; an initial lateral offset position and longitudinal desired speed are first established for the unmanned vehicle as the target driving state, then a reward function is defined around the target driving state to guide the unmanned vehicle to learn an output strategy conforming to it, so that vehicle control signals are decided that keep the vehicle driving stably in the current lane at the target speed.
3. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S2) is that the unmanned vehicle learns to drive stably in each lane at the target speed; several lateral offset points are determined according to the driving scenario, then the point offset is subtracted from the "lateral position" in the input state space of the vehicle control layer's neural network; after a period of stability training, the unmanned vehicle can drive stably along each lateral point at the target speed; this part and the part in S3) are completed by the vehicle control layer by changing the target driving state in a static road scenario containing no other traffic participants.
4. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S3) is that the unmanned vehicle learns to drive in each lane at different target speeds; a condition control instruction is added to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving; speed levels are divided according to the minimum and maximum speed limits of the driving scenario, and the unmanned vehicle is trained under each speed state to strengthen its robustness and adapt it to different driving speeds; the overtaking driving behavior is then realized by adjusting the vehicle speed.
5. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S4) is to treat the lane change and overtaking driving behaviors as independent action decision layers, with the desired changes in lateral offset and longitudinal speed serving as action instructions transmitted to the lower vehicle control layer to change the target driving state while driving; during the action instruction decision process, the dynamic information of other surrounding vehicles is consulted so that action instructions are issued selectively and adjusted at appropriate times, ensuring the dynamic safety of the driving behavior execution process.
6. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the content of S5) is the independent action decision layer of the following driving behavior; according to the detected front vehicle information, it dynamically adjusts the longitudinal desired speed of the unmanned vehicle by outputting acceleration and deceleration action instructions to the vehicle control layer, so that the unmanned vehicle follows stably behind the front vehicle while keeping a safe distance, realizing the following driving behavior; when there is no vehicle ahead, the unmanned vehicle resumes normal driving at its original desired speed.
7. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized in that: the motion decision control model receives driving behavior instructions, outputs vehicle control signals, and completes the decision control process oriented to multiple driving behaviors; first, the model attempts to receive a driving behavior instruction; if none is received, the vehicle control layer keeps the vehicle in its lane based on the target driving state, and if one is received, the corresponding action decision layer is selected through the branch architecture; then the action decision layer outputs a sequence of action instructions and transmits it to the vehicle control layer in real time until the target driving behavior is finished; finally, the vehicle control layer receives each action instruction, updates the target driving state, and cyclically controls the vehicle to complete it.
8. The deep reinforcement learning-based unmanned vehicle motion decision control method supporting multiple driving behaviors according to claim 1, characterized by comprising the following steps: step 001: selecting the driving behaviors involved and creating a motion decision control model framework;
step 002: establishing an initial lateral offset position and longitudinal desired speed as the target driving state, so that the unmanned vehicle drives stably in the current lane at the target speed;
step 003: changing the lateral offset in the vehicle control layer's neural network input so that the unmanned vehicle learns to drive stably in each lane at the target speed;
step 004: adding a condition control instruction to the input state space of the vehicle control layer's neural network to adjust the longitudinal desired speed while driving, so that the unmanned vehicle learns to drive in each lane at different target speeds;
step 005: establishing independent action decision layers for the lane change and overtaking driving behaviors, and transmitting the desired changes in lateral offset and longitudinal speed, obtained by decomposing the execution process of each driving behavior, to the lower vehicle control layer as action instructions;
step 006: the independent action decision layer of the following driving behavior outputting acceleration and deceleration action instructions to the vehicle control layer according to the detected front vehicle information, so that the vehicle drives stably behind the front vehicle while keeping a safe distance;
step 007: extracting the trained action decision network model of each driving behavior and combining them with the vehicle control layer into a complete multi-driving-behavior-oriented unmanned vehicle motion decision control model.
CN202110533119.5A 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning Active CN113264059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110533119.5A CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110533119.5A CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113264059A CN113264059A (en) 2021-08-17
CN113264059B (en) 2022-10-11

Family

ID=77231122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110533119.5A Active CN113264059B (en) 2021-05-17 2021-05-17 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113264059B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113561986B (en) * 2021-08-18 2024-03-15 武汉理工大学 Automatic driving automobile decision making method and device
CN113885491A (en) * 2021-08-29 2022-01-04 北京工业大学 Unmanned decision-making and control method based on federal deep reinforcement learning
CN113844448A (en) * 2021-09-18 2021-12-28 广东松科智能科技有限公司 Deep reinforcement learning-based lane keeping method
CN113879323B (en) * 2021-10-26 2023-03-14 清华大学 Reliable learning type automatic driving decision-making method, system, storage medium and equipment
CN114179835B (en) * 2021-12-30 2024-01-05 清华大学苏州汽车研究院(吴江) Automatic driving vehicle decision training method based on reinforcement learning in real scene

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897313A (en) * 2018-05-23 2018-11-27 清华大学 A kind of end-to-end Vehicular automatic driving system construction method of layer-stepping
US10940863B2 (en) * 2018-11-01 2021-03-09 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
CN111923919B (en) * 2019-05-13 2021-11-23 广州汽车集团股份有限公司 Vehicle control method, vehicle control device, computer equipment and storage medium
CN112580795A (en) * 2019-09-29 2021-03-30 华为技术有限公司 Neural network acquisition method and related equipment
CN110568760B (en) * 2019-10-08 2021-07-02 吉林大学 Parameterized learning decision control system and method suitable for lane changing and lane keeping
CN110969848B (en) * 2019-11-26 2022-06-17 武汉理工大学 Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN111413973A (en) * 2020-03-26 2020-07-14 北京汽车集团有限公司 Lane change decision method and device for vehicle, electronic equipment and storage medium
CN111845741B (en) * 2020-06-28 2021-08-03 江苏大学 Automatic driving decision control method and system based on hierarchical reinforcement learning
CN112201069B (en) * 2020-09-25 2021-10-29 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
End-to-end driverless decision-making based on deep reinforcement learning; Huang Zhiqing et al.; Acta Electronica Sinica (电子学报); 2020-09-15 (No. 09); 1711-1718 *

Also Published As

Publication number Publication date
CN113264059A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN113264059B (en) Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning
CN114407931B (en) Safe driving decision method for automatic driving operation vehicle of high class person
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN113291308B (en) Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics
CN110187639A (en) A kind of trajectory planning control method based on Parameter Decision Making frame
CN108932840A (en) Automatic driving vehicle urban intersection passing method based on intensified learning
CN103956045B (en) Utilize semi-true object emulation technology means to realize method that fleet works in coordination with driving
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN113264043A (en) Unmanned driving layered motion decision control method based on deep reinforcement learning
Unsal Intelligent navigation of autonomous vehicles in an automated highway system: Learning methods and interacting vehicles approach
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
CN111645673B (en) Automatic parking method based on deep reinforcement learning
CN111899509B (en) Intelligent networking automobile state vector calculation method based on vehicle-road information coupling
CN114035575A (en) Unmanned vehicle motion planning method and system based on semantic segmentation
CN110320916A (en) Consider the autonomous driving vehicle method for planning track and system of occupant's impression
CN116486356A (en) Narrow scene track generation method based on self-adaptive learning technology
CN115257789A (en) Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment
Wang et al. An intelligent self-driving truck system for highway transportation
Orgován et al. Autonomous drifting using reinforcement learning
Guan et al. Learn collision-free self-driving skills at urban intersections with model-based reinforcement learning
Ren et al. Self-learned intelligence for integrated decision and control of automated vehicles at signalized intersections
CN114802306A (en) Intelligent vehicle integrated decision-making system based on man-machine co-driving concept
Guo et al. Self-defensive coordinated maneuvering of an intelligent vehicle platoon in mixed traffic
CN114355897A (en) Vehicle path tracking control method based on model and reinforcement learning hybrid switching
CN116909131A (en) Vehicle formation track planning modeling method for signalless intersection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant