CN115257819A - Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment - Google Patents


Info

Publication number
CN115257819A
CN115257819A (application CN202211070514.5A)
Authority
CN
China
Prior art keywords
vehicle
driving
network
safe driving
substep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211070514.5A
Other languages
Chinese (zh)
Inventor
李旭 (Li Xu)
胡玮明 (Hu Weiming)
胡悦 (Hu Yue)
胡锦超 (Hu Jinchao)
陆红伟 (Lu Hongwei)
徐启敏 (Xu Qimin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202211070514.5A
Publication of CN115257819A
Legal status: Pending

Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60W — CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 — Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 — Planning or execution of driving tasks
    • B60W50/00 — Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 — Details of the control system
    • B60W2050/0019 — Control system elements or transfer functions
    • B60W2050/0028 — Mathematical models, e.g. for simulation
    • B60W2300/12 — Trucks; Load vehicles
    • B60W2300/125 — Heavy duty trucks
    • B60W2420/40 — Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/403 — Image sensing, e.g. optical camera
    • B60W2420/408 — Radar; Laser, e.g. lidar
    • B60W2556/10 — Historical data
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 — Complex mathematical operations
    • G06F17/15 — Correlation function computation including computation of convolution operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a decision-making method for safe driving of a large commercial vehicle in an urban low-speed environment. First, the safe driving behaviors of human drivers in the urban traffic environment are collected to construct a safe driving behavior dataset. Second, a multi-head-attention-based safe driving decision model for the commercial vehicle is constructed. The model contains two sub-networks: a deep double-Q network (DDQN) and a generative adversarial imitation learning (GAIL) network. The DDQN sub-network learns safe driving strategies in edge scenarios such as dangerous and conflict scenarios in an unsupervised manner; the GAIL sub-network imitates the safe driving behavior of a human driver under different driving conditions and operating conditions. Finally, the safe driving decision model is trained to obtain driving strategies for different driving conditions and operating conditions. The proposed method can imitate the safe driving behavior of a human driver while accounting for the influence of factors such as blind spots and sudden obstacles on driving safety.

Description

Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment
Technical Field
The invention relates to driving decision-making methods for commercial vehicles, and in particular to a decision-making method for safe driving of large commercial vehicles in an urban low-speed environment, belonging to the technical field of automotive safety.
Background
In urban traffic environments, road traffic accidents caused by driver blind spots account for the highest percentage, and the vehicles chiefly involved are large commercial vehicles such as heavy trucks, large buses, and articulated vehicles. Unlike passenger vehicles, large commercial vehicles are characterized by large volume, long body, long wheelbase, and a high driving position, with multiple static and dynamic blind spots around the body, such as in front of the cab, near the right front wheel, and below the right rearview mirror. When a commercial vehicle turns, especially to the right, pedestrians and non-motor vehicles in its blind spots are extremely vulnerable to collision or even being run over, making such turns the main source of serious safety accidents. In addition, compared with a closed expressway scenario, the mixed urban traffic environment features relatively many types and numbers of traffic participants, and commercial vehicles frequently encounter sudden obstacles, so the danger is higher. Therefore, improving the driving safety of commercial vehicles in an open urban traffic environment with multi-traffic-target interference is a key problem that urgently needs to be solved, and is also central to guaranteeing urban road traffic safety.
At present, actively developing automatic driving technology has become a widely accepted means, both domestically and abroad, of guaranteeing vehicle driving safety. As a key part of achieving high-quality automatic driving, driving decision-making determines the rationality and safety of automatic driving for commercial vehicles. If the driver can be warned of danger 1.5 seconds before a traffic accident occurs and given a reliable and effective safe driving strategy, the frequency of traffic accidents caused by factors such as blind spots and sudden obstacles can be greatly reduced. Therefore, research on safe driving decision-making methods for large commercial vehicles plays an important role in guaranteeing the driving safety of commercial vehicles.
Many patents and papers have addressed collision-avoidance driving decisions, but they are mainly directed at passenger vehicles. Compared with passenger vehicles, commercial vehicles have larger blind spots and longer braking distances and braking times, so collision-avoidance decision methods for passenger vehicles cannot be applied directly to commercial vehicles. On the other hand, some patents have studied safe driving decisions for commercial vehicles, such as a decision method for highly human-like safe driving of automated commercial vehicles (application No. 202210158758.2) and a deep-learning-based lane-change decision method for large commercial vehicles (publication No. CN113954837A), but these decision methods are all oriented toward highway scenarios.
Unlike highway scenarios with few types of traffic participants, urban traffic environments are characterized by openness, multi-traffic-target interference, and mixed movement of vehicles and pedestrians. In particular, factors such as vehicle blind spots and sudden obstacles pose greater challenges to the safe driving of commercial vehicles in urban traffic. Therefore, safe driving decision methods for commercial vehicles oriented toward highway scenarios cannot be applied directly to the open, interference-prone urban traffic environment.
In general, for an open urban traffic environment with multi-traffic-target interference, existing methods struggle to meet the safe driving decision requirements of commercial vehicles, and there is a lack of safe driving decision methods that provide concrete driving suggestions such as driving actions and driving paths, in particular safe driving decision research for large commercial vehicles that considers the influence of blind spots and sudden obstacles.
Disclosure of Invention
Purpose of the invention: aimed at automatically driven commercial vehicles such as heavy trucks and large buses, the invention provides a decision-making method for safe driving of large commercial vehicles in an urban low-speed environment, with the goal of realizing safe driving decisions and guaranteeing vehicle driving safety. The method comprehensively considers the influence on driving safety of factors such as blind spots, sudden obstacles, and different driving conditions; it can imitate the safe driving behavior of human drivers, provides a more reasonable and safer driving strategy for automatically driven commercial vehicles, and can effectively guarantee their driving safety. Moreover, the method does not require complex vehicle dynamics equations or body parameters, the calculation procedure is simple and clear, the safe driving strategy can be output in real time, and the sensors used are low-cost, which facilitates large-scale deployment.
Technical scheme: to realize the purpose of the invention, the adopted technical scheme is a decision-making method for safe driving of large commercial vehicles in an urban low-speed environment. First, the safe driving behaviors of human drivers in the urban traffic environment are collected to construct a safe driving behavior dataset. Second, a multi-head-attention-based safe driving decision model for the commercial vehicle is constructed. The model contains two sub-networks: a deep double-Q network (DDQN) and a generative adversarial imitation learning (GAIL) network. The DDQN sub-network learns safe driving strategies in edge scenarios such as dangerous and conflict scenarios in an unsupervised manner; the GAIL sub-network imitates safe driving behavior under different driving conditions and operating conditions. Finally, the safe driving decision model is trained to obtain driving strategies for different driving conditions and operating conditions, realizing high-level decision output of safe driving behaviors for the commercial vehicle. The method specifically comprises the following steps:
Step one: collecting safe driving behaviors of human drivers in the urban traffic environment
In order to realize driving decisions comparable to those of a human driver, safe driving behaviors under different driving conditions and operating conditions are collected through real-road testing and driving simulation, and a dataset representing the safe driving behaviors of human drivers is constructed. This comprises the following 4 substeps:
Substep 1: construct a multi-dimensional target information synchronous acquisition system using a millimeter-wave radar, a 128-line lidar, a vision sensor, a BeiDou positioning sensor, and an inertial sensor.
Substep 2: in a real urban environment, multiple drivers in turn drive a commercial vehicle equipped with the multi-dimensional target information synchronous acquisition system. Data related to various driving behaviors, such as lane changing, lane keeping, car following, and acceleration and deceleration, are acquired and processed to obtain multi-source heterogeneous descriptions of driving behavior: for example, distances to obstacles in different directions measured by radar or the vision sensor; position, speed, acceleration, and yaw rate measured by the BeiDou and inertial sensors; and the steering wheel angle measured by on-board sensors.
Substep 3: to simulate safe driving behaviors in edge scenarios such as dangerous and conflict scenarios, a virtual urban scene based on hardware-in-the-loop simulation is built. The constructed urban traffic scenarios comprise the following three types:
(1) While the vehicle is driving, a laterally approaching traffic participant (i.e., a sudden obstacle) appears in front of the vehicle;
(2) While the vehicle is turning, a static traffic participant is present in the vehicle's blind spot;
(3) While the vehicle is turning, a moving traffic participant is present in the vehicle's blind spot.
In the above traffic scenarios, there are a variety of road network structures (straight roads, curves, and intersections) and a variety of traffic participants (commercial vehicles, passenger vehicles, non-motor vehicles, and pedestrians).
Multiple drivers drive the commercial vehicle in the virtual scene through real controls (steering wheel, accelerator, and brake pedal), and information such as the vehicle's lateral and longitudinal position, lateral and longitudinal speed, lateral and longitudinal acceleration, and the relative distance and relative speed with respect to surrounding traffic participants is collected.
Substep 4: based on the data collected in the real urban environment and the driving simulation environment, a driving behavior dataset for safe-driving decision learning is constructed, which can be expressed as:

X = {(s_1, a_1), (s_2, a_2), …, (s_n, a_n)}  (1)

where X represents the constructed dataset of state-action pairs characterizing the safe driving behavior of human drivers; (s_j, a_j) denotes the "state-action" pair at time j, with s_j the state at time j and a_j the action at time j, i.e., the action taken by the human driver based on state s_j; and n denotes the number of "state-action" pairs in the dataset.
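As an illustrative sketch (not part of the patent text), the dataset X of equation (1) can be held as a simple list of state-action pairs; the state fields, sample values, and action labels below are hypothetical:

```python
# Hypothetical sketch of the driving behavior dataset X in equation (1):
# a list of "state-action" pairs (s_j, a_j), where a_j is the action a
# human driver took in state s_j. Field layout and action names are
# illustrative only.

def make_pair(state, action):
    """One (s_j, a_j) pair."""
    return (tuple(state), action)

# Example truncated states: [lateral pos (m), longitudinal pos (m),
#                            lateral speed (m/s), longitudinal speed (m/s)]
X = [
    make_pair([0.0, 12.5, 0.1, 8.3], "lane_keep"),
    make_pair([0.2, 20.0, 0.5, 8.1], "lane_change_left"),
    make_pair([0.0, 31.4, 0.0, 7.9], "decelerate"),
]
n = len(X)  # number of "state-action" pairs in the dataset
```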
Step two: construction of multi-head attention-based decision model for safe driving of commercial vehicle
In order to realize safe driving decisions for large commercial vehicles in an urban low-speed environment, the invention comprehensively considers the influence of factors such as blind spots, sudden obstacles, and driving conditions on driving safety, and establishes a safe driving decision model for commercial vehicles. Considering that deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning and explores the traffic environment in an unsupervised manner, deep reinforcement learning is used to learn safe driving strategies in edge scenarios such as dangerous and conflict scenarios. In addition, considering the ability of imitation learning to mimic expert demonstrations, imitation learning is used to imitate the safe driving behavior of human drivers under different driving conditions and operating conditions. The constructed safe driving decision model therefore consists of two parts, described as follows:
Substep 1: define the basic parameters of the safe driving decision model
First, the safe driving decision problem in the urban low-speed environment is converted into a finite Markov decision process. Second, the basic parameters of the safe driving decision model are defined.
(1) Defining a state space
To describe the motion state of the own vehicle and nearby traffic participants, the present invention constructs a state space using time series data and an occupancy grid map. The specific description is as follows:
S_t = [S_1(t), S_2(t), S_3(t)]  (2)

where S_t represents the state space at time t, S_1(t) and S_2(t) represent the state spaces based on time-series data at time t, and S_3(t) represents the state space based on the occupancy grid map at time t.
First, the motion state of the own vehicle is described using continuous position, speed, acceleration, and heading-angle information:

S_1(t) = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s]  (3)

where p_x and p_y represent the lateral and longitudinal position of the own vehicle, in meters; v_x and v_y represent its lateral and longitudinal speed, in meters per second; a_x and a_y represent its lateral and longitudinal acceleration, in meters per second squared; and θ_s represents the heading angle of the vehicle, in degrees.
Second, the motion state of surrounding traffic participants is described using the relative motion state information between the own vehicle and each participant:

S_2(t) = [Δd_i, Δv_i, Δa_i], i = 1, 2, …  (4)

where Δd_i, Δv_i, and Δa_i represent the relative distance, relative speed, and acceleration between the own vehicle and the i-th traffic participant, in meters, meters per second, and meters per second squared, respectively.
Existing state-space definitions often use fixed encoding, i.e., the number of surrounding traffic participants considered is fixed. However, in actual urban traffic scenarios, the number and positions of traffic participants around a commercial vehicle change constantly, and side collisions caused by sudden obstacles and blind spots require special consideration. Although fixed encoding yields a valid state representation, the number of traffic participants considered is limited (only the minimum amount of information needed to represent a scene is used), so the influence of all surrounding traffic participants on the driving safety of the commercial vehicle cannot be described accurately and comprehensively.
Finally, to describe the relative positional relationship between the own vehicle and surrounding traffic participants more intuitively, the invention rasterizes the road area, divides it into multiple a × b grid cells, and abstracts the road area and vehicle targets into a grid map, namely the "presence" grid map S_3(t), where a denotes the length of a grid cell and b denotes its width.
The "presence" grid map contains four attributes: grid coordinates, whether a vehicle is present, the category of the corresponding vehicle, and the distance to the left and right lane lines. A grid cell with no traffic participant is set to 0, a cell with a traffic participant is set to 1, and the positional distribution of occupied cells relative to the cell containing the own vehicle describes the relative distance between the two vehicles.
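The three components of the state space in equations (2)-(4), together with the "presence" grid, can be sketched as follows; dimensions, field names, and sample values are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch of the state space S_t = [S_1(t), S_2(t), S_3(t)] from
# equations (2)-(4) plus the "presence" grid map. Dimensions, field names,
# and the sample values are illustrative assumptions.

def ego_state(px, py, vx, vy, ax, ay, heading_deg):
    """S_1(t), eq. (3): own-vehicle position, speed, acceleration, heading."""
    return [px, py, vx, vy, ax, ay, heading_deg]

def relative_states(participants):
    """S_2(t), eq. (4): (relative distance, relative speed, acceleration)
    for each surrounding traffic participant."""
    return [(p["dist"], p["rel_speed"], p["accel"]) for p in participants]

def presence_grid(rows, cols, occupied_cells):
    """S_3(t): 1 where a traffic participant occupies a grid cell, else 0."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in occupied_cells:
        grid[r][c] = 1
    return grid

S1 = ego_state(0.0, 35.2, 0.1, 6.5, 0.0, -0.3, 2.0)
S2 = relative_states([{"dist": 14.0, "rel_speed": -1.2, "accel": 0.4}])
S3 = presence_grid(8, 4, [(2, 1), (5, 3)])   # an 8 x 4 rasterized road area
S_t = [S1, S2, S3]
```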
(2) Defining an action space
The action space is defined with lateral and longitudinal driving actions:

A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]  (5)

where A_t represents the action space at time t; a_left, a_straight, and a_right represent turning left, going straight, and turning right; and a_accel, a_cons, and a_decel represent accelerating, maintaining speed, and decelerating.
(3) Defining a reward function
R_t = r_1 + r_2 + r_3  (6)

where R_t represents the reward function at time t, and r_1, r_2, and r_3 represent the forward, backward, and lateral collision-avoidance reward functions, obtained from equations (7), (8), and (9), respectively.
[Equations (7), (8), and (9), defining the threshold-based penalty terms r_1, r_2, and r_3, were rendered as images in the original document and are not reproduced here.]
where TTC represents the time to collision between the own vehicle and the obstacle ahead, obtained by dividing the distance between them by their relative speed; TTC_thr represents the forward time-to-collision threshold; RTTC represents the backward time to collision and RTTC_thr its threshold, in seconds; x_lat represents the lateral distance between the own vehicle and traffic participants on either side and x_min the minimum lateral safety distance, in meters; and β_1, β_2, β_3 represent the weight coefficients of the forward, backward, and lateral collision-avoidance reward functions, respectively.
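Since the exact piecewise forms of equations (7)-(9) are not legible in the text, the sketch below only illustrates the general shape of such threshold-based collision-avoidance reward terms using the variables defined above (TTC, RTTC, x_lat and their thresholds); the linear penalty form and all numeric values are assumptions:

```python
# Hedged sketch of the reward R_t = r_1 + r_2 + r_3 from equation (6).
# Equations (7)-(9) are not reproduced in the text, so the linear penalty
# shape and the default thresholds below are assumptions: each term
# penalizes the vehicle once its safety measure drops below the
# corresponding threshold.

def forward_reward(ttc, ttc_thr, beta1):
    """r_1: penalize a small time-to-collision with the obstacle ahead."""
    return -beta1 * (ttc_thr - ttc) if ttc < ttc_thr else 0.0

def backward_reward(rttc, rttc_thr, beta2):
    """r_2: penalize a small backward time-to-collision."""
    return -beta2 * (rttc_thr - rttc) if rttc < rttc_thr else 0.0

def lateral_reward(x_lat, x_min, beta3):
    """r_3: penalize lateral clearance below the minimum safe distance."""
    return -beta3 * (x_min - x_lat) if x_lat < x_min else 0.0

def total_reward(ttc, rttc, x_lat, ttc_thr=2.5, rttc_thr=2.5, x_min=1.0,
                 beta1=1.0, beta2=1.0, beta3=1.0):
    """R_t = r_1 + r_2 + r_3, as in equation (6)."""
    return (forward_reward(ttc, ttc_thr, beta1)
            + backward_reward(rttc, rttc_thr, beta2)
            + lateral_reward(x_lat, x_min, beta3))
```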
Substep 2: construction of deep dual-Q network-based decision sub-network
The Deep Double-Q Network (DDQN) uses an experience replay pool, which improves data utilization efficiency, helps avoid parameter oscillation or divergence, and reduces the negative learning effects caused by over-estimation in Q-learning networks. Therefore, the deep double-Q network is used to learn safe driving strategies in edge scenarios.
Unlike processing a state space of fixed dimension, processing feature information covering all surrounding traffic participants requires stronger feature-extraction capability. Considering that an attention mechanism can capture richer feature information (the dependency relationships between the own vehicle and surrounding traffic participants), the invention designs a policy network based on multi-head attention. In addition, since the driving decision depends only on the motion states of the own vehicle and surrounding traffic participants and is unaffected by the order of participants in the state space, the invention uses a positional encoding method (Vaswani, Ashish, et al., "Attention Is All You Need," Advances in Neural Information Processing Systems, 2017) to build permutation invariance into the decision sub-network.
The attention layer can be expressed as:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O  (10)

where MultiHead(Q, K, V) represents the multi-head attention value; Q represents the query vector and K the key vector, both of dimension d_k; V represents the value vector, of dimension d_v; W^O represents a parameter matrix to be learned; and head_i denotes the i-th head in multi-head attention (in the present invention, h = 2), computed by:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)  (11)

Attention(Q, K, V) = softmax(Q K^T / √d_k) V  (12)

where Attention(Q, K, V) represents the output attention matrix, and W_i^Q, W_i^K, and W_i^V represent parameter matrices to be learned.
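A minimal NumPy sketch of equations (10)-(12) with h = 2 heads, following the Transformer formulation cited above; the dimensions and random weights are illustrative. Note that permuting the rows of the input (i.e., reordering the traffic participants) merely permutes the output rows, which is the order-insensitivity the patent builds into the decision sub-network:

```python
import numpy as np

# Sketch of the multi-head attention layer of equations (10)-(12), with
# h = 2 heads as in the patent. Weight shapes follow the cited Transformer
# paper; the random initialization is purely illustrative.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Equation (12): scaled dot-product attention."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """Equations (10)-(11): h parallel heads, concatenated, projected by W^O."""
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
n, d_model, h = 5, 8, 2            # 5 traffic participants, feature width 8, 2 heads
d_k = d_v = d_model // h
Wq = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wk = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wv = [rng.normal(size=(d_model, d_v)) for _ in range(h)]
Wo = rng.normal(size=(h * d_v, d_model))
X = rng.normal(size=(n, d_model))
out = multi_head(X, X, X, Wq, Wk, Wv, Wo)  # self-attention over participants
```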
The decision sub-network based on the deep double-Q network is constructed as follows.
First, the state space S_t is connected to encoder 1, encoder 2, and encoder 3. Encoder 1 consists of two fully connected layers and outputs the encoding of the own vehicle's motion state. Encoder 2 has the same structure as encoder 1 and outputs the relative-motion-state encoding. Encoder 3 consists of two convolutional layers and outputs the occupancy grid map encoding.
Each fully connected layer has 64 neurons with Tanh activation functions; the convolutional layers all use 3 × 3 kernels with stride 2.
Second, the dependency relationships between the own vehicle and surrounding traffic participants are analyzed with the multi-head attention mechanism, so that the decision sub-network attends to traffic participants that suddenly approach the own vehicle or conflict with its driving path, and variable input sizes and permutation invariance are built into the sub-network. The outputs of encoders 1, 2, and 3 are all connected to the multi-head attention module, which outputs an attention matrix. The attention matrix is then connected to decoder 1, which consists of one fully connected layer.
The fully connected layer has 64 neurons with a Sigmoid activation function.
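Encoders 1 and 2 described above (two 64-neuron fully connected layers with Tanh activations) can be sketched in NumPy as below; encoder 3's convolutional layers are omitted for brevity, the input dimensions follow equations (3)-(4), and the random weights are illustrative:

```python
import numpy as np

# Sketch of encoders 1 and 2 of the decision sub-network: two 64-neuron
# fully connected layers with Tanh activations. Encoder 3 (two 3x3
# stride-2 convolutions over the occupancy grid) is omitted for brevity,
# and the random weights are purely illustrative.

rng = np.random.default_rng(2)

def make_encoder(d_in, d_h=64):
    W1 = rng.normal(scale=0.1, size=(d_h, d_in)); b1 = np.zeros(d_h)
    W2 = rng.normal(scale=0.1, size=(d_h, d_h)); b2 = np.zeros(d_h)
    def forward(x):
        return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)
    return forward

encoder1 = make_encoder(7)   # ego motion state S_1(t): 7 features, eq. (3)
encoder2 = make_encoder(3)   # relative state S_2(t): 3 features per participant, eq. (4)
code1 = encoder1(rng.normal(size=7))
code2 = encoder2(rng.normal(size=3))
```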
Substep 3: building a decision sub-network based on generation of confrontation mimic learning
Under the complex urban traffic environment with open and multi-traffic target interference, an accurate and comprehensive reward function is difficult to construct, and particularly, the influence of uncertainty (such as a sudden obstacle, a traffic participant in a visual blind area and the like) on driving safety is difficult to quantitatively describe. In order to reduce the influence of uncertainty of traffic environment and driving condition on safe driving decision and improve the effectiveness and reliability of driving decision, the invention utilizes the driving strategy in the sample data which is generated to resist and imitate a learning sub-network, learn a driving behavior data set and generalization thereof, and further imitate safe driving behaviors under different driving conditions and driving conditions. The generation confrontation imitation learning sub-network consists of a generator and an arbiter, and the generator network and the arbiter network are respectively constructed by utilizing a deep neural network. The specific description is as follows:
(1) Construct the generator
A generator network is constructed. The input of the generator is the state space and the output is the probability value f = π(·|s; θ) of each action in the action space, where θ represents the parameters of the generator network. First, the state space is connected in sequence to fully connected layers FC_1 and FC_2 to obtain feature F_1. The state space is likewise connected to fully connected layers FC_3 and FC_4 to obtain feature F_2. In parallel, the state space is connected in sequence to convolutional layers C_1 and C_2 to obtain feature F_3. Features F_1, F_2, and F_3 are then passed through a merging layer, fully connected layer FC_5, and a Softmax activation function to obtain the output f = π(·|s; θ).
Fully connected layers FC_1 through FC_5 each have 64 neurons; convolutional layer C_1 has a 3 × 3 kernel with stride 2, and convolutional layer C_2 has a 3 × 3 kernel with stride 1.
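A rough sketch of the generator's forward pass under the layer sizes given above (64-neuron fully connected branches merged into a Softmax over the 6 actions); the convolutional branch C_1–C_2 over the occupancy grid is omitted for brevity, the state dimension is arbitrary, and all weights are random placeholders:

```python
import numpy as np

# Hedged sketch of the generator pi(.|s; theta) from substep 3(1): two
# fully connected branches (FC_1->FC_2 and FC_3->FC_4, 64 neurons each)
# merged and passed through FC_5 with a Softmax over the 6 actions. The
# convolutional branch is omitted; weights are illustrative.

def fc(x, W, b, act):
    return act(W @ x + b)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d_s, d_h, n_actions = 16, 64, 6   # state dim (assumed), 64 neurons, 6 actions

def layer(d_in, d_out):
    return rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out)

W1, b1 = layer(d_s, d_h); W2, b2 = layer(d_h, d_h)   # branch FC_1 -> FC_2
W3, b3 = layer(d_s, d_h); W4, b4 = layer(d_h, d_h)   # branch FC_3 -> FC_4
W5, b5 = layer(2 * d_h, n_actions)                   # merge + FC_5 + Softmax

def generator(s):
    f1 = fc(fc(s, W1, b1, np.tanh), W2, b2, np.tanh)
    f2 = fc(fc(s, W3, b3, np.tanh), W4, b4, np.tanh)
    merged = np.concatenate([f1, f2])                # conv branch omitted
    return softmax(W5 @ merged + b5)

probs = generator(rng.normal(size=d_s))              # action probabilities
```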
(2) Construct the discriminator
A discriminator network is constructed. The input of the discriminator is the state space, and the output is a vector D_φ(·|s) of dimension 6, where φ represents the parameters of the discriminator network. First, the state space is connected in sequence to fully connected layers FC_6 and FC_7 to obtain feature F_4. The state space is likewise connected to fully connected layers FC_8 and FC_9 to obtain feature F_5. In parallel, the state space is connected in sequence to convolutional layers C_3 and C_4 to obtain feature F_6. Features F_4, F_5, and F_6 are then passed through a merging layer, fully connected layer FC_10, and a Sigmoid activation function to obtain the output D_φ(·|s).
Fully connected layers FC_6 through FC_10 each have 64 neurons; convolutional layer C_3 has a 3 × 3 kernel with stride 2, and convolutional layer C_4 has a 3 × 3 kernel with stride 1.
Step three: model for training safe driving decision of commercial vehicle
First, training is based on generating a sub-network of decisions against mock learning. The goal of generating the antagonistic mock learning subnetwork is to learn a generator network such that the arbiter cannot distinguish between driving actions generated by the generator and actions in the driving behavior data set. The method specifically comprises the following substeps:
substep 1: in a driving behavior data set
Figure BDA0003829935270000081
In (1), initializing the generator network parameter θ 0 Sum discriminator network parameter omega 0
And substep 2: l iterative solutions are performed, each iteration comprising sub-steps 2.1 to 2.2, in particular:
substep 2.1: updating the discriminator parameter ω using the gradient equation described by equation (13) i →ω i+1
Figure BDA0003829935270000082
In the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000083
a gradient function representing a neural network loss function with a parameter ω;
substep 2.2: setting a reward function
Figure BDA0003829935270000084
Updating generator parameter θ using trust domain policy optimization algorithm i →θ i+1
Then, on the basis of the above training results, the DDQN-based decision sub-network is trained, specifically comprising the following sub-steps:
substep 3: initializing the capacity of an experience multiplexing pool D to be N;
substep 4: the Q value corresponding to the initialization action is a random value;
substep 5: perform M iterations, each comprising sub-steps 5.1 and 5.2, specifically:
substep 5.1: initialize the state s_0 and the policy parameters φ_0;
substep 5.2: perform T iterations, each comprising sub-steps 5.21 to 5.27, specifically:
substep 5.21: with probability ε, randomly select a driving action;
substep 5.22: otherwise, select a_t = argmax_a Q*(φ(s_t), a; θ), where Q*(·) denotes the optimal action-value function and a_t denotes the action at time t;
substep 5.23: perform action a_t to obtain the reward value r_t at time t and the state s_{t+1} at time t + 1;
substep 5.24: store the sample (φ_t, a_t, r_t, φ_{t+1}) in the experience replay pool D;
substep 5.25: randomly draw a small batch of samples (φ_j, a_j, r_j, φ_{j+1}) from the experience replay pool D;
Substep 5.26: the iteration target is calculated using the formula:
Figure BDA0003829935270000085
in the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000086
representing the weight of the target network at time t; γ represents a discount factor; argmax (·) denotes a variable that causes the objective function to have a maximum value, y i Representing an iteration target at time i, and p (s, a) representing a motion distribution;
substep 5.27: using the following formula in (y) i -Q(φ j ,a j ;θ)) 2 The gradient is decreased:
Figure BDA0003829935270000091
in the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000092
with the parameter θ i The gradient function of the neural network loss function of (1), epsilon represents the probability of randomly selecting an action under an epsilon-greedy exploration strategy; theta i Parameter, L, representing the iteration at time i ii ) Represents the loss function at time i, Q (s, a; theta.theta. i ) Representing the action cost function of the target network and a 'representing all possible actions of state s'.
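The experience replay pool of substeps 3, 5.24, and 5.25 can be sketched with a bounded deque. This is a minimal illustration under assumed sizes (capacity 100, batch 32), not the patent's implementation:

```python
import random
from collections import deque

random.seed(0)

class ReplayPool:
    """Minimal experience replay pool D with capacity N (illustrative sketch)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are discarded once full

    def store(self, phi_t, a_t, r_t, phi_next):
        # substep 5.24: store the transition (phi_t, a_t, r_t, phi_{t+1})
        self.buffer.append((phi_t, a_t, r_t, phi_next))

    def sample(self, batch_size):
        # substep 5.25: randomly draw a small batch of samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

pool = ReplayPool(capacity=100)
for t in range(150):                  # 150 transitions, but capacity bounds D at 100
    pool.store(("s", t), t % 6, -0.1, ("s", t + 1))
batch = pool.sample(32)
```

Because `deque(maxlen=...)` drops the oldest entries automatically, the pool always holds the N most recent transitions.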
After the training of the safe driving decision model of the commercial vehicle is completed, the state-space information acquired by the sensors is input into the model, which outputs high-level driving decisions such as steering, going straight, and accelerating or decelerating in real time, effectively guaranteeing the driving safety of commercial vehicles in urban low-speed environments.
Advantageous effects: compared with general driving decision methods, the method provided by the invention is more effective and reliable, specifically embodied as follows:
(1) The method provided by the invention can imitate the safe driving behavior of human drivers, provides a more reasonable and safer driving strategy for commercial vehicles in urban low-speed environments, realizes safe driving decisions for large commercial vehicles at a high human-like level, and can effectively ensure the driving safety of the vehicle.
(2) The method provided by the invention comprehensively considers the influence of factors such as visual blind zones, sudden obstacles, and different driving conditions on driving safety, and carries out strategy learning and training in both normal driving scenarios and edge scenarios, further improving the effectiveness and reliability of driving decisions.
(3) The method provided by the invention introduces a multi-head attention mechanism, considers the dynamic interaction between the own vehicle and all surrounding traffic participants, and can make safe driving decisions with variable-length inputs (the number of surrounding traffic participants changes dynamically).
(4) The method provided by the invention does not need to consider complex vehicle dynamics equations or vehicle body parameters; the calculation method is simple and clear, the safe driving decision strategy of a large commercial vehicle can be output in real time, and the sensors used are low-cost, facilitating large-scale deployment.
Drawings
FIG. 1 is a technical roadmap for the present invention;
FIG. 2 is a schematic diagram of a policy network structure based on a multi-head attention mechanism designed by the present invention;
FIG. 3 is a schematic diagram of a generator network structure designed by the present invention;
FIG. 4 is a schematic diagram of the discriminator network structure designed by the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings and embodiments.
The invention provides a decision-making method for safe driving of large commercial vehicles at a high human-like level, aimed at open urban traffic environments with multi-traffic-target interference. First, the safe driving behaviors of human drivers in the urban traffic environment are collected, and a safe driving behavior data set is constructed. Second, a multi-head attention-based decision model for safe driving of the commercial vehicle is constructed. The model contains two sub-networks: a deep double-Q network and a generative adversarial imitation learning network. The deep double-Q network learns safe driving strategies in edge scenarios such as dangerous scenarios and conflict scenarios in an unsupervised learning mode, while the generative adversarial imitation learning sub-network imitates safe driving behaviors under different driving environments and driving conditions. Finally, the safe driving decision model is trained to obtain driving strategies under different driving environments and driving conditions, realizing high-level decision output of safe driving behaviors for the commercial vehicle. The method provided by the invention can imitate the safe driving behavior of human drivers, considers the influence of factors such as visual blind zones and sudden obstacles on driving safety, provides a more reasonable and safer driving strategy for large commercial vehicles, and realizes safe driving decisions for commercial vehicles in urban traffic environments. The technical route of the invention is shown in figure 1, and the specific steps are as follows:
Step one: collecting the safe driving behaviors of human drivers in the urban traffic environment
In order to realize driving decisions comparable to those of a human driver, safe driving behaviors under different driving environments and driving conditions are collected through actual road tests and driving simulation, and a data set representing the safe driving behaviors of human drivers is constructed. This step specifically comprises the following 5 sub-steps:
substep 1: and constructing a multi-dimensional target information synchronous acquisition system by using a millimeter wave radar, a 128-line laser radar, a vision sensor, a Beidou sensor and an inertial sensor.
Substep 2: under the real urban environment, a plurality of drivers drive operating vehicles carrying the multi-dimensional target information synchronous acquisition system in sequence.
Substep 3: the method comprises the steps of collecting and processing relevant data of various driving behaviors such as lane change, lane keeping, car following, acceleration and deceleration and the like of a driver, and obtaining multi-source heterogeneous description data of the driving behaviors, such as the distances of obstacles in different directions measured by a radar or a vision sensor, the positions, the speeds, the accelerations, the yaw velocities and the like measured by a Beidou sensor and an inertial sensor, the steering wheel turning angles measured by a vehicle-mounted sensor and the like.
Substep 4: in order to simulate safe driving behaviors under marginal scenes such as dangerous scenes, conflict scenes and the like, a virtual city scene based on hardware-in-the-loop simulation is built, and the built city traffic scenes comprise the following three types:
(1) During the driving process of the vehicle, a transversely approaching traffic participant (namely, a sudden obstacle) appears in front of the vehicle;
(2) In the steering process of the vehicle, static traffic participants exist in the vision blind area of the vehicle;
(3) During the turning process of the vehicle, moving traffic participants exist in the vision blind area of the vehicle.
In the above traffic scenarios, there are a variety of road network structures (straight roads, curves, and intersections) and a variety of traffic participants (commercial vehicles, passenger vehicles, non-motor vehicles, and pedestrians).
A plurality of drivers drive the operating vehicles in the virtual scene through real controllers (a steering wheel, an accelerator and a brake pedal), and information such as the transverse and longitudinal position, the transverse and longitudinal speed, the transverse and longitudinal acceleration, the relative distance and the relative speed with surrounding traffic participants and the like of the vehicles is collected.
Substep 5: based on data collected by a real city environment and a driving simulation environment, a driving behavior data set for safe driving decision learning is constructed and formed, and the driving behavior data set can be specifically expressed as follows:
Figure BDA0003829935270000111
wherein X represents a binary group covering state and action, namely a constructed data set representing safe driving behavior of the human driver,(s) j ,a j ) Represents a "state-action" pair at time j, where s j Indicates the state at time j, a j Indicating the action at time j, i.e. human driver based on state s j The actions made, n represents the number of "state-action" pairs in the database.
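The data set X above can be sketched as a plain Python structure. The state fields below (positions, speeds) and the action labels are illustrative assumptions, not the patent's recorded signals:

```python
# Two hypothetical "state-action" pairs (s_j, a_j) forming a tiny data set X.
state_t1 = {"p_x": 12.0, "p_y": 3.5, "v_x": 0.0, "v_y": 8.2}
state_t2 = {"p_x": 12.1, "p_y": 3.5, "v_x": 0.0, "v_y": 8.4}

X = [
    (state_t1, "straight"),   # (s_1, a_1): the driver kept going straight
    (state_t2, "accel"),      # (s_2, a_2): the driver accelerated
]
n = len(X)                    # number of "state-action" pairs in the data set
states, actions = zip(*X)     # split into states and expert action labels
```

In training, the expert pairs in X play the role of positive samples for the discriminator.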
Step two: construction of multi-head attention-based decision model for safe driving of commercial vehicle
In order to realize safe driving decisions for large commercial vehicles in urban low-speed environments, the invention comprehensively considers the influence of factors such as visual blind zones, sudden obstacles, and driving conditions on driving safety, and establishes a safe driving decision model for the commercial vehicle. Considering that deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning and explores the traffic environment in an unsupervised learning mode, the invention utilizes deep reinforcement learning to learn safe driving strategies in edge scenarios such as dangerous scenarios and conflict scenarios. In addition, considering the ability of imitation learning to imitate expert samples, the invention simulates the safe driving behaviors of human drivers under different driving environments and driving conditions by means of imitation learning. Therefore, the constructed safe driving decision model is composed of two parts, described in detail as follows:
substep 1: defining basic parameters of a safe driving decision model
First, the safe driving decision problem in the urban low-speed environment is converted into a finite Markov decision process. Second, the basic parameters of the safe driving decision model are defined.
(1) Defining a state space
In order to describe the motion state of the own vehicle and the nearby traffic participants, the invention constructs a state space by using time series data and an occupancy grid map. The specific description is as follows:
S_t = [S_1(t), S_2(t), S_3(t)]   (2)
where S_t represents the state space at time t, S_1(t) and S_2(t) represent the state spaces associated with time-series data at time t, and S_3(t) represents the state space associated with the occupancy grid map at time t.
First, the motion state of the own vehicle is described using continuous position, velocity, acceleration, and heading angle information:
S_1(t) = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s]   (3)
where p_x and p_y represent the lateral and longitudinal positions of the own vehicle in meters; v_x and v_y represent the lateral and longitudinal velocities in meters per second; a_x and a_y represent the lateral and longitudinal accelerations in meters per second squared; and θ_s represents the heading angle of the vehicle in degrees.
Secondly, the motion states of the surrounding traffic participants are described using the relative motion state information between the own vehicle and each surrounding traffic participant:
S_2(t) = [Δd_i, Δv_i, a_i]   (4)
where Δd_i, Δv_i, and a_i represent the relative distance, relative velocity, and acceleration of the own vehicle with respect to the i-th traffic participant, in meters, meters per second, and meters per second squared, respectively.
In existing state-space definition methods, fixed coding is often used; that is, the number of surrounding traffic participants considered is fixed. However, in actual urban traffic scenarios, the number and positions of the traffic participants around a commercial vehicle change constantly, and side collisions caused by sudden obstacles and visual blind zones require special consideration. Although fixed coding can realize an effective state representation, the number of traffic participants considered is limited (only the minimum amount of information required to represent a scene is used), and the influence of all surrounding traffic participants on the driving safety of the commercial vehicle cannot be accurately and comprehensively described.
Finally, in order to describe the relative position relationship between the own vehicle and the surrounding traffic participants more intuitively and improve the reliability and effectiveness of decision making, the invention grids the road area, dividing it into a number of a × b grid cells and abstracting the road area and vehicle targets into a grid map, namely the occupancy grid map S_3(t) describing the relative position relationship, where a denotes the length of a grid cell and b denotes its width.
The occupancy grid map contains four attributes: the grid coordinates, whether a vehicle is present, the category of the corresponding vehicle, and the distances to the left and right lane lines. A grid cell with no traffic participant is set to '0', a cell occupied by a traffic participant is set to '1', and the relative distance between two vehicles is described by the position distribution of the occupied cells and the cell of the own vehicle.
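The 0/1 occupancy part of the grid map can be sketched as follows. The grid size and participant positions are illustrative assumptions; the vehicle-category and lane-line-distance attributes are omitted for brevity:

```python
# Sketch of the occupancy grid: the road area is divided into rows x cols cells;
# cells holding a traffic participant are set to 1, all others to 0.
def build_grid(rows, cols, participants):
    """participants: list of (row, col) cells occupied by traffic targets."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in participants:
        grid[r][c] = 1
    return grid

# a 4 x 6 road patch with two surrounding traffic participants (positions assumed)
grid = build_grid(4, 6, [(0, 2), (3, 5)])
occupied = sum(map(sum, grid))   # total number of occupied cells
```

The relative position of each '1' cell with respect to the own vehicle's cell is what encodes inter-vehicle distance in this representation.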
(2) Defining an action space
Defining an action space with lateral and longitudinal driving actions:
A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]   (5)
where A_t represents the action space at time t; a_left, a_straight, and a_right represent turning left, going straight, and turning right, respectively; and a_accel, a_cons, and a_decel represent accelerating, maintaining constant speed, and decelerating, respectively.
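The six-element action space can be sketched as an enumeration; the names mirror the symbols in equation (5), and the integer values are an arbitrary indexing choice:

```python
from enum import Enum

class Action(Enum):
    # A_t: three lateral and three longitudinal driving actions
    LEFT = 0      # a_left
    STRAIGHT = 1  # a_straight
    RIGHT = 2     # a_right
    ACCEL = 3     # a_accel
    CONS = 4      # a_cons (constant speed)
    DECEL = 5     # a_decel

action_space = list(Action)   # the discrete action space A_t
```

A discrete action space of this form is what lets the generator output a 6-dimensional Softmax distribution and the DDQN output one Q-value per action.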
(3) Defining a reward function
R_t = r_1 + r_2 + r_3   (6)
where R_t represents the reward function at time t, and r_1, r_2, and r_3 represent the forward collision avoidance reward function, the backward collision avoidance reward function, and the lateral collision avoidance reward function, respectively, which can be obtained by equations (7), (8), and (9):
r_1 = −β_1 if TTC < TTC_thr, and r_1 = 0 otherwise   (7)
r_2 = −β_2 if RTTC < RTTC_thr, and r_2 = 0 otherwise   (8)
r_3 = −β_3 if x_lat < x_min, and r_3 = 0 otherwise   (9)
where TTC represents the time to collision between the own vehicle and the forward obstacle, obtained by dividing the distance between them by their relative velocity; TTC_thr represents the forward time-to-collision threshold; RTTC represents the backward time to collision and RTTC_thr its threshold, both in seconds; x_lat represents the lateral distance from the own vehicle to the traffic participants on either side and x_min the minimum lateral safety distance, both in meters; and β_1, β_2, and β_3 represent the weight coefficients of the forward, backward, and lateral collision avoidance reward functions, respectively.
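The composite reward R_t = r_1 + r_2 + r_3 can be sketched assuming a simple threshold-penalty form for each term; the thresholds and weights below are illustrative assumptions, not the patent's calibrated values:

```python
def r_forward(ttc, ttc_thr, beta1):
    # penalize only when forward time-to-collision drops below the threshold
    return -beta1 if ttc < ttc_thr else 0.0

def r_backward(rttc, rttc_thr, beta2):
    return -beta2 if rttc < rttc_thr else 0.0

def r_lateral(x_lat, x_min, beta3):
    return -beta3 if x_lat < x_min else 0.0

def total_reward(ttc, rttc, x_lat, ttc_thr=2.5, rttc_thr=2.5, x_min=1.0,
                 beta1=1.0, beta2=0.5, beta3=0.8):
    # R_t = r1 + r2 + r3 (equation (6)); all thresholds and weights assumed
    return (r_forward(ttc, ttc_thr, beta1)
            + r_backward(rttc, rttc_thr, beta2)
            + r_lateral(x_lat, x_min, beta3))

safe = total_reward(ttc=5.0, rttc=4.0, x_lat=2.0)    # no term triggers -> 0.0
risky = total_reward(ttc=1.0, rttc=4.0, x_lat=0.5)   # forward + lateral -> -1.8
```

With additive penalties, simultaneous risks (e.g. a short TTC while a participant sits in the lateral blind zone) are punished more than any single risk.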
And substep 2: constructing DDQN-based decision sub-networks
Considering that the Deep Double-Q Network (DDQN) uses an experience replay pool, which improves data utilization efficiency, avoids parameter oscillation or divergence, and reduces the negative learning effects caused by over-estimation in Q-learning networks, the invention utilizes the deep double-Q network to learn safe driving strategies in edge scenarios.
Unlike processing a state space with fixed dimensions, processing feature information covering all surrounding traffic participants requires stronger feature extraction capability. Considering that the attention mechanism can capture richer feature information (the dependency relationships between the own vehicle and the surrounding traffic participants), the invention designs a policy network based on the multi-head attention mechanism. In addition, considering that the driving decision is only related to the motion states of the own vehicle and the surrounding traffic participants and is not influenced by the order of the traffic participants in the state space, the invention uses a positional encoding method (Vaswani, Ashish, et al., "Attention is all you need," Advances in Neural Information Processing Systems, 2017) to build permutation invariance into the decision sub-network.
The attention layer may be expressed as:
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O   (10)
where MultiHead(Q, K, V) represents the multi-head attention value; Q represents the query vector and K the key vector, both of dimension d_k; V represents the value vector of dimension d_v; W^O represents a parameter matrix to be learned; and head_h denotes the h-th head in multi-head attention (in the present invention, h = 2), which can be calculated by the following formulas:
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)   (11)
Attention(Q, K, V) = softmax(Q K^T / √d_k) V   (12)
where Attention(Q, K, V) represents the output attention matrix, and W_i^Q, W_i^K, and W_i^V represent parameter matrices to be learned.
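Scaled dot-product attention and the two-head concatenation can be sketched in numpy. The feature dimensions, the number of surrounding participants (5), and the random projection matrices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return weights @ V, weights

def multi_head(Q, K, V, h=2, d_model=8):
    # h = 2 heads as in the text; projections W_i^Q, W_i^K, W_i^V are random stand-ins
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
        out, _ = attention(Q @ Wq, K @ Wk, V @ Wv)
        heads.append(out)
    W_O = rng.standard_normal((h * d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_O      # Concat(head_1, head_2) W^O

# 5 surrounding traffic participants, feature dimension 8 (illustrative sizes)
X = rng.standard_normal((5, 8))
out = multi_head(X, X, X)
_, w = attention(X, X, X)
```

Because attention weights are computed pairwise between all rows, the same code handles any number of surrounding traffic participants, which is the variable-input property the decision sub-network relies on.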
A decision sub-network based on a deep double-Q network is constructed, as shown in fig. 2, and described in detail below.
First, the state space S_t is connected to encoder 1, encoder 2, and encoder 3, respectively. Encoder 1 is composed of two fully connected layers and outputs the motion state encoding of the own vehicle. Encoder 2 has the same structure as encoder 1 and outputs the relative motion state encoding. Encoder 3 consists of two convolutional layers and outputs the grid map encoding.
The fully connected layers each contain 64 neurons with Tanh activation functions. The convolution kernels of the convolutional layers are all 3 × 3 with strides of 2.
Secondly, the dependency relationships between the own vehicle and the surrounding traffic participants are analyzed using the multi-head attention mechanism, so that the decision sub-network can attend to traffic participants that suddenly approach the own vehicle or conflict with its driving path, and variable input sizes and permutation invariance are built into the decision sub-network. The outputs of encoder 1, encoder 2, and encoder 3 are all connected to the multi-head attention module, which outputs the attention matrix. The attention matrix is then connected to decoder 1, which consists of one fully connected layer with 64 neurons and a Sigmoid activation function.
Substep 3: building a decision sub-network based on generative confrontation mock learning
Under the complex urban traffic environment with open and multi-traffic target interference, it is difficult to construct an accurate and comprehensive reward function, and especially difficult to quantitatively describe the influence of various uncertainties (such as a sudden obstacle, traffic participants in a visual blind area, and the like) on driving safety. In order to reduce the influence of uncertainty of traffic environment and driving condition on safe driving decision and improve the effectiveness and reliability of driving decision, the invention utilizes the driving strategy in the sample data which is generated to resist and imitate a learning sub-network, learn a driving behavior data set and generalization thereof, and further imitate safe driving behaviors under different driving conditions and driving conditions. The generation confrontation imitation learning sub-network consists of a generator and an arbiter, and the generator network and the arbiter network are respectively constructed by utilizing a deep neural network. The specific description is as follows:
(1) Constructing the generator
A generator network as shown in figure 3 is constructed. The input of the generator is the state space, and the output is the probability value f = π(·|s; θ) of each action in the action space, where θ represents the parameters of the generator network. First, the state space is connected in sequence with the fully connected layers FC1 and FC2 to obtain the feature F1, and likewise with the fully connected layers FC3 and FC4 to obtain the feature F2. At the same time, the state space is connected in sequence with the convolutional layers C1 and C2 to obtain the feature F3. Then, the features F1, F2, and F3 are connected in sequence with a merging layer, the fully connected layer FC5, and a Softmax activation function to obtain the output f = π(·|s; θ).
The fully connected layers FC1, FC2, FC3, FC4, and FC5 each contain 64 neurons; the convolutional layer C1 has a 3 × 3 convolution kernel with a stride of 2, and the convolutional layer C2 has a 3 × 3 convolution kernel with a stride of 1.
(2) Constructing the discriminator
A discriminator network as shown in figure 4 is constructed. The input of the discriminator is the state space, and the output is a vector D(·|s; φ) of dimension 6, where φ represents the parameters of the discriminator network. First, the state space is connected in sequence with the fully connected layers FC6 and FC7 to obtain the feature F4, and likewise with the fully connected layers FC8 and FC9 to obtain the feature F5. At the same time, the state space is connected in sequence with the convolutional layers C3 and C4 to obtain the feature F6. Then, the features F4, F5, and F6 are connected in sequence with a merging layer, the fully connected layer FC10, and a Sigmoid activation function to obtain the output D(·|s; φ).
The fully connected layers FC6, FC7, FC8, FC9, and FC10 each contain 64 neurons; the convolutional layer C3 has a 3 × 3 convolution kernel with a stride of 2, and the convolutional layer C4 has a 3 × 3 convolution kernel with a stride of 1.
Step three: decision model for training safe driving of commercial vehicle
First, training is based on generating a sub-network of decisions against mock learning. The goal of generating the confrontational mimic learning subnetwork is to learn a generator network so that the arbiter cannot distinguish between the driving actions generated by the generator and the actions in the driving behavior data set. The method specifically comprises the following substeps:
substep 1: in a driving behavior data set
Figure BDA0003829935270000153
In (1), initializing the generator network parameter θ 0 Sum arbiter network parameter ω 0
Substep 2: l iterative solutions are performed, each iteration comprising sub-steps 2.1 to 2.2, in particular:
substep 2.1: updating the discriminator parameter ω using the gradient equation described by equation (13) i →ω i+1
Figure BDA0003829935270000154
In the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000155
a gradient function representing a neural network loss function with a parameter ω;
substep 2.2: setting a reward function
Figure BDA0003829935270000156
Updating generator parameter θ using trust domain policy optimization algorithm i →θ i+1
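The discriminator update of substep 2.1 can be illustrated with a toy logistic discriminator on one-dimensional stand-in features. Everything here is an illustrative assumption rather than the patent's network: the Gaussian features, the learning rate, the number of epochs, and the label convention (expert pairs labeled 1, generated pairs labeled 0):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy 1-D features: expert state-action pairs cluster near +1, generated near -1
expert = [random.gauss(1.0, 0.3) for _ in range(200)]
generated = [random.gauss(-1.0, 0.3) for _ in range(200)]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(300):   # gradient ascent on log D(expert) + log(1 - D(generated))
    gw = gb = 0.0
    for x in expert:                 # label 1: push D(x) up
        d = sigmoid(w * x + b)
        gw += (1 - d) * x; gb += (1 - d)
    for x in generated:              # label 0: push D(x) down
        d = sigmoid(w * x + b)
        gw += -d * x; gb += -d
    w += lr * gw / 400; b += lr * gb / 400

d_expert = sigmoid(w * 1.0 + b)      # score on a typical expert pair
d_generated = sigmoid(w * -1.0 + b)  # score on a typical generated pair
```

After training, the discriminator separates the two sources; in GAIL proper, the generator is then updated (substep 2.2, via TRPO) to make this separation harder.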
Then, on the basis of the above training results, the DDQN-based decision sub-network is trained, specifically comprising the following sub-steps:
substep 3: initializing the capacity of an experience multiplexing pool D to be N;
substep 4: initializing a Q value corresponding to the action as a random value;
substep 5: perform M iterations, each comprising sub-steps 5.1 and 5.2, specifically:
substep 5.1: initialize the state s_0 and the policy parameters φ_0;
substep 5.2: perform T iterations, each comprising sub-steps 5.21 to 5.27, specifically:
substep 5.21: with probability ε, randomly select a driving action;
substep 5.22: otherwise, select a_t = argmax_a Q*(φ(s_t), a; θ), where Q*(·) denotes the optimal action-value function and a_t denotes the action at time t;
substep 5.23: perform action a_t to obtain the reward value r_t at time t and the state s_{t+1} at time t + 1;
substep 5.24: store the sample (φ_t, a_t, r_t, φ_{t+1}) in the experience replay pool D;
substep 5.25: randomly draw a small batch of samples (φ_j, a_j, r_j, φ_{j+1}) from the experience replay pool D;
Substep 5.26: the iteration target is calculated using the following equation:
Figure BDA0003829935270000161
in the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000162
representing the weight of the target network at the time t; gamma represents a discount factor; argmax (·) denotes a variable that maximizes the objective function, y i Represents the iteration target at time i, and p (s, a) represents the motion distribution;
substep 5.27: using the following formula in (y) i -Q(φ j ,a j ;θ)) 2 The gradient is decreased:
Figure BDA0003829935270000163
in the formula (I), the compound is shown in the specification,
Figure BDA0003829935270000164
with the expression parameter theta i The gradient function of the neural network loss function of (1), epsilon represents the probability of randomly selecting an action under an epsilon-greedy exploration strategy; theta i Parameter, L, representing the iteration at time i ii ) Represents the loss function at time i, Q (s, a; theta i ) Representing the action cost function of the target network and a 'representing all possible actions of state s'.
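Substeps 5.21-5.22 (ε-greedy selection) and substep 5.26 (the double-Q target) can be sketched with toy Q-values. The Q-values, reward, and discount factor below are illustrative assumptions:

```python
import random
import numpy as np

random.seed(42)
gamma = 0.9                       # discount factor (value assumed)
r_t = 1.0                         # observed reward (illustrative)

q_online = np.array([0.2, 0.8, 0.1, 0.5, 0.3, 0.4])  # online Q(s_{t+1}, a)
q_target = np.array([0.3, 0.6, 0.2, 0.9, 0.1, 0.4])  # target-network Q(s_{t+1}, a)

def epsilon_greedy(q, epsilon):
    # substeps 5.21-5.22: explore with probability epsilon, otherwise exploit
    if random.random() < epsilon:
        return random.randrange(len(q))
    return int(np.argmax(q))

a_t = epsilon_greedy(q_online, epsilon=0.0)   # pure exploitation -> action 1

# substep 5.26, double-Q decoupling: the online network selects the action,
# the target network evaluates it
a_star = int(np.argmax(q_online))             # online argmax -> action 1
y = r_t + gamma * q_target[a_star]            # 1.0 + 0.9 * 0.6 = 1.54
```

Note that a plain DQN target would use max(q_target) = 0.9 here, giving y = 1.81; the double-Q decoupling yields the smaller, less over-estimated 1.54.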
After the training of the safe driving decision model of the commercial vehicle is completed, the state-space information acquired by the sensors is input into the model, which outputs high-level driving decisions such as steering, going straight, and accelerating or decelerating in real time, effectively guaranteeing the driving safety of commercial vehicles in urban low-speed environments.

Claims (1)

1. A decision-making method for safe driving of a large commercial vehicle in an urban low-speed environment, comprising: firstly, collecting the safe driving behaviors of human drivers in the urban traffic environment, and constructing a safe driving behavior data set; secondly, constructing a multi-head attention-based decision model for safe driving of the commercial vehicle, the model comprising two sub-networks: a deep double-Q network and a generative adversarial imitation learning network, wherein the deep double-Q network learns safe driving strategies in edge scenarios such as dangerous scenarios and conflict scenarios in an unsupervised learning mode, and the generative adversarial imitation learning sub-network imitates the safe driving behaviors of human drivers under different driving environments and driving conditions; finally, training the safe driving decision model to obtain driving strategies under different driving environments and driving conditions, realizing high-level decision output of safe driving behaviors of the commercial vehicle; the method being characterized in that:
Step one: collecting the safe driving behaviors of human drivers in the urban traffic environment
Collecting safe driving behaviors under different driving environments and driving conditions through actual road tests and driving simulation, and further constructing a data set representing the safe driving behaviors of human drivers; this step specifically comprises the following 4 sub-steps:
substep 1: constructing a multi-dimensional target information synchronous acquisition system by using a millimeter wave radar, a 128-line laser radar, a vision sensor, a Beidou sensor and an inertial sensor;
substep 2: under a real urban environment, a plurality of drivers sequentially drive operating vehicles carrying a multi-dimensional target information synchronous acquisition system, acquire and process relevant data of various driving behaviors of the drivers such as lane change, lane keeping, vehicle following and acceleration and deceleration, and acquire multi-source heterogeneous description data of the driving behaviors, wherein the data comprises a plurality of obstacle distances in different directions measured by a radar or a vision sensor, positions, speeds, accelerations and yaw angular velocities measured by a Beidou sensor and an inertial sensor, and steering wheel corners measured by a vehicle-mounted sensor;
substep 3: in order to simulate safe driving behaviors in edge scenes such as dangerous scenes, conflict scenes and the like, a virtual city scene based on hardware-in-the-loop simulation is built; the constructed urban traffic scene comprises the following three categories:
(1) While the vehicle is driving, a laterally approaching traffic participant appears in front of the vehicle, i.e., the vehicle suddenly encounters an obstacle;
(2) While the vehicle is steering, a stationary traffic participant is present in the vehicle's blind spot;
(3) While the vehicle is steering, a moving traffic participant is present in the vehicle's blind spot;
These traffic scenes contain various road network structures, including straight roads, curves, and intersections, as well as various types of traffic participants, including commercial vehicles, passenger vehicles, non-motorized vehicles, and pedestrians;
Several drivers drive a commercial vehicle in the virtual scene through a real controller with a steering wheel, accelerator, and brake pedal, while the lateral and longitudinal position, velocity, and acceleration of the own vehicle, as well as the relative distance and relative speed to surrounding traffic participants, are acquired;
Substep 4: based on the data collected in the real urban environment and the driving simulation environment, construct a driving behavior data set for safe-driving decision learning, specifically expressed as:
X = {(s_j, a_j) | j = 1, 2, ..., n}  (1)
where X represents the set of "state-action" pairs, i.e., the constructed data set characterizing the safe driving behavior of human drivers; (s_j, a_j) represents the "state-action" pair at time j, in which s_j denotes the state at time j and a_j denotes the action at time j, i.e., the action taken by the human driver based on state s_j; and n represents the number of "state-action" pairs in the database;
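As a minimal illustration of how such a data set of "state-action" pairs can be assembled, the sketch below pairs recorded states with driver actions; the 7-dimensional state vector and integer action indices are placeholders for the state and action spaces defined later, not the patent's actual encoding.

```python
import numpy as np

def build_dataset(states, actions):
    """Pair each recorded state s_j with the driver's action a_j,
    forming the set X of "state-action" pairs defined in step one."""
    assert len(states) == len(actions)
    return list(zip(states, actions))

# toy example: 3 recorded time steps with 7-dimensional states
states = [np.zeros(7), np.ones(7), np.full(7, 2.0)]
actions = [0, 2, 1]   # indices into the discrete action space
X = build_dataset(states, actions)
```

In practice each state would combine the own-vehicle, relative-motion, and grid-map components described in step two.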
step two: construction of multi-head attention-based safe driving decision model for commercial vehicle
Deep reinforcement learning is used to learn safe driving strategies in edge scenarios such as dangerous and conflict scenarios; in addition, considering the sample-imitation ability of imitation learning, the safe driving behavior of human drivers under different driving conditions and working conditions is imitated using imitation learning; the constructed safe driving decision model therefore consists of two parts, described as follows:
Substep 1: define the basic parameters of the safe driving decision model
First, the safe driving decision problem in the urban low-speed environment is converted into a finite Markov decision process; second, the basic parameters of the safe driving decision model are defined;
(1) Defining a state space
To describe the motion states of the own vehicle and nearby traffic participants, the state space is constructed using time-series data and an occupancy grid map, specifically:
S_t = [S_1(t), S_2(t), S_3(t)]  (2)
where S_t represents the state space at time t; S_1(t) represents the state space of the own vehicle related to the time-series data at time t; S_2(t) represents the state space of the surrounding traffic participants related to the time-series data at time t; and S_3(t) represents the state space associated with the occupancy grid map at time t;
First, the motion state of the own vehicle is described by its continuous position, velocity, acceleration, and heading angle information:
S_1(t) = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s]  (3)
where p_x and p_y represent the lateral and longitudinal position of the own vehicle, in meters; v_x and v_y represent its lateral and longitudinal velocity, in meters per second; a_x and a_y represent its lateral and longitudinal acceleration, in meters per second squared; and θ_s represents the heading angle of the own vehicle, in degrees;
secondly, describing the motion state of the surrounding traffic participants by using the relative motion state information of the own vehicle and the surrounding traffic participants:
S_2(t) = [Δd_i, Δv_i, a_i]  (4)
where Δd_i, Δv_i, and a_i respectively represent the relative distance, the relative speed, and the acceleration between the own vehicle and the i-th traffic participant, in meters, meters per second, and meters per second squared;
Finally, the road area is rasterized into a number of p × q grid cells, and the road area and vehicle objects are abstracted into a grid map, i.e., an "existence" grid map S_3(t) describing relative position relationships, where p represents the length of a grid cell and q its width;
The "existence" grid map contains four attributes: grid coordinates, whether a vehicle is present, the category of the corresponding vehicle, and the distance to the left and right lane lines; a cell with no traffic participant is set to "0" and a cell containing a traffic participant to "1", and the positions of a participant's cell and the own vehicle's cell describe the relative distance between the two vehicles;
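A minimal sketch of how such an "existence" grid map can be rasterized from participant positions; only the binary occupancy attribute is shown, and the cell sizes and coordinate convention are illustrative assumptions.

```python
import numpy as np

def occupancy_grid(participants, rows, cols, cell_len, cell_wid):
    """Rasterize the road area into rows x cols cells of size
    cell_len x cell_wid (the p x q cells of the text) and mark a
    cell '1' if a traffic participant falls inside it, '0' otherwise."""
    grid = np.zeros((rows, cols), dtype=int)
    for x, y in participants:          # (lateral, longitudinal) position
        i = int(y // cell_len)         # row index along the road
        j = int(x // cell_wid)         # column index across the road
        if 0 <= i < rows and 0 <= j < cols:
            grid[i, j] = 1
    return grid

# two participants on a 4 x 3 grid of 5 m x 3.5 m cells
grid = occupancy_grid([(1.0, 2.0), (8.0, 12.0)], rows=4, cols=3,
                      cell_len=5.0, cell_wid=3.5)
```

The remaining attributes (vehicle category, lane-line distances) would be stored as additional channels of the same grid.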
(2) Defining an action space
Defining an action space with lateral and longitudinal driving actions:
A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]  (5)
where A_t represents the action space at time t; a_left, a_straight, and a_right represent turning left, going straight, and turning right, respectively; and a_accel, a_cons, and a_decel represent accelerating, maintaining constant speed, and decelerating, respectively;
(3) Defining a reward function
R_t = r_1 + r_2 + r_3  (6)
where R_t represents the reward function at time t, and r_1, r_2, and r_3 respectively represent the forward, backward, and lateral collision-avoidance reward functions, obtained through formulas (7), (8), and (9);
(Formulas (7), (8), and (9) appear only as images in the original document and are not reproduced here; each is a piecewise collision-avoidance reward defined in terms of the thresholds below.)
where TTC represents the time to collision between the own vehicle and the front obstacle, obtained by dividing the distance between them by their relative velocity, and TTC_thr represents the forward collision time threshold; RTTC represents the backward collision time and RTTC_thr the backward collision time threshold, in seconds; x_lat represents the distance from the own vehicle to the traffic participants on either side and x_min the minimum lateral safety distance, in meters; and β_1, β_2, and β_3 respectively represent the weight coefficients of the forward, backward, and lateral collision-avoidance reward functions;
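The following sketch illustrates how a total reward R_t = r_1 + r_2 + r_3 of formula (6) can be computed; since formulas (7)-(9) are given only as images, the piecewise forms (a fixed penalty when a threshold is violated, zero otherwise) and all threshold values here are assumptions.

```python
def safety_reward(ttc, rttc, x_lat,
                  ttc_thr=2.5, rttc_thr=2.5, x_min=1.0,
                  beta=(1.0, 1.0, 1.0)):
    """Total reward R_t = r1 + r2 + r3 of formula (6).
    The piecewise forms below (a weighted penalty whenever a
    collision-time or lateral-distance threshold is violated) are
    an assumption standing in for the unrecoverable formulas (7)-(9)."""
    r1 = -beta[0] if ttc < ttc_thr else 0.0     # forward collision risk
    r2 = -beta[1] if rttc < rttc_thr else 0.0   # backward collision risk
    r3 = -beta[2] if x_lat < x_min else 0.0     # lateral collision risk
    return r1 + r2 + r3
```

With all thresholds satisfied the reward is zero; each violated threshold subtracts its weight β.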
Substep 2: construct a decision sub-network based on the deep double-Q network
A safe driving strategy in edge scenarios is learned using the deep double-Q network;
A policy network is designed based on the multi-head attention mechanism; in addition, permutation invariance is built into the decision sub-network using a position encoding method;
the attention layer can be expressed as:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O  (10)
where MultiHead(Q, K, V) represents the multi-head attention value; Q represents the query vector and K the key vector, both of dimension d_k; V represents the value vector, of dimension d_v; W^O represents a parameter matrix to be learned; and head_i denotes the i-th head of the multi-head attention (in the present invention, h = 2), calculated by the following formulas:
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)  (11)
Attention(Q, K, V) = softmax(Q K^T / √d_k) V  (12)
where Attention(Q, K, V) represents the output attention matrix, and W_i^Q, W_i^K, and W_i^V represent parameter matrices to be learned;
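Formula (10) and the per-head scaled dot-product attention above can be sketched directly in NumPy; the dimensions (4 input tokens, model width 8, h = 2 heads of width 4) are illustrative assumptions, not the patent's layer sizes.

```python
import numpy as np

def softmax(x):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """Project Q, K, V per head, attend, concatenate the h heads,
    and apply the output matrix W^O, as in MultiHead(Q, K, V)."""
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
n, d = 4, 8          # 4 tokens, model width 8; two heads of width 4
Wq = [rng.standard_normal((d, 4)) for _ in range(2)]
Wk = [rng.standard_normal((d, 4)) for _ in range(2)]
Wv = [rng.standard_normal((d, 4)) for _ in range(2)]
Wo = rng.standard_normal((8, d))
X = rng.standard_normal((n, d))
out = multi_head(X, X, X, Wq, Wk, Wv, Wo)
```

Self-attention over the encoder outputs lets the network weight each surrounding traffic participant by its relevance to the own vehicle.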
constructing a decision sub-network, which is described in detail as follows;
First, the state space S_t is connected to encoder 1, encoder 2, and encoder 3 respectively; encoder 1 consists of two fully connected layers and outputs the motion-state encoding of the own vehicle; encoder 2 has the same structure as encoder 1 and outputs the relative-motion-state encoding; encoder 3 consists of two convolutional layers and outputs the occupancy-grid-map encoding;
The fully connected layers each have 64 neurons with Tanh activation functions; the convolutional layers all use 3 × 3 kernels with stride 2;
Second, the dependency between the own vehicle and the surrounding traffic participants is analyzed using the multi-head attention mechanism, so that the decision sub-network attends to traffic participants that suddenly approach the own vehicle or conflict with its driving path, and variable input sizes and permutation invariance are built into the decision sub-network; the outputs of encoders 1, 2, and 3 are all connected to the multi-head attention module, which outputs an attention matrix; third, the output attention matrix is connected to decoder 1, which consists of one fully connected layer;
This fully connected layer has 64 neurons with a Sigmoid activation function;
Substep 3: construct a decision sub-network based on generative adversarial imitation learning
The generative adversarial imitation learning sub-network learns the driving strategies in the driving behavior data set and its generalized sample data, thereby imitating the safe driving behavior of human drivers under different driving conditions and working conditions; the sub-network consists of a generator and a discriminator, each constructed with a deep neural network, described as follows:
(1) Construct the generator
The generator network is constructed as shown in FIG. 3; the input of the generator is the state space and the output is the probability value f = π(·|s; θ) of each action in the action space, where θ represents the parameters of the generator network; first, the state space is connected in turn to fully connected layers FC_1 and FC_2 to obtain feature F_1; the state space is also connected in turn to fully connected layers FC_3 and FC_4 to obtain feature F_2; at the same time, the state space is connected in turn to convolutional layers C_1 and C_2 to obtain feature F_3; then, features F_1, F_2, and F_3 are connected in turn to a merging layer, fully connected layer FC_5, and a Softmax activation function to obtain the output f = π(·|s; θ);
where fully connected layers FC_1, FC_2, FC_3, FC_4, and FC_5 each have 64 neurons; convolutional layer C_1 has a 3 × 3 kernel with stride 2, and convolutional layer C_2 a 3 × 3 kernel with stride 1;
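The generator's three-branch forward pass can be sketched as follows; for brevity the convolutional branch C_1-C_2 is replaced by a single linear map over the flattened grid, and all layer widths, input sizes, and the Tanh activations are assumptions for illustration.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generator_forward(s_vec1, s_vec2, s_grid, params):
    """Three parallel branches (FC_1-FC_2, FC_3-FC_4, and a grid
    branch standing in for C_1-C_2), merged and passed through
    FC_5 + Softmax to give the action probabilities pi(.|s; theta)."""
    f1 = tanh(tanh(s_vec1 @ params["W1"]) @ params["W2"])   # feature F_1
    f2 = tanh(tanh(s_vec2 @ params["W3"]) @ params["W4"])   # feature F_2
    f3 = tanh(s_grid.ravel() @ params["Wg"])                # feature F_3
    merged = np.concatenate([f1, f2, f3])                   # merging layer
    return softmax(merged @ params["W5"])   # 6 action probabilities

rng = np.random.default_rng(1)
params = {"W1": rng.standard_normal((7, 16)),
          "W2": rng.standard_normal((16, 16)),
          "W3": rng.standard_normal((9, 16)),
          "W4": rng.standard_normal((16, 16)),
          "Wg": rng.standard_normal((12, 16)),
          "W5": rng.standard_normal((48, 6))}
p = generator_forward(rng.standard_normal(7),
                      rng.standard_normal(9),
                      rng.standard_normal((3, 4)), params)
```

The Softmax output has one probability per action in the 6-element action space A_t.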
(2) Construct the discriminator
The discriminator network is constructed; the input of the discriminator is the state space and the output is a vector of dimension 6, where φ represents the parameters of the discriminator network; first, the state space is connected in turn to fully connected layers FC_6 and FC_7 to obtain feature F_4; the state space is also connected in turn to fully connected layers FC_8 and FC_9 to obtain feature F_5; at the same time, the state space is connected in turn to convolutional layers C_3 and C_4 to obtain feature F_6; then, features F_4, F_5, and F_6 are connected in turn to a merging layer, fully connected layer FC_10, and a Sigmoid activation function to obtain the output vector;
where fully connected layers FC_6, FC_7, FC_8, FC_9, and FC_10 each have 64 neurons; convolutional layer C_3 has a 3 × 3 kernel with stride 2, and convolutional layer C_4 a 3 × 3 kernel with stride 1;
Step three: train the decision model for safe driving of the commercial vehicle
First, the decision sub-network based on generative adversarial imitation learning is trained; the goal of the generative adversarial imitation learning sub-network is to learn a generator network such that the discriminator cannot distinguish driving actions generated by the generator from actions in the driving behavior data set; this specifically comprises the following substeps:
Substep 1: on the driving behavior data set X, initialize the generator network parameters θ_0 and the discriminator network parameters ω_0;
Substep 2: performing L iterative solutions, each iteration comprising substeps 2.1 to 2.2, in particular:
Substep 2.1: update the discriminator parameters ω_i → ω_{i+1} using the gradient described by formula (13):
E_{τ_i}[∇_ω log(D_ω(s, a))] + E_X[∇_ω log(1 − D_ω(s, a))]  (13)
where ∇_ω represents the gradient of the neural network loss function with respect to the parameters ω, D_ω denotes the discriminator, τ_i denotes trajectories generated by the current generator policy, and X denotes the driving behavior data set;
Substep 2.2: set the reward function r(s, a) = −log(D_{ω_{i+1}}(s, a)) and update the generator parameters θ_i → θ_{i+1} using the trust region policy optimization algorithm;
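Substeps 2.1 and 2.2 can be illustrated with a logistic discriminator over state-action features; the label convention and the surrogate reward −log D below are assumptions in the style of generative adversarial imitation learning (the referenced formula (13) is given only as an image), and the trust-region update of the generator is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_step(w, expert, generated, lr=0.1):
    """One ascent step on a formula-(13)-style objective: push
    D(s,a) toward 1 on generator samples and toward 0 on samples
    from the expert data set X (label convention assumed)."""
    grad = np.zeros_like(w)
    for x in generated:                       # E_tau[grad log D]
        grad += (1.0 - sigmoid(w @ x)) * x
    for x in expert:                          # E_X[grad log(1 - D)]
        grad += -sigmoid(w @ x) * x
    return w + lr * grad / (len(expert) + len(generated))

def gail_reward(w, x):
    """Surrogate reward r(s,a) = -log(D_w(s,a)) passed to the
    generator's policy-optimization step (not shown here)."""
    return -np.log(sigmoid(w @ x) + 1e-8)

w0 = np.zeros(2)
expert = [np.array([1.0, 0.0])]
generated = [np.array([0.0, 1.0])]
w1 = discriminator_step(w0, expert, generated)
```

After one step the discriminator's weight grows along the generated feature and shrinks along the expert feature, so expert-like actions receive higher surrogate reward.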
Then, on the basis of the above network training results, the decision sub-network based on the deep double-Q network (DDQN) is trained, specifically comprising the following substeps:
Substep 3: initialize the experience replay pool D with capacity N;
Substep 4: initialize the Q value corresponding to each action to a random value;
substep 5: performing M iterative solutions, each iteration comprising substeps 5.1 to 5.2, in particular:
Substep 5.1: initialize the state s_0 and the policy parameters φ_0;
Substep 5.2: performing T iterative solutions, each iteration comprising sub-steps 5.21 to 5.27, in particular:
Substep 5.21: with probability ε, randomly select a driving action;
Substep 5.22: otherwise, select a_t = argmax_a Q*(φ(s_t), a; θ);
where Q*(·) represents the optimal action-value function and a_t represents the action at time t;
Substep 5.23: execute action a_t to obtain the reward value r_t at time t and the state s_{t+1} at time t + 1;
Substep 5.24: store the sample (φ_t, a_t, r_t, φ_{t+1}) in the experience replay pool D;
Substep 5.25: randomly draw a minibatch of samples (φ_j, a_j, r_j, φ_{j+1}) from the experience replay pool D;
Substep 5.26: the iteration target is calculated using the following equation:
Figure FDA0003829935260000064
in the formula (I), the compound is shown in the specification,
Figure FDA0003829935260000065
representing the weight of the target network at time t; gamma represents a discount factor; arg max (·) denotes a variable that maximizes the objective function, y i Represents the iteration target at time i, and p (s, a) represents the motion distribution;
Substep 5.27: perform gradient descent on (y_j − Q(φ_j, a_j; θ))² using the following formula:
∇_{θ_i} L_i(θ_i) = E_{(s,a)~ρ(·); s′}[(y_i − Q(s, a; θ_i)) ∇_{θ_i} Q(s, a; θ_i)]  (15)
where ∇_{θ_i} represents the gradient of the neural network loss function with respect to the parameters θ_i; ε represents the probability of randomly selecting an action under the ε-greedy exploration strategy; θ_i represents the parameters of the i-th iteration; L_i(θ_i) represents the loss function of the i-th iteration; Q(s, a; θ_i) represents the action-value function of the target network; and a′ represents all possible actions in state s′;
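The iteration target of substep 5.26 can be sketched as follows: the online network selects the greedy action while the target network (weights θ⁻) evaluates it, which is the double-Q decoupling that distinguishes DDQN from plain DQN; the Q values and discount factor below are illustrative.

```python
import numpy as np

def ddqn_target(r, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double-DQN iteration target:
        y = r + gamma * Q_target(s', argmax_a' Q_online(s', a'))
    The online network picks the action; the target network
    (weights theta^-) evaluates it, reducing overestimation bias."""
    if done:
        return r
    a_star = int(np.argmax(q_online_next))    # action selection: online net
    return r + gamma * q_target_next[a_star]  # evaluation: target net

# example: online net prefers action 2, target net values it at 1.0
y = ddqn_target(0.5, np.array([0.1, 0.2, 0.9]),
                np.array([0.3, 0.8, 1.0]), gamma=0.9)
```

The squared error between y and the online network's Q value then drives the gradient step of substep 5.27.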
After the training of the safe driving decision model of the commercial vehicle is completed, the state-space information collected by the sensors is input into the model, which outputs high-level driving decisions such as steering, going straight, accelerating, and decelerating in real time, effectively guaranteeing the driving safety of commercial vehicles in urban low-speed environments.
CN202211070514.5A 2022-09-02 2022-09-02 Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment Pending CN115257819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211070514.5A CN115257819A (en) 2022-09-02 2022-09-02 Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment


Publications (1)

Publication Number Publication Date
CN115257819A true CN115257819A (en) 2022-11-01

Family

ID=83755148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211070514.5A Pending CN115257819A (en) 2022-09-02 2022-09-02 Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment

Country Status (1)

Country Link
CN (1) CN115257819A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731690A (en) * 2022-11-18 2023-03-03 北京理工大学 Unmanned public transportation cluster decision method based on graph neural network reinforcement learning
CN115731690B (en) * 2022-11-18 2023-11-28 北京理工大学 Unmanned public transportation cluster decision-making method based on graphic neural network reinforcement learning
CN117048365A (en) * 2023-10-12 2023-11-14 江西五十铃汽车有限公司 Automobile torque control method, system, storage medium and equipment
CN117048365B (en) * 2023-10-12 2024-01-26 江西五十铃汽车有限公司 Automobile torque control method, system, storage medium and equipment
CN117246345A (en) * 2023-11-06 2023-12-19 镁佳(武汉)科技有限公司 Method, device, equipment and medium for controlling generating type vehicle


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination