CN116853245A - PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control - Google Patents
- Publication number
- CN116853245A (CN202310945210.7A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- vehicles
- lane
- driving
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/14—Adaptive cruise control
- B60W30/16—Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
- B60W30/165—Automatically following the path of a preceding lead vehicle, e.g. "electronic tow-bar"
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/00—Control systems specially adapted for hybrid vehicles
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
Abstract
The invention provides a PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, comprising the following steps. Step one: determine several types of driving style by clustering driving data from various working conditions over a plurality of characteristic parameters. Step two: determine the lane-change safety area of the vehicle from the acquired vehicle state information and surrounding environment information. Step three: in adaptive cruise control, constrain vehicle spacing, speed and acceleration based on safety and comfort, and recover braking energy during braking; when approaching an intersection, reorganize the queues so that the vehicle group passes through in order. Step four: train on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm, iteratively updating according to a set loss function, to finally obtain an optimal strategy that lets the vehicle group reorganize its queues on roads with no more than three lanes according to the driving styles of different drivers. The invention can plan vehicle queues reasonably according to the driver's driving intention, so that the queues run efficiently.
Description
Technical Field
The invention relates to the field of intelligent vehicle driving, in particular to a PHEV hybrid vehicle group optimization control method for queue management and self-adaptive cruise control.
Background
The transportation sector is one of the main sources of emissions and fuel consumption. With the rapid growth of petroleum consumption, the scarcity of fossil energy is becoming increasingly serious. Reducing fuel consumption and emissions is important, especially in urban areas. Alternative-fuel vehicles and connected, automated driving technology are expected to be two effective approaches. Alternative-fuel vehicles such as hybrid and electric vehicles are regarded as a future direction of vehicle development. Advanced intelligent driving technology not only helps save fuel but also assists the driver, improving driving safety and comfort.
Most current work focuses on tailoring a fuel-economy control strategy to a single vehicle, without regard to its impact on other vehicles. Full vehicle automation is expected to arrive gradually, so in the coming transition period automated vehicles will share the road network with manually driven vehicles at varying penetration rates, forming a dynamic mixed-traffic environment. Partial deployment of automated driving systems affects traffic-flow characteristics differently from full deployment. Operating and controlling mixed traffic of intelligent connected vehicles and ordinary vehicles efficiently is a complex challenge.
In view of the above, the present invention aims to provide a PHEV hybrid vehicle group optimization control method based on queue management and adaptive cruise control, which can plan vehicle queues reasonably according to the driver's driving intention, so that the queues run efficiently.
Disclosure of Invention
The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control provided by the invention can plan vehicle queues reasonably according to the driver's driving intention, so that the queues run efficiently.
The invention adopts the following technical scheme.
In the PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, the hybrid vehicle group is a vehicle group consisting of intelligent connected vehicles and conventional manually driven vehicles. The method comprises the following steps:
step one, determining several types of driving style by clustering driving data from various working conditions over a plurality of characteristic parameters;
step two, determining the lane-change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in adaptive cruise control, constraining vehicle spacing, vehicle speed and acceleration based on safety and comfort, and recovering braking energy during braking; when approaching an intersection, reorganizing the queues so that the vehicle group passes through in order;
step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm, and iteratively updating according to a set loss function, to finally obtain an optimal strategy that lets the vehicle group reorganize its queues on roads with no more than three lanes according to the driving styles of different drivers.
The vehicle at the head of each vehicle queue in the group is an intelligent connected vehicle, and the queue length is limited to no more than 8 vehicles;
in step one, the driving styles are classified as follows: the characteristic parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and K-means clustering then yields three driving-style classes. The different driving styles are reflected in the following characteristic parameters: average longitudinal speed, maximum longitudinal speed $v_{max}$, minimum longitudinal speed $v_{min}$, longitudinal speed standard deviation $\varepsilon_v$, average longitudinal acceleration, maximum longitudinal acceleration $a_{xmax}$, minimum longitudinal acceleration $a_{xmin}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{ymax}$, minimum lateral acceleration $a_{ymin}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum distance headway $DHW_{min}$. The longitudinal speed standard deviation $\varepsilon_v$ is calculated as follows:
The time headway THW is the time the head of the host vehicle needs to reach the current tail position of the preceding vehicle at the current speed, and the time to collision TTC is the time until the host vehicle and the preceding vehicle would collide if both continued in their current state. They are calculated as:
$THW = d_{rel} / v_p$
$TTC = d_{rel} / v_{rel}$
$v_{rel} = v_p - v_f$ (formula four);
where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the speed of the host vehicle, and $v_f$ is the speed of the preceding vehicle;
the distance headway is the distance between the head of the host vehicle and the head of the preceding vehicle in the same lane. The larger it is, the lower the likelihood of a collision between the two vehicles; conversely, the smaller it is, the higher the likelihood. A smaller distance headway reflects more aggressive driving, so the minimum distance headway is chosen as a driving-style characteristic parameter.
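The headway features of step one can be illustrated with a minimal Python sketch. The function names, the sample values, and the choice of the population form of the standard deviation (dividing by N) are assumptions made for illustration; the text does not specify them.

```python
import math
import statistics


def headway_features(d_rel, v_p, v_f):
    """Return (THW, TTC) for one sample.

    d_rel: relative distance between the two vehicles [m]
    v_p:   host vehicle speed [m/s]
    v_f:   preceding vehicle speed [m/s]
    """
    v_rel = v_p - v_f  # relative speed, as in formula four
    thw = d_rel / v_p if v_p > 0 else math.inf
    # TTC is finite only while the host vehicle is closing in.
    ttc = d_rel / v_rel if v_rel > 0 else math.inf
    return thw, ttc


def speed_statistics(speeds):
    """Mean, max, min and standard deviation of a longitudinal speed trace."""
    return {
        "mean": statistics.fmean(speeds),
        "max": max(speeds),
        "min": min(speeds),
        # Assumption: population standard deviation (divide by N).
        "std": statistics.pstdev(speeds),
    }


thw, ttc = headway_features(d_rel=30.0, v_p=20.0, v_f=15.0)
stats = speed_statistics([18.0, 20.0, 22.0])
```

Feature vectors built this way would then be passed to principal component analysis and K-means clustering, which are not shown here.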
In step two, the vehicle queue travels on a three-lane road, and reorganizing the vehicle group involves lane changes. The connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors; this information includes vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position, and the signal phase at the intersection;
the method for detecting the lane change safety area used in the lane change operation is as follows:
The method is based on vehicle kinematics. When the vehicle is at a given position, the coordinates $[x_{p1}(t), y_{p1}(t)]$ of its right front vertex at time $t$ ($t_m < t < t_n$) are expressed as:
where $v_p(t)$ and $\theta_p(t)$ are the speed and yaw angle of the host vehicle, $t_m$ is the time at which the host vehicle starts the lane change, and $t_n$ is the time at which it finishes;
similarly, the coordinates of the left front vertex $[x_{p2}(t), y_{p2}(t)]$, the left rear vertex $[x_{p3}(t), y_{p3}(t)]$ and the right rear vertex $[x_{p4}(t), y_{p4}(t)]$ of the host vehicle at time $t$ are respectively:
where $a$ is the vehicle length and $b$ is the vehicle width;
during a lane change, the host vehicle obtains a reasonable lane-change safety area by analysing the conditions under which it does not collide with surrounding vehicles;
if the host vehicle changes lanes to the left at some future time, it collides with the preceding vehicle when its speed is higher than that of the preceding vehicle and the gap shrinks until the right front vertex of the host vehicle meets the left rear vertex of the preceding vehicle;
let the collision point be $S_1$ and the corresponding collision time be given; the coordinates of the collision point $S_1$ are expressed as:
where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in its lane;
a collision with the rear vehicle in the target lane occurs when the speed of the host vehicle is lower than that of the rear vehicle, so that the distance between the two vehicles decreases over time; if the lane change is made at that moment, the left-side vertex of the host vehicle collides with the rear vehicle in the target lane. Let the collision point be $S_2$ and the corresponding collision time be given. Based on the structural dimensions of the vehicle and vehicle kinematics, the coordinates of the collision point $S_2$ are expressed as:
where $v_r(t)$ is the speed of the rear vehicle in the target lane at time $t$, $D_1$ is the relative distance between the host vehicle and the rear vehicle in the target lane, and $l$ is the lane width;
the coordinates of the collision points $S_1$ and $S_2$ determine the lane-change safety area within which the vehicle avoids collisions. The same applies to a lane change to the right. The intelligent connected vehicle uses the detected information on surrounding vehicles to judge whether the safety area satisfies the lane-change condition.
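The vertex tracking used in step two can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's formulas: the reference point is taken to be the vehicle centre, x points along the lane, y points to the left, the yaw angle is counter-clockwise, and the trajectory is integrated with a simple Euler step. The function names are hypothetical.

```python
import math


def integrate_pose(x0, y0, speeds, yaws, dt):
    """Euler-integrate the vehicle centre from sampled speed and yaw angle."""
    x, y = x0, y0
    for v, theta in zip(speeds, yaws):
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
    return x, y


def corner_points(x, y, theta, a, b):
    """Corner coordinates of a rectangle of length a and width b centred at (x, y)."""
    # Offsets in the vehicle frame: forward is +x, left is +y.
    offsets = {
        "right_front": (a / 2, -b / 2),
        "left_front": (a / 2, b / 2),
        "left_rear": (-a / 2, b / 2),
        "right_rear": (-a / 2, -b / 2),
    }
    c, s = math.cos(theta), math.sin(theta)
    # Rotate each offset by the yaw angle, then translate to the centre.
    return {
        name: (x + dx * c - dy * s, y + dx * s + dy * c)
        for name, (dx, dy) in offsets.items()
    }


centre = integrate_pose(0.0, 0.0, speeds=[10.0, 10.0], yaws=[0.0, 0.0], dt=0.5)
corners = corner_points(centre[0], centre[1], theta=0.0, a=4.0, b=2.0)
```

Comparing the corner trajectories of the host vehicle with those of the surrounding vehicles is one way to check whether a candidate lane change stays inside the safety area.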
In step three, each intelligent connected vehicle acts as an agent, so the n connected vehicles in the vehicle group are treated as n agents. The number n is limited to what the available computation allows, and the n parallel agents control the n connected vehicles and interact with one another. The n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward of the whole vehicle group;
the intelligent connected vehicles in a vehicle queue interact with vehicles in adjacent queues, and the intelligent connected vehicles and manually driven vehicles within the queue also cooperate. The cooperation methods are as follows:
Method A: on an ordinary lane, the vehicle queue is controlled cooperatively through adaptive cruise and regenerative braking so that a reasonable distance is kept between vehicles, that is, a safe longitudinal gap is maintained continuously between two consecutive vehicles. The deviation from the safe distance (the spacing error) should be as small as possible to reduce collision risk while exploiting the queue's advantages of low fuel consumption and high traffic throughput. To accommodate the randomness of ordinary vehicles, the gap between an intelligent connected vehicle and an ordinary vehicle must be larger. When a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach a signalized intersection, all vehicles in the queue are split and regrouped to reduce energy consumption and travel time, so that some of the vehicles in the queue pass through the intersection in sequence before the green phase ends, while the remaining vehicles wait before the stop line.
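The queue split of method B can be illustrated with a deliberately simplified rule. In the method described here the split is decided by the learned strategy; the sketch below instead assumes that every vehicle travels at one common constant speed and that a vehicle passes only if it reaches the stop line before the green phase ends. The function name and the numbers are hypothetical.

```python
def split_queue(distances_to_stop_line, speed, green_time_left):
    """Split a queue (ordered head first) into passing and waiting vehicles.

    distances_to_stop_line: distance of each vehicle to the stop line [m]
    speed:                  assumed common constant speed [m/s]
    green_time_left:        remaining green time [s]
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    passing, waiting = [], []
    for idx, dist in enumerate(distances_to_stop_line):
        # Once one vehicle has to wait, every vehicle behind it waits as well.
        if not waiting and dist / speed <= green_time_left:
            passing.append(idx)
        else:
            waiting.append(idx)
    return passing, waiting


passing, waiting = split_queue([20.0, 35.0, 50.0, 65.0], speed=10.0, green_time_left=4.0)
```

With the sample values, the first two vehicles need 2.0 s and 3.5 s and pass, while the last two need 5.0 s and 6.5 s and wait before the stop line.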
In step four, the Soft Actor-Critic reinforcement learning algorithm (SAC) is an off-policy, model-free deep reinforcement learning algorithm that combines maximum-entropy learning with the Actor-Critic framework. The learning elements of the SAC algorithm are the state s, the action a, the reward r, and the environment model ρ. The state comprises the fuel consumption, battery state of charge, speed, acceleration, yaw angle and inter-vehicle distance of the vehicle; the actions are torque and steering angle; and the reward covers fuel consumption, travel time and comfort, together with the adaptive cruise cost function;
In step four, the SAC algorithm trains on sample data from the environment and is updated and optimized continuously until an optimal strategy is obtained, so that the intelligent connected vehicles in the mixed vehicle group are distributed over different lanes according to the driving style of the driver, and the mixed vehicles in the same lane form queues of different lengths;
the driving styles of different vehicles are classified as aggressive, robust and discreet. When the vehicle group is driving on a road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, robust vehicles in the middle lane, and discreet vehicles in the rightmost lane. The SAC algorithm adjusts the final lane allocation according to the degree of gain to the vehicle group.
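The default lane preference and the queue-length limit can be sketched as below. This only shows the starting tendency described above; the final allocation is adjusted by the SAC strategy, which is not modelled here. Lane indices (0 for the leftmost lane) and function names are assumptions, and the requirement that each queue be led by an intelligent connected vehicle is not enforced in this sketch.

```python
# Preferred lane per driving style: 0 = leftmost, 1 = middle, 2 = rightmost.
PREFERRED_LANE = {"aggressive": 0, "robust": 1, "discreet": 2}
MAX_QUEUE_LENGTH = 8  # queue length limit stated in the text


def assign_lanes(styles):
    """Group vehicle indices by the preferred lane of their driving style."""
    lanes = {0: [], 1: [], 2: []}
    for vehicle_id, style in enumerate(styles):
        lanes[PREFERRED_LANE[style]].append(vehicle_id)
    return lanes


def form_queues(lane_vehicles, max_len=MAX_QUEUE_LENGTH):
    """Cut the vehicles of one lane into queues of at most max_len vehicles."""
    return [lane_vehicles[i:i + max_len] for i in range(0, len(lane_vehicles), max_len)]


lanes = assign_lanes(["aggressive", "discreet", "robust", "aggressive"])
queues = form_queues(list(range(10)))
```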
The SAC algorithm consists of 1 actor neural network and 4 critic neural networks. The input of the actor network is a state, and its output is the parameters of the action probability distribution P(x);
the 4 critic networks are the state-value networks (the V critic and the V critic target network) and the action-state value networks (the Q1 and Q2 critics). The input of the V critic network is a state and its output V(s) is an estimate of the state value; the output of a Q critic network is Q(s, a), an estimate of the value of an action-state pair. The n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward of the whole vehicle group;
Entropy in the algorithm is defined as:
where x follows the distribution with probability density function P(x). Introducing maximum entropy makes the output actions more dispersed and avoids excessive concentration, which improves the algorithm's exploration ability, its ability to learn new tasks, and its stability. The optimal strategy in the SAC framework is expressed as:
where π is the policy adopted by the agent, a is the action, s is the state, and r is the reward; α is the temperature parameter, which determines the relative importance of the entropy term with respect to the reward and thereby ensures the randomness of the optimal strategy;
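For reference, the entropy and the maximum-entropy objective described above take the following form in the standard SAC formulation. This is a sketch that assumes the formulas omitted here match that standard form; the state-action distribution $\rho_\pi$ and the horizon $T$ are notation introduced for this illustration and do not come from the text.

```latex
\begin{aligned}
H(P) &= \mathbb{E}_{x \sim P}\left[-\log P(x)\right] \\
\pi^{*} &= \arg\max_{\pi} \sum_{t=0}^{T}
  \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha\, H\big(\pi(\cdot \mid s_t)\big) \right]
\end{aligned}
```

A larger $\alpha$ weights the entropy term more heavily and so favours more random policies; $\alpha = 0$ recovers the usual expected-return objective.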
the state space S of SAC is defined as:
where the state comprises the driving style, the battery state of charge SOC, the vehicle speed $v_p$, the vehicle acceleration $a_p$, the driving time $t_{dri}$, the yaw angle $\theta$, and the distance $d_{des}$ from the preceding vehicle;
the action space A is defined as: $A = \{T_p, \delta_p\}$ (formula thirteen);
where $T_p$ is the torque of the host vehicle and $\delta_p$ is its steering wheel angle;
the reward function is defined as:
$R = \omega_1 m_{fuel} c_{fuel} + \omega_2 P_{batt} c_{elec} + \omega_3 (t_{dri} - t_{ref}) + \omega_4 P_{rec} + \omega_5 J_{min}$ (formula fourteen);
where $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$, $\omega_5$ are proportionality coefficients, $m_{fuel}$ is the fuel consumption of the current intelligent connected vehicle, $c_{fuel}$ is the fuel price, $P_{batt}$ is the motor power, $c_{elec}$ is the electricity price, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;
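Formula fourteen can be evaluated with a short Python sketch. The weight values below are placeholders, and the signs of the weights are an open choice: the text lists the terms but does not state whether cost terms enter with negative coefficients, so that is left to the caller. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Proportionality coefficients w1..w5 (placeholder values)."""
    w1: float = 1.0
    w2: float = 1.0
    w3: float = 1.0
    w4: float = 1.0
    w5: float = 1.0


def reward(m_fuel, c_fuel, p_batt, c_elec, t_dri, t_ref, p_rec, j_min, w):
    """Weighted sum of the five terms of formula fourteen."""
    return (
        w.w1 * m_fuel * c_fuel        # fuel cost
        + w.w2 * p_batt * c_elec      # electricity cost
        + w.w3 * (t_dri - t_ref)      # travel time relative to the reference
        + w.w4 * p_rec                # recovered braking power
        + w.w5 * j_min                # adaptive cruise comprehensive cost
    )


r = reward(m_fuel=2.0, c_fuel=3.0, p_batt=4.0, c_elec=0.5,
           t_dri=11.0, t_ref=10.0, p_rec=1.5, j_min=0.5, w=RewardWeights())
```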
Each connected vehicle executes its driving actions independently, and the corresponding rewards are optimized jointly by collecting the control experience of all connected vehicles in a centralized replay buffer;
for a given state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as:
where $\gamma \in [0, 1]$ is the discount factor;
to avoid overestimation when maximizing Q, and further overestimation when computing targets with the target networks, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$ with parameters $e_1$ and $e_2$, and two target networks $v$ and $v_{target}$ with their own parameters. The minimum of the function values output by the networks is selected as the target value. The soft-value network parameters are updated by minimizing the following loss function:
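The minimum-of-two-critics idea described above can be sketched with scalar values in Python. The expressions follow the commonly used SAC variant with a separate state-value network and are an assumption about the omitted loss formulas, not a reproduction of them; all function names are hypothetical.

```python
def soft_value_target(q1, q2, log_prob, alpha):
    """Target for V(s): the smaller Q estimate minus the entropy penalty."""
    return min(q1, q2) - alpha * log_prob


def soft_q_target(reward, gamma, v_target_next):
    """Target for Q(s, a): reward plus the discounted target-network value."""
    return reward + gamma * v_target_next


def squared_loss(prediction, target):
    """Half squared error between a network output and its target."""
    return 0.5 * (prediction - target) ** 2


v_tgt = soft_value_target(q1=1.0, q2=0.5, log_prob=-2.0, alpha=0.25)
q_tgt = soft_q_target(reward=1.0, gamma=0.5, v_target_next=2.0)
loss = squared_loss(prediction=3.0, target=q_tgt)
```

Taking the smaller of the two Q estimates counteracts the upward bias that a single learned critic tends to accumulate.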
the stochastic policy is represented by a Gaussian distribution: the policy parameters map a state to the mean and variance of a Gaussian distribution, and the action is obtained by sampling from that distribution. With state $s_t$ as input, the policy outputs a Gaussian distribution with a mean and a standard deviation; the action $a_t$ is then obtained with the reparameterization technique, according to the formula:
where $\varepsilon_t$ is a noise signal sampled from a standard normal distribution, and $\mu(s_t)$ and $\sigma(s_t)$ are the mean and standard deviation of the Gaussian distribution;
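The reparameterization step can be sketched in Python as follows, assuming the common form in which the action is the mean plus the standard deviation times standard normal noise. The means and standard deviations would come from the actor network; here they are hypothetical constants for a torque and a steering angle.

```python
import random


def sample_action(mean, std, rng):
    """Draw one action per dimension as mean + std * noise, noise ~ N(0, 1)."""
    action = []
    for mu, sigma in zip(mean, std):
        eps = rng.gauss(0.0, 1.0)  # noise from a standard normal distribution
        action.append(mu + sigma * eps)
    return action


rng = random.Random(0)
# Hypothetical actor outputs: [torque, steering angle].
action = sample_action(mean=[50.0, 0.0], std=[5.0, 0.02], rng=rng)
```

Because the randomness enters only through the noise term, the sampled action stays a differentiable function of the mean and standard deviation, which is what allows gradient-based updates of the policy.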
the relationship between the policy function and the soft value function is expressed as:
the policy network parameters are updated by minimizing the Kullback-Leibler divergence; the smaller the Kullback-Leibler divergence, the smaller the difference between the rewards corresponding to the output behaviour, and the better the policy converges. The update rule of the policy network is expressed as formula nineteen:
where $Z(s_t)$ is the partition function used to normalize the distribution;
finally, the policy network parameters are updated by gradient descent, expressed as:
the temperature coefficient expresses how much weight the algorithm gives to entropy, and tuning it is important for the training performance of the SAC algorithm. The optimal temperature coefficient differs with the reinforcement learning task and the training stage, so an automatic temperature adjustment mechanism is used. Under this mechanism a constrained optimization problem is constructed, and the optimal temperature coefficient at each step is obtained by minimizing an objective function, expressed as:
where $H_0$ is a predefined minimum policy entropy threshold.
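One gradient step of automatic temperature adjustment can be sketched as below. The objective used, the batch mean of $-\alpha(\log\pi(a|s) + H_0)$, is the form common in SAC implementations and is assumed here rather than taken from the omitted formula; the learning rate and the lower bound on $\alpha$ are hypothetical.

```python
def temperature_loss(alpha, log_probs, target_entropy):
    """Batch mean of -alpha * (log pi(a|s) + H0)."""
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)


def update_temperature(alpha, log_probs, target_entropy, lr=1e-3, alpha_min=1e-6):
    """One gradient-descent step on the temperature, kept positive."""
    # Gradient of the loss with respect to alpha.
    grad = sum(-(lp + target_entropy) for lp in log_probs) / len(log_probs)
    return max(alpha - lr * grad, alpha_min)


log_probs = [-1.0, -3.0]  # sampled log pi(a|s); entropy estimate is 2.0
new_alpha = update_temperature(0.5, log_probs, target_entropy=1.0, lr=0.1)
```

In the example the estimated policy entropy (2.0) exceeds the threshold (1.0), so the step lowers the temperature; if the entropy fell below the threshold, the same rule would raise it.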
The SAC reinforcement learning algorithm is trained on data covering vehicle group queue reorganization, adaptive cruise, and efficient passage through intersections to obtain an optimal control strategy. SAC takes the states of the connected vehicles in the group into account and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each connected vehicle to control its trajectory. In the adaptive cruise control of step three, the required inter-vehicle distance is influenced by the driver's driving style, road commute efficiency and vehicle safety; given the uncertainty in the driving intention of manually driven vehicles, the distance between a connected vehicle and a manually driven vehicle is larger than the distance between two connected vehicles. If the gap is too small, commute efficiency improves but driver anxiety may lead to collisions; conversely, a larger gap safeguards vehicle safety but reduces road commute efficiency and makes it easy for vehicles in adjacent lanes to cut in;
Constant time headway (CTH) is used as the vehicle spacing algorithm:
$d_{des} = \tau_h v_h + d_0$ (formula twenty-two);
where $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;
for car-following safety, the following constraints apply:
$d_{min} < d < d_{max}$
$\Delta d = d - d_{des}$
$\Delta v_{min} < \Delta v < \Delta v_{max}$
$\Delta v = v_p - v_f$
where $d$ is the actual distance between the host vehicle and the preceding vehicle, $d_{min}$ and $d_{max}$ are the minimum and maximum inter-vehicle distances, $\Delta v$ is the speed difference between the host vehicle and the preceding vehicle, and $\Delta v_{min}$ and $\Delta v_{max}$ are the minimum and maximum speed differences;
the comfort constraint is as follows: $\Delta a = a_p - a_f$, where $a_f$ is the acceleration of the preceding vehicle;
the adaptive cruise comprehensive cost function is:
$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2$ (formula twenty-three).
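The spacing policy and cost function above translate directly into a few lines of Python. The function names, numeric values and unit choices are illustrative assumptions; the weights $\omega_6$ to $\omega_8$ are set to 1 only to make the arithmetic easy to follow.

```python
def desired_gap(tau_h, v_h, d0):
    """Constant time headway spacing, formula twenty-two."""
    return tau_h * v_h + d0


def within_limits(d, delta_v, d_min, d_max, dv_min, dv_max):
    """Car-following safety constraints on gap and speed difference."""
    return d_min < d < d_max and dv_min < delta_v < dv_max


def acc_cost(d, d_des, v_p, v_f, a_p, a_f, w6, w7, w8):
    """Adaptive cruise comprehensive cost, formula twenty-three."""
    delta_d = d - d_des      # spacing error
    delta_v = v_p - v_f      # speed difference to the preceding vehicle
    delta_a = a_p - a_f      # acceleration difference (comfort term)
    return w6 * delta_d ** 2 + w7 * delta_v ** 2 + w8 * delta_a ** 2


d_des = desired_gap(tau_h=1.5, v_h=20.0, d0=5.0)
cost = acc_cost(d=37.0, d_des=d_des, v_p=20.0, v_f=18.0, a_p=0.5, a_f=0.0,
                w6=1.0, w7=1.0, w8=1.0)
ok = within_limits(d=37.0, delta_v=2.0, d_min=5.0, d_max=100.0, dv_min=-5.0, dv_max=5.0)
```

Here the desired gap is 35 m, the spacing error 2 m, the speed difference 2 m/s and the acceleration difference 0.5 m/s², giving a cost of 4 + 4 + 0.25.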
Subject to the limits of the ECE regulations, the following braking force distribution strategy is adopted:
when the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front axle braking force remains unchanged and the rear axle braking force increases; when $z_3 < z$, the motor stops braking and the front and rear axle braking forces are distributed along the β line. Throughout braking, if the motor braking force is insufficient, the hydraulic braking force makes up the shortfall in total braking force. The braking force distribution is formulated as follows:
the boundaries of z are calculated as follows:
where $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the total wheelbase, $k$ is the rear wheelbase, $h_g$ is the height of the centre of mass, $T_{bmax}$ is the maximum motor braking torque, β is the braking force distribution coefficient, $r_w$ is the wheel radius, $i_t$ is the vehicle transmission ratio, and η is the transmission efficiency; a correction coefficient for the rotating mass is also used.
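The four braking regimes can be sketched as a simple selector in Python. The thresholds $z_1$, $z_2$, $z_3$ are passed in as parameters because their formulas are not reproduced here; the values in the example are hypothetical, and the handling of the exact boundary values is an assumption since the text uses strict inequalities. The `blend` helper illustrates hydraulic compensation when the motor cannot deliver the demanded force.

```python
def braking_regime(z, z1, z2, z3):
    """Return the distribution regime for braking strength z."""
    if z < z1:
        return "front axle only"
    if z < z2:
        return "ECE regulation line"
    if z < z3:
        return "front fixed, rear increasing"
    return "motor off, beta line"


def blend(total_demand, motor_available):
    """Split a braking demand into (motor force, hydraulic force)."""
    motor = min(total_demand, motor_available)
    # The hydraulic brake makes up whatever the motor cannot provide.
    return motor, total_demand - motor


# Hypothetical thresholds, for illustration only.
regimes = [braking_regime(z, z1=0.1, z2=0.5, z3=0.7) for z in (0.05, 0.3, 0.6, 0.8)]
motor_force, hydraulic_force = blend(total_demand=1000.0, motor_available=600.0)
```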
In step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each connected vehicle according to the traffic signal timing, and carries out queue reorganization and queue length planning, so that some vehicles form queues that pass through the intersection during the green phase while the remaining vehicles wait before the stop line. This reduces the energy consumption of the whole vehicle group and achieves better economy and traffic efficiency.
Compared with the prior art, the invention has the following advantages: the SAC reinforcement learning algorithm fusing driving styles adopted by the invention has stronger stability compared with the common reinforcement learning algorithm. Meanwhile, the method can combine different vehicle queues according to different driving styles, realize self-adaptive cruising, recover redundant braking energy, and control the vehicle groups to orderly pass through intersections with signal lamps, so that the vehicle groups can save energy consumption and improve the overall traffic efficiency.
The method adopted by the invention can reasonably split and reorganize the vehicle queues according to different driving styles, aims at reducing energy consumption and driving time and improving driving safety and comfort, and optimizes traffic running efficiency.
Drawings
The invention is described in further detail below with reference to the attached drawings and detailed description:
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of the lane-change collision critical points and safety area planning of the intelligent connected vehicle according to the present invention;
FIG. 3 is a flow chart of queue combination based on the Soft Actor-Critic algorithm of the present invention (fully dark gray vehicles are intelligent connected vehicles; half gray vehicles are manually driven vehicles).
Detailed Description
As shown in the figures, in the PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, the hybrid vehicle group is a vehicle group consisting of intelligent networked vehicles and conventional manually driven vehicles. The control method comprises the following steps; the step numbers serve only to distinguish the steps, and in practice the steps are not necessarily executed strictly in this order:
step one, determining several types of driving style by clustering driving data from various working conditions on a plurality of characteristic parameters;
step two, determining a lane change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in the adaptive cruise control, constraining vehicle spacing, vehicle speed and acceleration based on safety and comfort, and recovering braking energy during braking; when approaching an intersection, the queues are reorganized so that the vehicle group passes through in order;
step four, training on sample data from the environment based on the Soft Actor-Critic reinforcement learning algorithm, and iteratively updating according to a set loss function, to finally obtain an optimal strategy that enables the vehicle group to reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.
The vehicles at the head of the vehicle queue of the vehicle group are intelligent networking vehicles, and the length of the queue is limited to not more than 8 vehicles;
in step one, the driving styles are classified as follows: the characteristic parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and three driving styles are then obtained by K-means clustering. The different driving styles are reflected in the following characteristic parameters: average longitudinal vehicle speed, maximum longitudinal vehicle speed $v_{\max}$, minimum longitudinal vehicle speed $v_{\min}$, longitudinal vehicle speed standard deviation $\epsilon_v$, average longitudinal acceleration, maximum longitudinal acceleration $a_{x\max}$, minimum longitudinal acceleration $a_{x\min}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{y\max}$, minimum lateral acceleration $a_{y\min}$, lateral acceleration standard deviation $\sigma_y$, headway THW, time to collision TTC, and minimum headway distance $DHW_{\min}$. The longitudinal vehicle speed standard deviation $\epsilon_v$ is calculated as follows:
The headway THW is the time required for the front of the host vehicle to reach the position of the rear of the preceding vehicle at the current vehicle speed, and the time to collision TTC is the time until the host vehicle and the preceding vehicle would collide if both continued in their current state; they are calculated as follows:
$v_{rel} = v_p - v_f$ (formula four);
where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the speed of the host vehicle, and $v_f$ is the speed of the preceding vehicle;
the headway distance is the distance between the front of the host vehicle and the front of the preceding vehicle in the same lane. The larger the headway distance, the lower the likelihood of a collision between the two vehicles; conversely, the smaller it is, the higher the likelihood of an accident. A smaller headway distance also reflects more aggressive driving, and the minimum headway distance is therefore selected as a characteristic parameter of driving style.
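As a minimal sketch of how these features could be computed from logged samples, the Python below derives THW and TTC from the prose definitions above together with formula four. The function names, the units, the use of the population standard deviation, and the convention of returning infinity when the host vehicle is stationary or is not closing the gap are assumptions made for illustration; they are not taken from the patent.

```python
import math
from statistics import pstdev


def headway_features(d_rel, v_p, v_f):
    """Return (THW, TTC) for one sample.

    d_rel: relative distance to the preceding vehicle [m] (assumed unit)
    v_p:   host vehicle speed [m/s]
    v_f:   preceding vehicle speed [m/s]
    """
    v_rel = v_p - v_f  # formula four
    # THW: time for the host to cover the gap at its current speed.
    thw = d_rel / v_p if v_p > 0 else math.inf
    # TTC: time until the gap closes; only finite when the host is faster.
    ttc = d_rel / v_rel if v_rel > 0 else math.inf
    return thw, ttc


def speed_std(speeds):
    """Standard deviation of longitudinal speed (population form, an assumption)."""
    return pstdev(speeds)


if __name__ == "__main__":
    print(headway_features(d_rel=30.0, v_p=20.0, v_f=15.0))
    print(speed_std([10.0, 12.0, 14.0]))
```

Feature vectors built this way would then be passed to the dimension reduction and clustering stage described in step one.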
In step two, the road travelled by the vehicle queue has three lanes, and lane change operations are involved in reorganizing the vehicle group; the networked vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, including vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position and intersection signal phase;
The method for detecting the lane change safety area used in the lane change operation is as follows:
as shown in FIG. 2, the method is based on vehicle kinematics; when the vehicle is at a given position, the coordinates of its right front vertex $[x_{p1}(t), y_{p1}(t)]$ at time $t$ ($t_m < t < t_n$) are expressed as:
where $v_p(t)$ and $\theta_p(t)$ are the speed and yaw angle of the host vehicle, respectively; $t_m$ is the time at which the host vehicle starts the lane change, and $t_n$ is the time at which it ends;
similarly, the coordinates of the left front vertex $[x_{p2}(t), y_{p2}(t)]$, the left rear vertex $[x_{p3}(t), y_{p3}(t)]$ and the right rear vertex $[x_{p4}(t), y_{p4}(t)]$ of the host vehicle at time $t$ are, respectively:
where $a$ is the vehicle length and $b$ is the vehicle width;
during a lane change, the host vehicle derives a reasonable lane change safety area from the condition that it must not collide with surrounding vehicles;
if the host vehicle changes lanes to the left at some future time, a collision with the preceding vehicle occurs when the host vehicle is faster than the preceding vehicle and the gap gradually shrinks; in that case the right front vertex of the host vehicle strikes the left rear vertex of the preceding vehicle;
let the collision point be $S_1$ and consider the corresponding collision time point; the coordinates of $S_1$ are expressed as:
where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in the current lane. As shown in FIG. 2, a collision with the rear vehicle in the target lane occurs when the host vehicle is slower than that rear vehicle and the distance between the two decreases over time; if the lane change is performed under these conditions, the left vertex of the host vehicle collides with the rear vehicle in the target lane. Let the collision point be $S_2$ and consider the corresponding collision time point; based on the vehicle dimensions and vehicle kinematics, the coordinates of $S_2$ are expressed as:
where $v_r(t)$ is the speed of the rear vehicle in the target lane at time $t$, $D_1$ is the relative distance between the host vehicle and the rear vehicle in the target lane, and $l$ is the lane width;
the lane change safety region of the vehicle is determined from the coordinates of the collision points $S_1$ and $S_2$ so as to avoid collisions; the same applies to a lane change to the right. The intelligent networked vehicle judges, from the detected information on surrounding vehicles, whether the safety region satisfies the lane change condition.
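The vertex formulas themselves are not legible in this text, so the following Python is only a sketch of the underlying geometry: it computes the four vertices of a rectangular vehicle of length a and width b from a pose. The choice of the geometric center as the reference point, the yaw angle measured counterclockwise from the lane direction (x axis), and the function name are assumptions and may differ from the patent's own formulation.

```python
import math


def vehicle_corners(x, y, theta, a, b):
    """Return the vertices p1..p4 of a rectangular vehicle.

    Order follows the description: right front, left front, left rear,
    right rear. (x, y) is the assumed geometric center, theta the yaw
    angle in radians, a the vehicle length and b the vehicle width.
    """
    fx, fy = math.cos(theta), math.sin(theta)    # unit vector along the heading
    lx, ly = -math.sin(theta), math.cos(theta)   # unit vector pointing left
    half_a, half_b = a / 2.0, b / 2.0
    # (longitudinal offset, lateral offset) of each vertex; left is positive.
    offsets = [
        (half_a, -half_b),   # p1: right front
        (half_a, half_b),    # p2: left front
        (-half_a, half_b),   # p3: left rear
        (-half_a, -half_b),  # p4: right rear
    ]
    return [
        (x + lon * fx + lat * lx, y + lon * fy + lat * ly)
        for lon, lat in offsets
    ]


if __name__ == "__main__":
    for corner in vehicle_corners(0.0, 0.0, math.radians(10.0), 4.5, 1.8):
        print(corner)
```

Evaluating these vertices along a planned lane change trajectory, and comparing them with the predicted positions of the preceding vehicle and the rear vehicle in the target lane, is one way to check whether a candidate manoeuvre stays inside the safety region.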
In step three, each intelligent networked vehicle acts as an agent, so the n networked vehicles in the vehicle group are regarded as n agents; the number n is limited to the range that the available computation allows, and the n parallel agents control the n networked vehicles and interact with one another. The n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any networked vehicle contributes to the reward of the whole vehicle group;
the intelligent networked vehicles in a vehicle queue interact with vehicles in adjacent queues, and also maintain cooperative interaction with the manually driven vehicles; the cooperation methods are as follows:
Method A: on an ordinary road section, the vehicle queue is cooperatively controlled through adaptive cruise and regenerative braking so that a reasonable distance is kept between vehicles, that is, a safe longitudinal gap must be maintained at all times between two consecutive vehicles. The deviation from the safe distance, that is, the spacing error, should be as small as possible in order to reduce collision risk and to realize the low fuel consumption and high traffic throughput of the vehicle queue. To accommodate the randomness of ordinary (manually driven) vehicles, the distance between an intelligent networked vehicle and an ordinary vehicle is required to be larger; when a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach a signalized intersection, all vehicles in the vehicle queue are split and regrouped to reduce energy consumption and travel time, so that part of the queue passes through the intersection in sequence before the green signal ends, and the remaining vehicles wait behind the stop line.
In step four, as shown in FIG. 3, the Soft Actor-Critic reinforcement learning algorithm (the SAC algorithm) is an off-policy, model-free deep reinforcement learning algorithm that combines maximum entropy learning with the Actor-Critic framework. The learning elements of the SAC algorithm comprise a state s, an action a, a reward r and an environment model ρ; the states comprise the fuel consumption, battery state of charge, speed, acceleration, yaw angle and inter-vehicle distance of the vehicle; the actions are torque and steering angle; and the rewards cover fuel consumption, travel time and comfort, together with the adaptive cruise cost function;
in step four, the SAC algorithm learns from sample data drawn from the environment and is continuously updated and optimized to finally obtain an optimal strategy, so that the intelligent networked vehicles in the hybrid vehicle group can be distributed over different lanes according to the driving style of the driver, and the mixed vehicles in the same lane form queues of different lengths;
the driving styles of the vehicles are classified into aggressive, robust and cautious types; when the vehicle group is driving on a road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, robust vehicles in the middle lane, and cautious vehicles in the rightmost lane; the SAC algorithm adjusts the final lane allocation according to the degree of gain to the vehicle group.
The SAC algorithm consists of 1 actor neural network and 4 critic neural networks; the input of the actor neural network is the state, and its output is the parameters of the action probability distribution P(x);
the 4 critic neural networks are the state value estimation networks (the V critic network and the V critic target network) and the action-state value estimation networks (the Q1 and Q2 critic networks); the critic networks take the state as input and output a value estimate: the output of the V critic network is V(s), an estimate of the state value, and the output of the Q critic network is Q(s, a), an estimate of the action-state value; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any networked vehicle contributes to the reward of the whole vehicle group;
Entropy in the algorithm is defined as:
where x follows the distribution with probability density function P(x); introducing maximum entropy makes the output actions more dispersed and avoids excessive concentration, which improves the exploration capability of the algorithm, its ability to learn new tasks, and its stability; the optimal strategy in the SAC algorithm framework is expressed as:
where π is the policy adopted by the agent, a is the action, s is the state and r is the reward; α is the temperature parameter, which determines the relative importance of the entropy term with respect to the reward and thereby the randomness of the optimal strategy;
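The entropy and optimal-strategy formulas are not legible in this text. As a hedged illustration of the maximum entropy idea they describe, the sketch below uses the standard textbook forms: the differential entropy of a one-dimensional Gaussian policy and a discounted return in which each step's reward is augmented by α times the policy entropy. The function names and the discount factor gamma are assumptions, and the patent's exact expressions may differ.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy of a 1-D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def soft_return(rewards, entropies, alpha, gamma=0.99):
    """Discounted sum of (reward + alpha * entropy) over a trajectory.

    alpha is the temperature: alpha = 0 recovers the ordinary return,
    while a larger alpha favours more random (higher entropy) policies.
    """
    return sum(
        (gamma ** t) * (r + alpha * h)
        for t, (r, h) in enumerate(zip(rewards, entropies))
    )


if __name__ == "__main__":
    # A wider action distribution has higher entropy, so it earns a larger bonus.
    print(gaussian_entropy(0.5), gaussian_entropy(2.0))
    print(soft_return([1.0, 1.0], [1.0, 1.0], alpha=0.5, gamma=0.5))
```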
the state space S of SAC is defined as:
where the state includes the driving style, $soc$ is the battery state of charge, $v_p$ is the vehicle speed, $a_p$ is the vehicle acceleration, $t_{dri}$ is the driving time, $\theta$ is the yaw angle, and $d_{des}$ is the distance to the preceding vehicle;
The action space A is defined as: $A = \{T_p, \delta_p\}$ (formula thirteen);
where $T_p$ is the torque of the host vehicle and $\delta_p$ is the steering wheel angle of the vehicle;
the reward function is defined as:
$R = \{\omega_1 \cdot m_{fuel} c_{fuel} + \omega_2 \cdot P_{batt} c_{elec} + \omega_3 \cdot (t_{dri} - t_{ref}) + \omega_4 \cdot P_{rec} + \omega_5 \cdot J_{min}\}$ (formula fourteen);
where $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ and $\omega_5$ are weighting coefficients, $m_{fuel}$ is the fuel consumption of the current intelligent networked vehicle, $c_{fuel}$ is the fuel price, $P_{batt}$ is the motor power, $c_{elec}$ is the electricity price, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;
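Formula fourteen can be evaluated directly once the weights are chosen. The sketch below is one way to write it in Python; the container name `RewardWeights`, the argument names and the example values are hypothetical. The text does not state the signs of the weighting coefficients, so the example assumes negative weights for the cost terms and a positive weight for recovered braking power.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Weighting coefficients omega_1..omega_5 (signs chosen by the user)."""
    w1: float
    w2: float
    w3: float
    w4: float
    w5: float


def reward(w, m_fuel, c_fuel, p_batt, c_elec, t_dri, t_ref, p_rec, j_min):
    """Evaluate formula fourteen term by term."""
    return (
        w.w1 * m_fuel * c_fuel        # fuel cost
        + w.w2 * p_batt * c_elec      # electricity cost
        + w.w3 * (t_dri - t_ref)      # travel time relative to the reference
        + w.w4 * p_rec                # recovered braking power
        + w.w5 * j_min                # adaptive cruise comprehensive cost
    )


if __name__ == "__main__":
    # Assumed sign convention: penalize costs, reward energy recovery.
    weights = RewardWeights(w1=-1.0, w2=-1.0, w3=-1.0, w4=1.0, w5=-1.0)
    print(reward(weights, 2.0, 3.0, 4.0, 0.5, 10.0, 8.0, 5.0, 1.0))
```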
The driving actions are executed independently by each networked vehicle, and the corresponding rewards are jointly optimized by collecting the control experience of the networked vehicles into a centralized replay buffer;
for a specific state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as:
where $\gamma \in [0, 1]$ is the discount factor;
to avoid overestimation when maximizing Q, and further overestimation when computing targets with the target networks, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$ respectively, and two target networks with their own parameters; the minimum of the values output by the target networks is selected as the target value; the soft value network parameters are updated by minimizing the following loss function:
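The loss function is not legible here, so the following is only a sketch of the "take the minimum of the two target networks" idea in its commonly published SAC form. Scalars stand in for network outputs; the function names, the `done` flag and the squared-error form of the loss are assumptions rather than the patent's own definitions.

```python
def soft_td_target(r, done, q1_next, q2_next, log_pi_next, alpha, gamma=0.99):
    """Target value for both online Q networks.

    q1_next, q2_next: target-network estimates for the next state and action
    log_pi_next:      log-probability of the next action under the policy
    """
    # Taking the smaller estimate counteracts overestimation bias.
    soft_v_next = min(q1_next, q2_next) - alpha * log_pi_next
    return r + gamma * (0.0 if done else soft_v_next)


def q_loss(q_pred, target):
    """Squared error between an online Q estimate and the target."""
    return 0.5 * (q_pred - target) ** 2


if __name__ == "__main__":
    y = soft_td_target(1.0, False, 2.0, 3.0, -1.0, alpha=0.5, gamma=0.9)
    print(y, q_loss(3.0, y))
```

In a full implementation the same target would be used for both online networks, and the target networks would track the online ones slowly.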
the strategy is a stochastic policy represented by a Gaussian distribution: the state is mapped through the network parameters to the mean and variance of a Gaussian distribution, and the action is obtained by sampling from that distribution; that is, with the state $s_t$ as input, a Gaussian distribution with a mean and a standard deviation is output; the action $a_t$ is then obtained using the reparameterization technique, according to:
where $\mu(s_t)$ and $\sigma(s_t)$ are the mean and standard deviation of the Gaussian distribution, respectively, and $\varepsilon_t$ is a noise signal sampled from a standard normal distribution;
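A minimal sketch of the reparameterization step described above, assuming the usual form in which the action is the mean plus the standard deviation times standard normal noise. The function name and the optional `rng` argument are illustrative; any squashing or scaling of the action to torque and steering limits is not shown because the text does not specify it.

```python
import random


def sample_action(mu, sigma, rng=random):
    """Draw an action via the reparameterization technique.

    mu, sigma: mean and standard deviation output by the policy for s_t
    rng:       source of randomness exposing gauss(mean, std)
    Returns the action and the noise sample that produced it.
    """
    eps = rng.gauss(0.0, 1.0)   # noise from a standard normal distribution
    action = mu + sigma * eps   # deterministic in (mu, sigma) once eps is fixed
    return action, eps


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_action(mu=50.0, sigma=5.0, rng=rng))
```

Because the randomness is isolated in the noise sample, gradients can flow through the mean and the standard deviation when the policy is trained.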
the relationship between the policy function and the soft value function is expressed as:
the policy network parameters are updated by minimizing the Kullback-Leibler divergence; the smaller the Kullback-Leibler divergence, the smaller the difference between the rewards corresponding to the output behavior, and the better the convergence of the strategy; the update rule of the policy network is expressed as:
where $Z(s_t)$ is the partition function, used to normalize the distribution;
finally, the policy network parameters are updated by gradient descent, expressed as:
the temperature coefficient represents the importance that the algorithm assigns to entropy, and tuning it is important for the training performance of the SAC algorithm; the optimal temperature coefficient differs with the reinforcement learning task and the training stage; an automatic temperature adjustment mechanism is therefore used; under this mechanism, a constrained optimization problem is constructed, and the optimal temperature coefficient at each step is obtained by minimizing an objective function expressed as:
where $H_0$ is a predefined minimum policy entropy threshold.
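The objective function is not legible in this text. The sketch below therefore assumes the commonly published SAC form, in which the temperature objective is the batch mean of $-\alpha(\log \pi(a_t|s_t) + H_0)$, and applies one plain gradient step to it. The function name, the learning rate and the positivity clamp are assumptions for illustration.

```python
def update_temperature(alpha, log_probs, h0, lr=1e-3):
    """One gradient-descent step on the assumed temperature objective.

    log_probs: log pi(a_t | s_t) for a batch of sampled actions
    h0:        minimum policy entropy threshold
    """
    # Gradient of mean(-alpha * (log_pi + h0)) with respect to alpha.
    grad = sum(-lp - h0 for lp in log_probs) / len(log_probs)
    # alpha grows when the estimated entropy (-log_pi) is below h0,
    # and shrinks when the policy is already more random than required.
    return max(alpha - lr * grad, 1e-8)


if __name__ == "__main__":
    print(update_temperature(0.2, [-1.0, -1.0], h0=2.0, lr=0.1))
    print(update_temperature(0.2, [-3.0, -3.0], h0=2.0, lr=0.1))
```

Practical implementations often optimize the logarithm of the temperature instead, which keeps it positive without a clamp.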
The SAC reinforcement learning algorithm is trained on data relating to vehicle group queue reorganization, adaptive cruise and efficient intersection passage to obtain an optimal control strategy; SAC takes into account the states of the networked vehicles in the vehicle group and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each networked vehicle under adaptive cruise control to control its trajectory;
In the adaptive cruise control of step three, the required inter-vehicle distance is influenced by the driver's driving style, road traffic efficiency and vehicle safety; considering the uncertainty of the driving intention of manually driven vehicles, the distance between a networked vehicle and a manually driven vehicle is larger than the distance between networked vehicles; if the inter-vehicle distance is too small, traffic efficiency improves, but driver anxiety may lead to collisions; conversely, a larger inter-vehicle distance ensures vehicle safety, but road traffic efficiency is poorer and vehicles from adjacent lanes can cut in easily;
the constant time headway (CTH) policy is used for the vehicle spacing algorithm, as follows:
$d_{des} = \tau_h v_h + d_0$ (formula twenty-two);
where $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;
in terms of car-following safety, the following constraints apply:
$d_{\min} < d < d_{\max}$
$\Delta d = d - d_{des}$
$\Delta v_{\min} < \Delta v < \Delta v_{\max}$
$\Delta v = v_p - v_f$
where $d$ is the actual distance between the host vehicle and the preceding vehicle, and $d_{\min}$ and $d_{\max}$ are the minimum and maximum inter-vehicle distances; $\Delta v$ is the speed difference between the host vehicle and the preceding vehicle, and $\Delta v_{\min}$ and $\Delta v_{\max}$ are the minimum and maximum speed differences;
the comfort constraint is as follows: $\Delta a = a_p - a_f$, where $a_f$ is the acceleration of the preceding vehicle;
the adaptive cruise comprehensive cost function is:
$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2$ (formula twenty-three).
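Formulas twenty-two and twenty-three and the safety constraints can be combined into a short calculation. The Python sketch below does so under the assumption that $v_h$ in formula twenty-two is the host vehicle speed $v_p$; the function names, units and example numbers are illustrative only.

```python
def desired_gap(v_h, tau_h, d_0):
    """Constant time headway spacing, formula twenty-two."""
    return tau_h * v_h + d_0


def acc_cost(d, v_p, v_f, a_p, a_f, tau_h, d_0, w6, w7, w8):
    """Adaptive cruise comprehensive cost, formula twenty-three."""
    delta_d = d - desired_gap(v_p, tau_h, d_0)  # spacing error (v_h taken as v_p)
    delta_v = v_p - v_f                         # speed difference
    delta_a = a_p - a_f                         # acceleration difference
    return w6 * delta_d ** 2 + w7 * delta_v ** 2 + w8 * delta_a ** 2


def within_safety_limits(d, delta_v, d_min, d_max, dv_min, dv_max):
    """Check the car-following constraints on distance and speed difference."""
    return d_min < d < d_max and dv_min < delta_v < dv_max


if __name__ == "__main__":
    print(desired_gap(v_h=20.0, tau_h=1.5, d_0=5.0))
    print(acc_cost(40.0, 20.0, 18.0, 0.5, 0.0, 1.5, 5.0, 1.0, 2.0, 4.0))
    print(within_safety_limits(40.0, 2.0, 5.0, 100.0, -5.0, 5.0))
```

A larger nominal time headway could be passed when the preceding vehicle is manually driven, which matches the requirement above that such gaps be larger than those between networked vehicles.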
When the braking force distribution is limited in accordance with the ECE regulations, the following braking force distribution strategy is adopted;
When the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front axle braking force remains unchanged and the rear axle braking force is increased; when $z > z_3$, the motor stops braking and the front and rear axle braking forces are distributed along the β line; throughout the braking process, if the motor braking force is insufficient, the hydraulic braking force makes up the shortfall in total braking force; the braking force distribution is formulated as follows:
the boundaries of z are calculated as follows:
where $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the total wheelbase, $k$ is the rear wheelbase, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, $\beta$ is the braking force distribution coefficient, $r_w$ is the wheel radius, $i_t$ is the vehicle transmission ratio, $\eta$ is the transmission efficiency, and the remaining coefficient is the rotating mass correction coefficient.
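The distribution formulas and the expressions for the boundaries are not legible here, so the sketch below covers only the regime selection logic that the prose describes, plus the hydraulic compensation rule. The thresholds are passed in as plain numbers; the regime labels, the function names and the handling of values exactly on a boundary (assigned to the higher regime) are assumptions.

```python
def braking_regime(z, z1, z2, z3):
    """Select the braking force distribution regime for braking strength z."""
    if not (0.0 <= z1 < z2 < z3):
        raise ValueError("thresholds must satisfy 0 <= z1 < z2 < z3")
    if z < z1:
        return "front_axle_only"
    if z < z2:
        return "ece_regulation_line"
    if z < z3:
        return "front_constant_rear_increasing"
    return "beta_line_motor_off"


def split_motor_hydraulic(f_required, f_motor_available):
    """Use the motor first; hydraulic braking covers any shortfall."""
    f_motor = min(f_required, f_motor_available)
    return f_motor, f_required - f_motor


if __name__ == "__main__":
    # Threshold values below are placeholders, not values from the patent.
    print(braking_regime(0.25, z1=0.1, z2=0.4, z3=0.7))
    print(split_motor_hydraulic(1000.0, 600.0))
```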
In step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each networked vehicle according to the traffic signal timing, and performs queue reorganization and queue length planning, so that part of the vehicles form a queue that passes through the intersection during the green phase while the remaining vehicles wait behind the stop line, thereby reducing the energy consumption of the whole vehicle group and achieving better economy and traffic efficiency.
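In the method described above the split is produced by the learned SAC strategy. Purely to illustrate what "part of the queue passes, the rest waits" means, the sketch below applies a fixed rule instead: a vehicle passes if it can reach the stop line at a constant cruise speed before the green phase ends, and once one vehicle fails, every vehicle behind it waits. The function name, the constant-speed assumption and the inputs are hypothetical and are not the patent's algorithm.

```python
def split_queue(distances_m, speed_mps, green_remaining_s):
    """Split a queue into vehicles that pass on green and vehicles that wait.

    distances_m: distance of each vehicle to the stop line, ordered from the
                 queue head backwards (assumed to be increasing)
    Returns (passing_indices, waiting_indices).
    """
    if speed_mps <= 0:
        return [], list(range(len(distances_m)))
    passing, waiting = [], []
    for i, d in enumerate(distances_m):
        # A vehicle behind a waiting vehicle cannot pass in the same lane.
        if not waiting and d / speed_mps <= green_remaining_s:
            passing.append(i)
        else:
            waiting.append(i)
    return passing, waiting


if __name__ == "__main__":
    print(split_queue([20.0, 35.0, 50.0, 65.0], speed_mps=10.0, green_remaining_s=5.0))
```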
Claims (9)
1. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control is characterized in that:
the hybrid vehicle group is a vehicle group consisting of intelligent networked vehicles and conventional manually driven vehicles, and the control method comprises the following steps:
step one, determining several types of driving style by clustering driving data from various working conditions on a plurality of characteristic parameters;
step two, determining a lane change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in the adaptive cruise control, constraining vehicle spacing, vehicle speed and acceleration based on safety and comfort, and recovering braking energy during braking; when approaching an intersection, the queues are reorganized so that the vehicle group passes through in order;
step four, training on sample data from the environment based on the Soft Actor-Critic reinforcement learning algorithm, and iteratively updating according to a set loss function, to finally obtain an optimal strategy that enables the vehicle group to reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.
2. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: the vehicle at the head of each vehicle queue of the vehicle group is an intelligent networked vehicle, and the length of the queue is limited to not more than 8 vehicles;
in step one, the driving styles are classified as follows: the characteristic parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and three driving styles are then obtained by K-means clustering. The different driving styles are reflected in the following characteristic parameters: average longitudinal vehicle speed, maximum longitudinal vehicle speed $v_{\max}$, minimum longitudinal vehicle speed $v_{\min}$, longitudinal vehicle speed standard deviation $\epsilon_v$, average longitudinal acceleration, maximum longitudinal acceleration $a_{x\max}$, minimum longitudinal acceleration $a_{x\min}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{y\max}$, minimum lateral acceleration $a_{y\min}$, lateral acceleration standard deviation $\sigma_y$, headway THW, time to collision TTC, and minimum headway distance $DHW_{\min}$. The longitudinal vehicle speed standard deviation $\epsilon_v$ is calculated as follows:
The headway THW is the time required for the front of the host vehicle to reach the position of the rear of the preceding vehicle at the current vehicle speed, and the time to collision TTC is the time until the host vehicle and the preceding vehicle would collide if both continued in their current state; they are calculated as follows:
$v_{rel} = v_p - v_f$ (formula four);
where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the speed of the host vehicle, and $v_f$ is the speed of the preceding vehicle;
the headway distance is the distance between the front of the host vehicle and the front of the preceding vehicle in the same lane. The larger the headway distance, the lower the likelihood of a collision between the two vehicles; conversely, the smaller it is, the higher the likelihood of an accident. A smaller headway distance also reflects more aggressive driving, and the minimum headway distance is therefore selected as a characteristic parameter of driving style.
3. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step two, the road travelled by the vehicle queue has three lanes, and lane change operations are involved in reorganizing the vehicle group; the networked vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, including vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position and intersection signal phase;
the method for detecting the lane change safety area used in the lane change operation is as follows:
the method is based on vehicle kinematics; when the vehicle is at a given position, the coordinates of its right front vertex $[x_{p1}(t), y_{p1}(t)]$ at time $t$ ($t_m < t < t_n$) are expressed as:
where $v_p(t)$ and $\theta_p(t)$ are the speed and yaw angle of the host vehicle, respectively; $t_m$ is the time at which the host vehicle starts the lane change, and $t_n$ is the time at which it ends;
similarly, the coordinates of the left front vertex $[x_{p2}(t), y_{p2}(t)]$, the left rear vertex $[x_{p3}(t), y_{p3}(t)]$ and the right rear vertex $[x_{p4}(t), y_{p4}(t)]$ of the host vehicle at time $t$ are, respectively:
where $a$ is the vehicle length and $b$ is the vehicle width;
during a lane change, the host vehicle derives a reasonable lane change safety area from the condition that it must not collide with surrounding vehicles;
if the host vehicle changes lanes to the left at some future time, a collision with the preceding vehicle occurs when the host vehicle is faster than the preceding vehicle and the gap gradually shrinks; in that case the right front vertex of the host vehicle strikes the left rear vertex of the preceding vehicle;
let the collision point be $S_1$ and consider the corresponding collision time point; the coordinates of $S_1$ are expressed as:
where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in the current lane. A collision with the rear vehicle in the target lane occurs when the host vehicle is slower than that rear vehicle and the distance between the two decreases over time; if the lane change is performed under these conditions, the left vertex of the host vehicle collides with the rear vehicle in the target lane. Let the collision point be $S_2$ and consider the corresponding collision time point; based on the vehicle dimensions and vehicle kinematics, the coordinates of $S_2$ are expressed as:
where $v_r(t)$ is the speed of the rear vehicle in the target lane at time $t$, $D_1$ is the relative distance between the host vehicle and the rear vehicle in the target lane, and $l$ is the lane width;
the lane change safety region of the vehicle is determined from the coordinates of the collision points $S_1$ and $S_2$ so as to avoid collisions; the same applies to a lane change to the right. The intelligent networked vehicle judges, from the detected information on surrounding vehicles, whether the safety region satisfies the lane change condition.
4. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step three, each intelligent networked vehicle acts as an agent, so the n networked vehicles in the vehicle group are regarded as n agents; the number n is limited to the range that the available computation allows, and the n parallel agents control the n networked vehicles and interact with one another. The n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any networked vehicle contributes to the reward of the whole vehicle group;
the intelligent networked vehicles in a vehicle queue interact with vehicles in adjacent queues, and also maintain cooperative interaction with the manually driven vehicles; the cooperation methods are as follows:
Method A: on an ordinary road section, the vehicle queue is cooperatively controlled through adaptive cruise and regenerative braking so that a reasonable distance is kept between vehicles, that is, a safe longitudinal gap must be maintained at all times between two consecutive vehicles. The deviation from the safe distance, that is, the spacing error, should be as small as possible in order to reduce collision risk and to realize the low fuel consumption and high traffic throughput of the vehicle queue. To accommodate the randomness of ordinary (manually driven) vehicles, the distance between an intelligent networked vehicle and an ordinary vehicle is required to be larger; when a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach a signalized intersection, all vehicles in the vehicle queue are split and regrouped to reduce energy consumption and travel time, so that part of the queue passes through the intersection in sequence before the green signal ends, and the remaining vehicles wait behind the stop line.
5. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 4, wherein: in step four, the Soft Actor-Critic reinforcement learning algorithm (the SAC algorithm) is an off-policy, model-free deep reinforcement learning algorithm that combines maximum entropy learning with the Actor-Critic framework; the learning elements of the SAC algorithm comprise a state s, an action a, a reward r and an environment model ρ; the states comprise the fuel consumption, battery state of charge, speed, acceleration, yaw angle and inter-vehicle distance of the vehicle; the actions are torque and steering angle; and the rewards cover fuel consumption, travel time and comfort, together with the adaptive cruise cost function; in step four, the SAC algorithm learns from sample data drawn from the environment and is continuously updated and optimized to finally obtain an optimal strategy, so that the intelligent networked vehicles in the hybrid vehicle group can be distributed over different lanes according to the driving style of the driver, and the mixed vehicles in the same lane form queues of different lengths;
the driving styles of the vehicles are classified into aggressive, robust and cautious types; when the vehicle group is driving on a road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, robust vehicles in the middle lane, and cautious vehicles in the rightmost lane; the SAC algorithm adjusts the final lane allocation according to the degree of gain to the vehicle group.
6. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 5, wherein: the SAC algorithm consists of 1 actor neural network and 4 critic neural networks; the input of the actor neural network is the state, and its output is the parameters of the action probability distribution P(x);
the 4 critic neural networks are the state value estimation networks (the V critic network and the V critic target network) and the action-state value estimation networks (the Q1 and Q2 critic networks); the critic networks take the state as input and output a value estimate: the output of the V critic network is V(s), an estimate of the state value, and the output of the Q critic network is Q(s, a), an estimate of the action-state value; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any networked vehicle contributes to the reward of the whole vehicle group;
Entropy in the algorithm is defined as:
wherein x follows the distribution with probability density function P(x); introducing the maximum entropy makes the output actions more dispersed and avoids their excessive concentration, thereby improving the exploration capability of the algorithm, its ability to learn new tasks and its stability; the optimal strategy in the SAC algorithm framework is expressed as:
wherein π represents the policy adopted by the agent, a is an action, s is a state and r is a reward; α is the temperature parameter, which determines the relative importance of the entropy term against the reward and thereby ensures the randomness of the optimal strategy;
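The effect of the entropy term and the temperature α can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian policy, for which the differential entropy has the closed form 0.5·ln(2πe·σ²); the function names are hypothetical and are not taken from the claims.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy of a 1-D Gaussian N(mu, sigma^2)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_regularized_reward(r: float, sigma: float, alpha: float) -> float:
    """Per-step maximum-entropy term: reward plus alpha times policy entropy."""
    return r + alpha * gaussian_entropy(sigma)
```

A wider action distribution (larger σ) yields a larger entropy bonus, and α = 0 recovers the plain reward, which is how the temperature trades exploration against reward.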
the state space S of SAC is defined as:
wherein the state comprises the driving style, the battery state of charge soc, the vehicle speed $v_p$, the vehicle acceleration $a_p$, the driving time $t_{dri}$, the yaw angle $\theta$, and the distance $d_{des}$ to the preceding vehicle;
the action space A is defined as: $A = \{T_p, \delta_p\}$ (formula thirteen);
wherein $T_p$ is the torque of the host vehicle and $\delta_p$ is the steering wheel angle of the host vehicle;
the reward function is defined as:
$R = \{\omega_1 \cdot m_{fuel} c_{fuel} + \omega_2 \cdot P_{batt} c_{elec} + \omega_3 \cdot (t_{dri} - t_{ref}) + \omega_4 \cdot P_{rec} + \omega_5 \cdot J_{min}\}$ (formula fourteen);
wherein $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ and $\omega_5$ are proportionality coefficients, $m_{fuel}$ is the fuel consumption of the current intelligent networked vehicle, $c_{fuel}$ is the price of fuel, $P_{batt}$ is the motor power, $c_{elec}$ is the price of electricity, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;
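As an illustration, formula fourteen can be evaluated term by term as in the sketch below. The claim does not state the signs or magnitudes of the proportionality coefficients, so the `RewardWeights` container and its default values are assumptions; in practice cost-like terms would presumably carry weights that penalize them.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical placeholders for omega_1 .. omega_5; not given in the claim.
    w1: float = 1.0
    w2: float = 1.0
    w3: float = 1.0
    w4: float = 1.0
    w5: float = 1.0


def reward(m_fuel: float, c_fuel: float, p_batt: float, c_elec: float,
           t_dri: float, t_ref: float, p_rec: float, j_min: float,
           w: RewardWeights) -> float:
    """Weighted sum of the five terms of formula fourteen."""
    return (w.w1 * m_fuel * c_fuel          # fuel cost
            + w.w2 * p_batt * c_elec        # electricity cost
            + w.w3 * (t_dri - t_ref)        # travel-time deviation
            + w.w4 * p_rec                  # braking energy recovery power
            + w.w5 * j_min)                 # adaptive cruise cost
```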
Each driving action is executed independently by the individual networked vehicle, while the corresponding rewards are jointly optimized by collecting the control experience of all networked vehicles into a centralized replay buffer;
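A centralized replay buffer of this kind could look like the following sketch, in which every vehicle pushes its own transitions into one shared store and training batches are drawn from the pooled experience. The class name, the transition layout and the fixed capacity are illustrative assumptions.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (agent_id, state, action, reward, next_state); the layout is an assumption.
Transition = Tuple[int, tuple, tuple, float, tuple]


class SharedReplayBuffer:
    """One buffer shared by all networked vehicles (agents)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out
        self._rng = random.Random(seed)

    def push(self, agent_id: int, s: tuple, a: tuple, r: float, s_next: tuple) -> None:
        self._buf.append((agent_id, s, a, r, s_next))

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniformly sample a batch from the pooled experience of all agents."""
        k = min(batch_size, len(self._buf))
        return self._rng.sample(list(self._buf), k)

    def __len__(self) -> int:
        return len(self._buf)
```

Because the agents share one network and one buffer, a transition collected by any vehicle can update the policy used by all of them.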
for a given state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as follows:
(formula fifteen);
wherein $\gamma \in [0,1]$ is the discount factor;
to avoid overestimation when maximizing Q, and further overestimation when computing targets with the target network, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$ respectively, and two value networks, V and the V target network, each with its own parameters; the minimum of the function values output by the networks is selected as the target value; the soft value network parameters are updated by minimizing a loss function as follows:
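The role of the two Q networks and the target value network can be illustrated with a minimal sketch. It assumes the arrangement used in the original SAC algorithm (a state value target built from the smaller of the two Q estimates, and a Q target built from the target value network); this is not confirmed to be the exact loss used in the claim, and all function names are hypothetical.

```python
from typing import Sequence


def soft_value_target(q1: float, q2: float, log_prob: float, alpha: float) -> float:
    """Target for V(s): the smaller Q estimate minus alpha * log pi(a|s)."""
    return min(q1, q2) - alpha * log_prob


def soft_q_target(r: float, gamma: float, v_target_next: float, done: bool = False) -> float:
    """Target for Q(s, a): reward plus the discounted target-network value of s'."""
    return r + (0.0 if done else gamma * v_target_next)


def mse_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between network outputs and their targets."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
```

Taking the minimum of the two estimates keeps a single over-optimistic critic from inflating the target.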
the strategy is a stochastic policy represented by a Gaussian distribution: the state is mapped through the network parameters to the mean and variance of a Gaussian distribution, and the action is obtained by sampling from that distribution; with the state $s_t$ as input, the network outputs the mean and standard deviation of a Gaussian distribution; the action $a_t$ is then obtained using the reparameterization technique, according to the formula:
in the formula, $\varepsilon_t$ is a noise signal sampled from a standard normal distribution, and $\mu(s_t)$ and $\sigma(s_t)$ are respectively the mean and standard deviation of the Gaussian distribution;
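A minimal sketch of reparameterized sampling, assuming the usual form in which the action is the mean plus the standard deviation times standard normal noise, applied per action component (torque and steering angle). Any squashing of the action into actuator limits is omitted, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def sample_action(mu: Sequence[float], sigma: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw a = mu + sigma * eps with eps ~ N(0, 1), component-wise."""
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    eps = [rng.gauss(0.0, 1.0) for _ in mu]  # noise independent of the network parameters
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]
```

Because the randomness is isolated in the noise term, the sampled action remains a differentiable function of the mean and standard deviation, which is what allows gradient-based policy updates.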
the relationship between the policy function and the soft value function is expressed as:
the policy network parameters are updated by minimizing the Kullback-Leibler divergence; the smaller the Kullback-Leibler divergence, the smaller the difference between the rewards corresponding to the output actions, and the better the convergence of the strategy; the update rule of the policy network is expressed as:
wherein $Z(s_t)$ is the partition function used to normalize the distribution;
finally, the policy network parameters are updated by gradient descent, expressed as:
the temperature coefficient represents the importance the algorithm places on entropy, and tuning it is important for the training effect of the SAC algorithm; the optimal temperature coefficient differs with the reinforcement learning task and the training stage; an automatic temperature adjustment mechanism is therefore used; under this mechanism, a constrained optimization problem is constructed, and the optimal temperature coefficient at each step is obtained by minimizing an objective function, expressed as:
wherein $H_0$ is a predefined minimum policy entropy threshold.
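One common way to realize such an automatic adjustment is a gradient step on the objective J(α) = E[−α·(log π(a|s) + H_0)], sketched below. This objective is the form used in the general SAC literature and is an assumption here; the learning rate and the lower bound on α are hypothetical.

```python
from typing import Sequence


def update_temperature(alpha: float, log_probs: Sequence[float], h0: float,
                       lr: float = 1e-3, alpha_min: float = 1e-6) -> float:
    """One gradient-descent step on J(alpha) = mean(-alpha * (log_pi + h0))."""
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + h0)             # dJ/dalpha
    return max(alpha - lr * grad, alpha_min)  # keep the temperature positive
```

When the estimated policy entropy (the negative mean log-probability) falls below the threshold, the step raises α and pushes the policy toward more exploration; when the entropy exceeds the threshold, α is lowered.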
7. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: the SAC reinforcement learning algorithm is trained on data relating to vehicle group queue reorganization, adaptive cruising and efficient passage through the intersection to obtain an optimal control strategy; SAC considers the states of the networked vehicles in the vehicle group and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each networked vehicle under adaptive cruise control to control the running trajectory of the vehicle;
In the adaptive cruise control of step three, the required inter-vehicle distance is influenced by the driving style of the driver, the road commuting efficiency and vehicle safety; considering the uncertainty of the driving intention of manually driven vehicles, the distance between a networked vehicle and a manually driven vehicle is larger than the distance between networked vehicles; if the inter-vehicle distance is too small, commuting efficiency improves, but driver anxiety may lead to collision accidents; conversely, a larger inter-vehicle distance safeguards vehicle safety but reduces road commuting efficiency and makes it easy for vehicles in adjacent lanes to cut in;
the constant time headway (CTH) strategy is used as the vehicle spacing algorithm, as follows:
$d_{des} = \tau_h \upsilon_h + d_0$ (formula twenty-two);
wherein $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;
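Formula twenty-two translates directly into code. In the sketch below the headway is chosen according to whether the preceding vehicle is networked or manually driven, reflecting the larger gap required behind manually driven vehicles; the two headway values and the standstill distance are hypothetical numbers chosen only for illustration.

```python
def desired_gap(v_h: float, tau_h: float, d0: float) -> float:
    """Constant time headway spacing: d_des = tau_h * v_h + d0."""
    return tau_h * v_h + d0


def headway_for(preceding_is_networked: bool) -> float:
    """Hypothetical headways in seconds; manual vehicles get the larger value."""
    return 1.0 if preceding_is_networked else 1.8


# Example: host vehicle at 20 m/s with a 5 m standstill distance (assumed values).
gap_behind_networked = desired_gap(20.0, headway_for(True), 5.0)
gap_behind_manual = desired_gap(20.0, headway_for(False), 5.0)
```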
in terms of car-following safety, the following constraints apply:
$d_{min} < d < d_{max}$
$\Delta d = d - d_{des}$
$\Delta\upsilon_{min} < \Delta\upsilon < \Delta\upsilon_{max}$
$\Delta\upsilon = \upsilon_p - \upsilon_f$
where $d$ is the actual distance between the autonomous vehicle and the preceding vehicle, and $d_{min}$ and $d_{max}$ are the minimum and maximum vehicle distances; $\Delta\upsilon$ is the speed difference between the autonomous vehicle and the preceding vehicle, and $\Delta\upsilon_{min}$ and $\Delta\upsilon_{max}$ are the minimum and maximum speed differences;
the comfort constraint is formulated as follows: $\Delta a = a_p - a_f$, where $a_f$ is the acceleration of the preceding vehicle;
the adaptive cruise comprehensive cost function is:
$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta\upsilon^2 + \omega_8 \Delta a^2$ (formula twenty-three).
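The safety constraints and formula twenty-three can be sketched as two small functions. The weights and bounds are passed in as arguments because the claim does not give numerical values; the function names are hypothetical.

```python
def acc_cost(d: float, d_des: float, v_p: float, v_f: float,
             a_p: float, a_f: float,
             w6: float, w7: float, w8: float) -> float:
    """J_min = w6 * dd^2 + w7 * dv^2 + w8 * da^2 (formula twenty-three)."""
    dd = d - d_des      # spacing error
    dv = v_p - v_f      # speed difference to the preceding vehicle
    da = a_p - a_f      # acceleration difference (comfort term)
    return w6 * dd ** 2 + w7 * dv ** 2 + w8 * da ** 2


def within_safety_bounds(d: float, dv: float,
                         d_min: float, d_max: float,
                         dv_min: float, dv_max: float) -> bool:
    """Check the strict car-following inequalities on distance and speed difference."""
    return d_min < d < d_max and dv_min < dv < dv_max
```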
8. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: when the brake force distribution is constrained by the ECE regulations, the following brake force distribution strategy is adopted;
When the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front axle braking force remains unchanged and the rear axle braking force is increased; when $z_3 < z$, the motor stops braking and the front and rear axle braking forces are distributed along the β line; throughout the braking process, if the motor braking force is insufficient, the hydraulic braking force compensates for the shortfall in total braking force; the brake force distribution is formulated as follows:
the boundaries of z are calculated as follows:
wherein $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the total wheelbase, $k$ is the rear wheelbase, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, $\beta$ is the braking force distribution coefficient, $\theta$ is the rotating mass correction coefficient, $r_w$ is the wheel radius, $i_t$ is the vehicle transmission ratio, and $\eta$ is the transmission efficiency.
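The four braking regimes can be sketched as a simple classifier over the braking strength z. The thresholds are taken as inputs rather than computed, since their formulas are not reproduced here, and the handling of exact boundary values (assigned to the higher regime) is an assumption; the regime labels and function names are hypothetical.

```python
def braking_regime(z: float, z1: float, z2: float, z3: float) -> str:
    """Select the brake force distribution regime for braking strength z."""
    if not (0.0 <= z1 < z2 < z3):
        raise ValueError("thresholds must satisfy 0 <= z1 < z2 < z3")
    if z < z1:
        return "front_axle_only"
    if z < z2:
        return "ece_line"
    if z < z3:
        return "front_held_rear_increases"
    return "beta_line_motor_off"


def hydraulic_compensation(f_total: float, f_motor: float) -> float:
    """Hydraulic force covering whatever the motor cannot provide."""
    return max(0.0, f_total - f_motor)
```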
9. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: in step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each networked vehicle according to the traffic signal timing, and performs queue reorganization and queue length planning, so that some of the vehicles form a queue and pass through the intersection during the green phase while the remaining vehicles wait before the stop line, thereby reducing the energy consumption of the whole vehicle group and achieving better economy and traffic efficiency.
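The queue length planning idea can be illustrated with a deliberately simplified rule: a vehicle joins the passing queue if it can reach the stop line within the remaining green time at a common cruising speed. In the claimed method this decision comes from the learned strategy; the constant-speed assumption and the function below are purely illustrative.

```python
from typing import List, Tuple


def split_queue_for_green(distances_m: List[float], speed_mps: float,
                          green_remaining_s: float) -> Tuple[List[int], List[int]]:
    """Return (indices that can pass during green, indices that must wait)."""
    if speed_mps <= 0.0:
        raise ValueError("speed must be positive")
    # Process vehicles from the one closest to the stop line outward.
    order = sorted(range(len(distances_m)), key=lambda i: distances_m[i])
    passing: List[int] = []
    waiting: List[int] = []
    for i in order:
        if distances_m[i] / speed_mps <= green_remaining_s:
            passing.append(i)
        else:
            waiting.append(i)
    return passing, waiting
```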
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310945210.7A CN116853245A (en) | 2023-07-31 | 2023-07-31 | PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310945210.7A CN116853245A (en) | 2023-07-31 | 2023-07-31 | PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116853245A (en) | 2023-10-10 |
Family
ID=88223446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310945210.7A Pending CN116853245A (en) | 2023-07-31 | 2023-07-31 | PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116853245A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117711182A (en) * | 2023-12-21 | 2024-03-15 | 交通运输部公路科学研究所 | Intelligent network-connected vehicle track collaborative optimization method for intersection environment |
CN117711182B (en) * | 2023-12-21 | 2024-06-11 | 交通运输部公路科学研究所 | Intelligent network-connected vehicle track collaborative optimization method for intersection environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107117170B (en) | A kind of real-time prediction cruise control system driven based on economy | |
CN109501799B (en) | Dynamic path planning method under condition of Internet of vehicles | |
CN110930697B (en) | Rule-based intelligent networked vehicle cooperative convergence control method | |
CN111746539B (en) | Intelligent network-connected automobile strict and safe lane-changing enqueueing control method | |
CN113753026B (en) | Decision-making method for preventing rollover of large commercial vehicle by considering road adhesion condition | |
CN109410561A (en) | A kind of even heterogeneous formation travel control method of highway vehicle | |
CN113788021A (en) | Adaptive following cruise control method combined with preceding vehicle speed prediction | |
CN113489793B (en) | Expressway double-lane cooperative control method in mixed traffic scene | |
CN116853245A (en) | PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control | |
CN111959492A (en) | HEV energy management hierarchical control method considering lane change behavior in networking environment | |
CN113593275B (en) | Intersection internet automatic driving method based on bus signal priority | |
CN116740945B (en) | Method and system for multi-vehicle collaborative grouping intersection of expressway confluence region in mixed running environment | |
CN113886764A (en) | Intelligent vehicle multi-scene track planning method based on Frenet coordinate system | |
CN115273450A (en) | Lane changing method for vehicles entering formation under network connection automatic driving environment | |
CN115257789A (en) | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment | |
Zheng et al. | Research on control target of truck platoon based on maximizing fuel saving rate | |
CN112758105A (en) | Automatic driving fleet following running control method, device and system | |
CN113459829B (en) | Intelligent energy management method for double-motor electric vehicle based on road condition prediction | |
CN114537420B (en) | Urban bus rapid transit energy-saving driving control method based on dynamic planning | |
CN115712950A (en) | Automatic driving decision-making method for semi-trailer | |
Choi et al. | Coordinated steering angle and yaw moment distribution to increase vehicle regenerative energy in autonomous driving | |
Xia et al. | Ecological cooperative adaptive cruise control of over‐actuated electric vehicles with in‐wheel motor in traffic flow | |
Shahram et al. | Utilizing Speed Information Forecast in Energy Optimization of an Electric Vehicle with Adaptive Cruise Controller | |
Yan et al. | Velocity Trajectory Planning of Electric Vehicles with Consideration of the Passenger's Individual Preferences | |
Xu et al. | Look-ahead Horizon based Energy Optimization for Connected Hybrid Electric Vehicles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||