CN116853245A

CN116853245A - PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control

Info

Publication number: CN116853245A
Application number: CN202310945210.7A
Authority: CN
Inventors: 林歆悠; 陈显康; 黄强; 叶锦泽; 黄家旺
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2023-07-31
Filing date: 2023-07-31
Publication date: 2023-10-10

Abstract

The invention provides a PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, comprising the following steps. Step one: determine several driving style types by clustering driving data from various working conditions on a plurality of characteristic parameters. Step two: determine the lane change safety area of the vehicle from the acquired vehicle state information and surrounding environment information. Step three: in adaptive cruise control, constrain vehicle spacing, vehicle speed and acceleration based on safety and comfort, and recover braking energy during braking; when approaching an intersection, recombine the queues so that the vehicle group passes through in an orderly manner. Step four: train on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm, iteratively updating according to a set loss function, to obtain an optimal strategy that lets a vehicle group reorganize its queues on roads with no more than three lanes according to the driving styles of different drivers. The invention can plan the vehicle queues according to the driving intention of the driver, so that the queues run efficiently.

Description

PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control

Technical Field

The invention relates to the field of intelligent vehicle driving, in particular to a PHEV hybrid vehicle group optimization control method for queue management and self-adaptive cruise control.

Background

The transportation sector is one of the main sources of emissions and fuel consumption. With the rapid growth of petroleum consumption, the scarcity of fossil energy is becoming increasingly serious, so reducing fuel consumption and emissions is important, especially in urban areas. Alternative fuel vehicles and connected and automated driving technology are expected to be two effective approaches. Alternative fuel vehicles such as hybrid vehicles and electric vehicles are regarded as a future direction of vehicle development. Advanced intelligent driving technology not only helps to save fuel but also assists the driver and improves driving safety and comfort.

Most current efforts have focused on tailoring fuel economy control strategies to a single vehicle without regard to the impact on other vehicles. Fully automated traffic is expected to emerge only gradually, so during the upcoming transition period automated vehicles will share the same road network with conventional vehicles at varying penetration rates, which makes for a dynamic mixed traffic environment. Partial deployment of automated driving systems affects traffic flow characteristics differently from full deployment. Efficient operation and control of mixed traffic made up of intelligent network-connected vehicles and ordinary vehicles is a complex challenge.

In view of the above, the present invention aims to provide a PHEV hybrid vehicle group optimization control method based on queue management and adaptive cruise control, which can reasonably plan a vehicle queue according to the driving intention of a driver, so that a fleet can efficiently run.

Disclosure of Invention

The PHEV hybrid vehicle group optimization control method for queue management and self-adaptive cruise control provided by the invention can reasonably plan vehicle queues according to the driving intention of a driver, so that a vehicle team can efficiently run.

The invention adopts the following technical scheme.

In the PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, the hybrid vehicle group is a vehicle group consisting of intelligent network-connected vehicles and conventional manually driven vehicles, and the method comprises the following steps:

step one, determining several driving style types by clustering driving data from various working conditions on a plurality of characteristic parameters;

step two, determining the lane change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;

step three, in adaptive cruise control, constraining vehicle spacing, vehicle speed and acceleration based on safety and comfort, and recovering braking energy during braking; when approaching an intersection, recombining the queues so that the vehicle group passes through in an orderly manner;

step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm, and iteratively updating according to a set loss function, to obtain an optimal strategy that lets a vehicle group reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.

The vehicle at the head of each vehicle queue in the vehicle group is an intelligent network-connected vehicle, and the queue length is limited to no more than 8 vehicles;

in step one, the driving styles are classified as follows: the characteristic parameters extracted from a large amount of historical data of manually driven vehicles are reduced in dimension by principal component analysis, and K-means clustering is then applied to obtain three driving style classes. The driving styles are reflected in the following characteristic parameters: average longitudinal vehicle speed, maximum longitudinal vehicle speed $v_{max}$, minimum longitudinal vehicle speed $v_{min}$, longitudinal vehicle speed standard deviation $\varepsilon_v$, average longitudinal acceleration, maximum longitudinal acceleration $a_{xmax}$, minimum longitudinal acceleration $a_{xmin}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{ymax}$, minimum lateral acceleration $a_{ymin}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum headway distance $DHW_{min}$. The longitudinal vehicle speed standard deviation $\varepsilon_v$ is calculated as follows:

The time headway THW is the time required for the front of the host vehicle to reach the position of the rear of the preceding vehicle at the current vehicle speed, and the time to collision TTC is the time until the host vehicle collides with the preceding vehicle if both continue in their current state. The calculation formulas are as follows:

$v_{rel} = v_p - v_f$ (Formula 4)

where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the host vehicle speed, and $v_f$ is the preceding vehicle speed;

the headway distance is the distance between the front of the host vehicle and the front of the preceding vehicle in the same lane. The larger this distance, the lower the possibility of a collision between the two vehicles; conversely, the smaller it is, the greater the possibility of an accident. A smaller headway distance also reflects more aggressive driving, so the minimum headway distance is selected as a driving style characteristic parameter.
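The headway-related features above can be illustrated with a short Python sketch. The forms THW = d_rel / v_p and TTC = d_rel / v_rel are assumptions inferred from the verbal definitions (the corresponding formulas are not reproduced in this text), and the population form of the standard deviation is likewise an assumption; the function names are hypothetical.

```python
import math
import statistics


def headway_features(d_rel, v_p, v_f):
    """Return (THW, TTC) for one sample.

    Assumed forms: THW = d_rel / v_p and TTC = d_rel / v_rel,
    with v_rel = v_p - v_f as in Formula 4.
    """
    thw = d_rel / v_p if v_p > 0 else math.inf
    v_rel = v_p - v_f
    # If the host vehicle is not closing in, no collision is predicted.
    ttc = d_rel / v_rel if v_rel > 0 else math.inf
    return thw, ttc


def speed_features(speeds):
    """Summary statistics of a longitudinal speed trace."""
    return {
        "v_mean": statistics.fmean(speeds),
        "v_max": max(speeds),
        "v_min": min(speeds),
        # Population standard deviation: an assumption, not the source formula.
        "v_std": statistics.pstdev(speeds),
    }
```

The same pattern would apply to the longitudinal and lateral acceleration traces; the minimum headway distance is simply the minimum of the recorded headway distances.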

In step two, the vehicle queue drives on a three-lane road, and the vehicle group recombination process involves lane change operations. The network-connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors; this information comprises vehicle speed, vehicle body length, torque, power, vehicle distance, vehicle position and the signal phase of the intersection traffic light;

the method for detecting the lane change safety area used in the lane change operation is as follows:

The method is based on vehicle kinematics. When the host vehicle is at a given position, the coordinates $[x_{p1}(t), y_{p1}(t)]$ of its right front vertex at time $t$ ($t_m < t < t_n$) are expressed as:

where $v_p(t)$ and $\theta_p(t)$ are the speed and yaw angle of the host vehicle, respectively; $t_m$ is the time at which the host vehicle starts the lane change and $t_n$ is the time at which it ends;

similarly, the coordinates of the left front vertex $[x_{p2}(t), y_{p2}(t)]$, the left rear vertex $[x_{p3}(t), y_{p3}(t)]$ and the right rear vertex $[x_{p4}(t), y_{p4}(t)]$ of the host vehicle at time $t$ are respectively:

where $a$ is the vehicle length and $b$ is the vehicle width;
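As a geometric illustration of the four vertices, the sketch below computes the corners of a rectangular vehicle footprint from a pose. It is not the source formula, which is not reproduced in this text: it assumes the reference point is the geometric center of the vehicle, that the y axis points to the left of the x axis, and that the pose (x, y, theta) is already known rather than integrated from speed and yaw angle. The function name is hypothetical.

```python
import math


def vehicle_corners(x, y, theta, a, b):
    """Corners of a vehicle of length a and width b centered at (x, y).

    Returns (right front, left front, left rear, right rear), matching
    the vertex order p1..p4 used in the text. Center-based reference
    point and left-pointing y axis are assumptions.
    """
    fx, fy = math.cos(theta), math.sin(theta)    # unit heading vector
    rx, ry = math.sin(theta), -math.cos(theta)   # unit vector to the right
    ha, hb = a / 2.0, b / 2.0
    p1 = (x + ha * fx + hb * rx, y + ha * fy + hb * ry)  # right front
    p2 = (x + ha * fx - hb * rx, y + ha * fy - hb * ry)  # left front
    p3 = (x - ha * fx - hb * rx, y - ha * fy - hb * ry)  # left rear
    p4 = (x - ha * fx + hb * rx, y - ha * fy + hb * ry)  # right rear
    return p1, p2, p3, p4
```

Evaluating this function along a planned lane change trajectory gives the swept positions of each vertex, which is the kind of information the collision point analysis relies on.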

during a lane change, the host vehicle derives a reasonable lane change safety area from the condition that it must not collide with surrounding vehicles;

if the host vehicle changes lanes to the left at some future time, a collision with the preceding vehicle occurs when the host vehicle is faster than the preceding vehicle, so that the gap gradually shrinks and the right front vertex of the host vehicle strikes the left rear vertex of the preceding vehicle;

let the collision point be $S_1$; at the collision time, the coordinates of $S_1$ are expressed as:

where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in its lane;

a collision with the rear vehicle in the target lane occurs when the host vehicle is slower than that rear vehicle, so that the distance between the two vehicles shrinks over time; if the lane change is made at this moment, the left-side vertex of the host vehicle collides with the rear vehicle in the target lane; let the collision point be $S_2$; according to the structural dimensions of the vehicle and vehicle kinematics theory, the coordinates of $S_2$ at the collision time are expressed as:

where $v_r(t)$ is the speed of the rear vehicle in the target lane at time $t$, $D_1$ is the relative distance between the host vehicle and the rear vehicle in the target lane, and $l$ is the lane width;

the lane change safety region of the vehicle is determined from the coordinates of the collision points $S_1$ and $S_2$ so as to avoid collisions; the same applies to a lane change to the right; the intelligent network-connected vehicle judges from the detected surrounding vehicle information whether the safety region satisfies the lane change condition.

In step three, each intelligent network-connected vehicle acts as an agent, so the n network-connected vehicles in the vehicle group are treated as n agents; the number n is limited to what the available computing capacity allows, and the n parallel agents control the n network-connected vehicles and interact with one another; the n agents share the same neural network and parameters; through this parameter sharing structure, an improvement in the driving state of any network-connected vehicle contributes to the reward of the whole vehicle group;

the intelligent network-connected vehicles in a vehicle queue interact with the vehicles in adjacent queues, and also cooperate with the manually driven vehicles in their own queue; the cooperation methods are as follows:

Method A: on an ordinary road section, the vehicle queue is controlled cooperatively through adaptive cruising and regenerative braking so that a reasonable distance is kept between vehicles, that is, a safe longitudinal gap is maintained at all times between two consecutive vehicles; the deviation from the safe distance, namely the distance error, is kept as small as possible in order to reduce collision risk and to exploit the low fuel consumption and high traffic throughput of the vehicle queue; to accommodate the randomness of ordinary vehicles, the distance between an intelligent network-connected vehicle and an ordinary vehicle must be larger; when a vehicle brakes, part of the braking energy is recovered by the motor;

Method B: when the vehicles approach a signalized intersection, all vehicles in the queue are split and recombined to reduce energy consumption and travel time, so that part of the queue passes through the intersection in sequence before the green signal ends, while the remaining vehicles wait behind the stop line.

In step four, the Soft Actor-Critic reinforcement learning algorithm, namely the SAC algorithm, is an off-policy, model-free deep reinforcement learning algorithm that combines maximum entropy learning with the Actor-Critic framework; the learning content of the SAC algorithm comprises a state s, an action a, a reward r and an environment model ρ; the states comprise the fuel consumption, battery state of charge, speed, acceleration, yaw angle and vehicle distance, the actions are the torque and the steering angle, and the rewards cover fuel consumption, travel time and comfort, the last of which is expressed through the adaptive cruise cost function;

in step four, the SAC algorithm trains on sample data from the environment and is continuously updated and optimized to obtain an optimal strategy, so that the intelligent network-connected vehicles in the mixed vehicle group can be distributed across lanes according to the driving style of the driver, and the mixed vehicles in the same lane form queues of different lengths;

the driving styles of the vehicles are classified into aggressive, robust and discreet types; when the vehicle group is driving on a road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, robust vehicles in the middle lane, and discreet vehicles in the rightmost lane; the SAC algorithm adjusts the final lane allocation according to the degree of gain for the vehicle group.

The SAC algorithm consists of 1 actor neural network and 4 critic neural networks; the input of the actor neural network is a state, and the output is an action probability distribution parameter P (x);

the 4 critic neural networks are divided into the state value estimation networks, V critic and V critic target, and the action-state value estimation networks, Q1 critic and Q2 critic; the input of the V critic neural network is a state, and its output V(s) represents an estimate of the state value; the output of the Q critic neural network is Q(s, a), representing an estimate of the value of the state-action pair;

Entropy in the algorithm is defined as:

wherein x follows the probability density function P (x) distribution; the introduction of the maximum entropy enables the output of the actions to be more dispersed, and excessive concentration of the output actions is avoided, so that the exploration capacity of an algorithm, the learning capacity of a new task and the stability are improved; the optimal strategy in the SAC algorithm framework is expressed as:

where $\pi$ represents the policy adopted by the agent, $a$ is an action, $s$ is a state, and $r$ is a reward; $\alpha$ is the temperature parameter, which determines the relative importance of the entropy term against the reward and thereby ensures the randomness of the optimal strategy;
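For orientation, the entropy and the maximum entropy objective are commonly written as below in the SAC literature. These are the standard textbook forms, given here on the assumption that they correspond to the formulas referred to above; the visitation distribution symbol $\rho_\pi$ is an assumption and does not appear in the surrounding text.

```latex
H(P) = \mathbb{E}_{x \sim P}\left[ -\log P(x) \right]

\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
  \left[ \sum_{t} r(s_t, a_t) + \alpha \, H\big(\pi(\cdot \mid s_t)\big) \right]
```

With $\alpha = 0$ the objective reduces to the ordinary expected return; a larger $\alpha$ weights exploration more heavily.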

the state space S of SAC is defined as:

where the state comprises the driving style; SOC is the battery state of charge, $v_p$ is the vehicle speed, $a_p$ is the vehicle acceleration, $t_{dri}$ is the driving time, $\theta$ is the yaw angle, and $d_{des}$ is the distance to the preceding vehicle;

the action space A is defined as:

$A = \{T_p, \delta_p\}$ (Formula 13)

where $T_p$ is the torque of the host vehicle and $\delta_p$ is its steering wheel angle;

the reward function is defined as:

$R = \{\omega_1 \cdot m_{fuel} c_{fuel} + \omega_2 \cdot P_{batt} c_{elec} + \omega_3 \cdot (t_{dri} - t_{ref}) + \omega_4 \cdot P_{rec} + \omega_5 \cdot J_{min}\}$ (Formula 14)

where $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ and $\omega_5$ are proportionality coefficients, $m_{fuel}$ is the fuel consumption of the current intelligent network-connected vehicle, $c_{fuel}$ is the fuel price, $P_{batt}$ is the motor power, $c_{elec}$ is the electricity price, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;
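Formula 14 can be evaluated directly, as in the following Python sketch. The function name and argument order are hypothetical, and the sign convention is left to the weights: the text does not state whether cost terms enter with negative coefficients, so the example weights below are purely illustrative.

```python
def reward(weights, m_fuel, c_fuel, p_batt, c_elec, t_dri, t_ref, p_rec, j_min):
    """Evaluate Formula 14 term by term.

    weights is (w1, ..., w5). Whether cost terms carry negative weights
    is an assumption left to the caller.
    """
    w1, w2, w3, w4, w5 = weights
    return (
        w1 * m_fuel * c_fuel        # fuel cost
        + w2 * p_batt * c_elec      # electricity cost
        + w3 * (t_dri - t_ref)      # travel time relative to the reference
        + w4 * p_rec                # recovered braking power
        + w5 * j_min                # adaptive cruise comprehensive cost
    )


# Illustrative weights: costs penalized, recovered energy rewarded.
example = reward((-1.0, -1.0, -1.0, 1.0, -1.0),
                 m_fuel=2.0, c_fuel=3.0, p_batt=4.0, c_elec=0.5,
                 t_dri=10.0, t_ref=8.0, p_rec=1.5, j_min=0.5)
```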

The driving actions are executed independently by each network-connected vehicle, and the corresponding rewards are jointly optimized by collecting the control experience of the network-connected vehicles into a centralized replay buffer;

for a specific state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as:

where $\gamma \in [0, 1]$ is the discount factor;

to avoid overestimation when maximizing Q, and further overestimation when computing targets with the target network, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$ respectively, and two networks V and V target, each with its own parameters; the minimum function value output by the networks is selected as the target value; the soft value network parameters are updated by minimizing the following loss function:

the strategy is represented by a Gaussian distribution, as is usual for a stochastic strategy: the state is mapped through the network parameters to the mean and variance of a Gaussian distribution, and the action is obtained by sampling from that distribution; with the state $s_t$ as input, the network outputs a Gaussian distribution with a mean and a standard deviation; the action $a_t$ is then obtained with the reparameterization technique, according to the formula:

where $\varepsilon_t$ is a noise signal sampled from a standard normal distribution, and $\mu(s_t)$ and $\sigma(s_t)$ are the mean and standard deviation of the Gaussian distribution, respectively;
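The reparameterization step can be sketched in Python as follows, assuming the usual form a_t = μ(s_t) + σ(s_t)·ε_t applied per action dimension (here torque and steering angle). The function name is hypothetical, and no action squashing or bounding is applied because the text does not describe any.

```python
import random


def sample_action(mu, sigma, rng=random):
    """Draw an action as mu + sigma * eps with eps ~ N(0, 1).

    mu and sigma are per-dimension lists, e.g. [torque, steering angle].
    The additive form is the usual reparameterization and is assumed here.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


# With a fixed seed the noise, and hence the action, is reproducible.
rng = random.Random(0)
action = sample_action([50.0, 0.0], [5.0, 0.02], rng)
```

Because the noise is drawn independently of the network parameters, gradients can flow through the mean and standard deviation during training.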

the relationship between the policy function and the soft value function is expressed as:

the policy network parameters are updated by minimizing the Kullback-Leibler divergence; the smaller the Kullback-Leibler divergence, the smaller the difference between the rewards corresponding to the output behavior, and the better the convergence of the strategy; the update rule of the policy network is expressed as:

(Formula 19)

where $Z(s_t)$ is the partition function used to normalize the distribution;

finally, the strategy network parameters are updated according to the gradient descent method, and are expressed as:

the temperature coefficient expresses how much weight the algorithm places on entropy, and tuning it is important for the training performance of the SAC algorithm; the optimal temperature coefficient differs with the reinforcement learning task and the stage of training; an automatic temperature adjustment mechanism is therefore used; under this mechanism, a constrained optimization problem is constructed, and the optimal temperature coefficient at each step is obtained by minimizing an objective function expressed as:

where $H_0$ is a predefined minimum policy entropy threshold.
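A minimal Python sketch of one temperature update follows. It assumes the commonly used objective J(α) = E[−α (log π(a|s) + H₀)], which may differ in detail from the omitted formula, and it applies a plain gradient step to α itself with a clamp at zero; the function name and learning rate are hypothetical.

```python
def update_temperature(alpha, log_probs, h0, lr=0.01):
    """One gradient step on J(alpha) = mean(-alpha * (log_pi + h0)).

    log_probs are log pi(a_t | s_t) for a batch of sampled actions.
    The objective form and the direct update of alpha are assumptions.
    """
    mean_term = sum(lp + h0 for lp in log_probs) / len(log_probs)
    grad = -mean_term                     # dJ/d(alpha)
    return max(alpha - lr * grad, 0.0)    # keep the temperature non-negative
```

When the estimated policy entropy (the negative mean log-probability) falls below $H_0$, the step raises α and so encourages exploration; when the entropy exceeds $H_0$, α is lowered.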

The SAC reinforcement learning algorithm is trained on data relating to vehicle group queue reorganization, adaptive cruising and efficient passage through intersections to obtain an optimal control strategy; the SAC considers the states of the network-connected vehicles in the vehicle group and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each network-connected vehicle under adaptive cruise control in order to control the vehicle trajectory; in the adaptive cruise control of step three, the required vehicle distance is influenced by the driving style of the driver, the road commuting efficiency and vehicle safety, and, given the uncertainty in the driving intention of a manually driven vehicle, the distance between a network-connected vehicle and a manually driven vehicle is larger than the distance between two network-connected vehicles; if the vehicle distance is too small, commuting efficiency improves, but the driver may become anxious and a collision may result; conversely, a larger vehicle distance safeguards vehicle safety, but road commuting efficiency is poor and vehicles from adjacent lanes can easily cut in;

A constant time headway (CTH) policy is used as the vehicle spacing algorithm, as follows:

$d_{des} = \tau_h v_h + d_0$ (Formula 22)

where $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;

in terms of car following safety, there are the following constraint formulas:

$d_{min} < d < d_{max}$

$\Delta d = d - d_{des}$

$\Delta v_{min} < \Delta v < \Delta v_{max}$

$\Delta v = v_p - v_f$

where $d$ is the actual distance between the host vehicle and the preceding vehicle, $d_{min}$ and $d_{max}$ are the minimum and maximum vehicle distances, $\Delta v$ is the speed difference between the host vehicle and the preceding vehicle, and $\Delta v_{min}$ and $\Delta v_{max}$ are the minimum and maximum speed differences;

the comfort constraint is formulated as follows: $\Delta a = a_p - a_f$, where $a_f$ is the acceleration of the preceding vehicle;

the adaptive cruise comprehensive cost function is:

$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2$ (Formula 23)
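The spacing policy, the safety constraints and the cost function above translate directly into code. The Python sketch below follows Formulas 22 and 23 term by term; the function names and the numeric values in the example are hypothetical.

```python
def desired_gap(v_h, tau_h, d0):
    """Formula 22: constant time headway spacing d_des = tau_h * v_h + d0."""
    return tau_h * v_h + d0


def within_safety_limits(d, dv, d_min, d_max, dv_min, dv_max):
    """Car-following constraints on the gap d and speed difference dv."""
    return d_min < d < d_max and dv_min < dv < dv_max


def acc_cost(d, d_des, v_p, v_f, a_p, a_f, w6, w7, w8):
    """Formula 23: weighted squared errors in gap, speed and acceleration."""
    delta_d = d - d_des
    delta_v = v_p - v_f
    delta_a = a_p - a_f
    return w6 * delta_d ** 2 + w7 * delta_v ** 2 + w8 * delta_a ** 2


# Illustrative values only.
gap = desired_gap(v_h=20.0, tau_h=1.5, d0=5.0)
cost = acc_cost(d=40.0, d_des=gap, v_p=20.0, v_f=18.0, a_p=0.5, a_f=0.0,
                w6=1.0, w7=2.0, w8=4.0)
```

A larger nominal time headway could be chosen when the preceding vehicle is manually driven, in line with the larger spacing required between network-connected and manually driven vehicles.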

Under the limits that the ECE regulations place on brake force distribution, the following brake force distribution strategy is adopted:

when the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front axle braking force remains unchanged and the rear axle braking force increases; when $z_3 < z$, motor braking stops and the front and rear axle braking forces are distributed along the $\beta$ line; throughout the braking process, if the motor braking force is insufficient, the hydraulic braking force makes up the shortfall in total braking force; the brake force distribution is formulated as follows:

the boundaries for z are calculated as follows:

where $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the total wheelbase, $k$ is the rear wheelbase, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, $\beta$ is the braking force distribution coefficient, the rotating mass is accounted for by a correction coefficient, $r_w$ is the wheel radius, $i_t$ is the vehicle transmission ratio, and $\eta$ is the transmission efficiency.
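The rule that hydraulic braking makes up any shortfall in motor braking can be sketched as below. This covers only that blending rule: the axle-level distribution and the boundaries $z_1$ to $z_3$ are not reproduced in this text and are not modeled. The function name is hypothetical, and the available motor braking force is taken as a given input.

```python
def blend_braking(f_demand, f_motor_max):
    """Split a braking force demand between motor and hydraulic brakes.

    The motor supplies as much as it can (regenerative braking); the
    hydraulic system covers the remainder. Returns (f_motor, f_hydraulic).
    """
    f_motor = min(f_demand, f_motor_max)
    f_hydraulic = f_demand - f_motor
    return f_motor, f_hydraulic
```

For example, a demand of 3000 N with 2000 N of available motor braking force would be split into 2000 N of regenerative and 1000 N of hydraulic braking.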

In step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each network-connected vehicle according to the traffic signal timing, and carries out queue reorganization and queue length planning, so that part of the vehicles form a queue that passes through the intersection during the green phase while the remaining vehicles wait behind the stop line, thereby reducing the energy consumption of the whole vehicle group and achieving better economy and traffic efficiency.
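The queue length planning idea can be illustrated with a simple kinematic stand-in. The sketch below is not the learned SAC strategy: it assumes constant speed and uniform spacing, and counts how many vehicles at the head of a queue reach the stop line before the green phase ends. All names and numbers are hypothetical.

```python
def vehicles_clearing_green(dist_to_stop_line, speed, spacing, green_left, queue_len):
    """Count leading vehicles that reach the stop line before green ends.

    Assumes every vehicle holds the same constant speed and that vehicle i
    starts i * spacing metres behind the leader.
    """
    count = 0
    for i in range(queue_len):
        arrival = (dist_to_stop_line + i * spacing) / speed
        if arrival > green_left:
            break  # this vehicle and all behind it wait at the stop line
        count += 1
    return count


# An 8-vehicle queue (the maximum length stated in the text), illustrative values.
passing = vehicles_clearing_green(50.0, 10.0, 15.0, 8.0, 8)
waiting = 8 - passing
```

The split point found this way indicates where a queue would be divided into a passing group and a waiting group under these simplifying assumptions.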

Compared with the prior art, the invention has the following advantages: the SAC reinforcement learning algorithm fusing driving styles adopted by the invention has stronger stability compared with the common reinforcement learning algorithm. Meanwhile, the method can combine different vehicle queues according to different driving styles, realize self-adaptive cruising, recover redundant braking energy, and control the vehicle groups to orderly pass through intersections with signal lamps, so that the vehicle groups can save energy consumption and improve the overall traffic efficiency.

The method adopted by the invention can reasonably split and reorganize the vehicle queues according to different driving styles, aims at reducing energy consumption and driving time and improving driving safety and comfort, and optimizes traffic running efficiency.

Drawings

The invention is described in further detail below with reference to the attached drawings and detailed description:

FIG. 1 is a schematic flow chart of the present invention;

FIG. 2 is a schematic diagram of the lane change collision critical points and safety area planning of the intelligent network-connected vehicle according to the present invention;

FIG. 3 is a flow chart of queue combination based on the Soft Actor-Critic algorithm of the present invention (full dark gray denotes intelligent network-connected vehicles, half gray denotes manually driven vehicles).

Detailed Description

As shown in the figures, in the PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, the hybrid vehicle group is a vehicle group consisting of intelligent network-connected vehicles and conventional manually driven vehicles, and the control method comprises the following steps; the step numbers serve only to distinguish the steps, which in practice need not be executed strictly in that order;

step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm, and iteratively updating according to a set loss function, to obtain an optimal strategy that lets a vehicle group reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.

in step one, the driving styles are classified as follows: the characteristic parameters extracted from a large amount of historical data of manually driven vehicles are reduced in dimension by principal component analysis, and K-means clustering is then applied to obtain three driving style classes. The driving styles are reflected in the following characteristic parameters: average longitudinal vehicle speed, maximum longitudinal vehicle speed $v_{max}$, minimum longitudinal vehicle speed $v_{min}$, longitudinal vehicle speed standard deviation $\varepsilon_v$, average longitudinal acceleration, maximum longitudinal acceleration $a_{xmax}$, minimum longitudinal acceleration $a_{xmin}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{ymax}$, minimum lateral acceleration $a_{ymin}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum headway distance $DHW_{min}$. The longitudinal vehicle speed standard deviation $\varepsilon_v$ is calculated as follows:

$v_{rel} = v_p - v_f$ (Formula 4)

where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the host vehicle speed, and $v_f$ is the preceding vehicle speed;
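The clustering stage of step one can be sketched with a small pure-Python K-means. The principal component analysis stage is omitted here, so the points are assumed to be already reduced feature vectors; the deterministic farthest-point initialization is a choice made for this sketch, not something stated in the text, and the function names are hypothetical.

```python
import math


def init_centers(points, k):
    """Farthest-point initialization (an assumption made for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k=3, iters=100):
    """Cluster tuples of floats into k groups; returns (labels, centers)."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Six drivers described by two already-reduced features (illustrative data).
drivers = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
labels, centers = kmeans(drivers, k=3)
```

Which cluster is labeled aggressive, robust or discreet would still have to be decided by inspecting the cluster centers, for instance by comparing their minimum headway distance.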

as shown in FIG. 2, the method is based on vehicle kinematics. When the host vehicle is at a given position, the coordinates $[x_{p1}(t), y_{p1}(t)]$ of its right front vertex at time $t$ ($t_m < t < t_n$) are expressed as:

where $v_p(t)$ and $\theta_p(t)$ are the speed and yaw angle of the host vehicle, respectively; $t_m$ is the time at which the host vehicle starts the lane change and $t_n$ is the time at which it ends;

where $a$ is the vehicle length and $b$ is the vehicle width;

where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in its lane; as shown in FIG. 2, a collision with the rear vehicle in the target lane occurs when the host vehicle is slower than that rear vehicle, so that the distance between the two vehicles shrinks over time; if the lane change is made at this moment, the left-side vertex of the host vehicle collides with the rear vehicle in the target lane; let the collision point be $S_2$; according to the structural dimensions of the vehicle and vehicle kinematics theory, the coordinates of $S_2$ at the collision time are expressed as:

the lane change safety region of the vehicle is determined from the coordinates of the collision points $S_1$ and $S_2$ so as to avoid collisions; the same applies to a lane change to the right; the intelligent network-connected vehicle judges from the detected surrounding vehicle information whether the safety region satisfies the lane change condition.

In step four, as shown in FIG. 3, the Soft Actor-Critic reinforcement learning algorithm, namely the SAC algorithm, is an off-policy, model-free deep reinforcement learning algorithm that combines maximum entropy learning with the Actor-Critic framework; the learning content of the SAC algorithm comprises a state s, an action a, a reward r and an environment model ρ; the states comprise the fuel consumption, battery state of charge, speed, acceleration, yaw angle and vehicle distance, the actions are the torque and the steering angle, and the rewards cover fuel consumption, travel time and comfort, the last of which is expressed through the adaptive cruise cost function;

Entropy in the algorithm is defined as:

the state space S of SAC is defined as:

where the state comprises the driving style; SOC is the battery state of charge, $v_p$ is the vehicle speed, $a_p$ is the vehicle acceleration, $t_{dri}$ is the driving time, $\theta$ is the yaw angle, and $d_{des}$ is the distance to the preceding vehicle;

the action space A is defined as: $A = \{T_p, \delta_p\}$ (Formula 13)

the reward function is defined as:

where $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ and $\omega_5$ are proportionality coefficients, $m_{fuel}$ is the fuel consumption of the current intelligent network-connected vehicle, $c_{fuel}$ is the fuel price, $P_{batt}$ is the motor power, $c_{elec}$ is the electricity price, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;

for a specific state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as:

where $\gamma \in [0, 1]$ is the discount factor;

to avoid overestimation when maximizing Q, and further overestimation when computing targets with the target network, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$ respectively, and two networks V and V target, each with its own parameters; the minimum function value output by the networks is selected as the target value; the soft value network parameters are updated by minimizing the following loss function:

the relationship of the policy function to the soft function is expressed as:

wherein $H_0$ is a predefined minimum policy entropy threshold.
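The following is a minimal sketch of how the minimum-of-two-networks target and the entropy threshold $H_0$ are commonly used in the SAC literature. It is not taken from the patent's formulas, which are not legible here. The temperature `alpha`, the `done` flag and all function names are assumptions introduced for illustration.

```python
def soft_target(reward, q1_next, q2_next, log_prob_next, gamma, alpha, done=False):
    """Target value built from the smaller of two value estimates.

    alpha (entropy temperature) and done are assumptions; the source only
    states that the minimum output is selected as the target value.
    """
    min_q = min(q1_next, q2_next)                 # guards against overestimation
    soft_value = min_q - alpha * log_prob_next    # entropy-regularised value
    return reward if done else reward + gamma * soft_value


def critic_loss(q_pred, target):
    """Squared error between one critic's prediction and the target."""
    return 0.5 * (q_pred - target) ** 2


def temperature_loss(alpha, log_prob, h0):
    """Commonly used loss that keeps the policy entropy near or above h0."""
    return -alpha * (log_prob + h0)


target = soft_target(reward=1.0, q1_next=2.0, q2_next=3.0,
                     log_prob_next=-1.0, gamma=0.5, alpha=0.5)
loss = critic_loss(2.0, target)
```

In a full implementation these scalars would be batched tensors and the losses would be averaged over samples drawn from a replay buffer.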

The SAC reinforcement learning algorithm is trained on data relating to vehicle group queue reorganization, adaptive cruising and efficient passage through intersections to obtain an optimal control strategy; SAC considers the states of the connected vehicles in the vehicle group and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each connected vehicle under adaptive cruise control in order to control the vehicle's trajectory;

In the adaptive cruise control of step three, the required inter-vehicle distance depends on the driver's driving style, road traffic efficiency and vehicle safety; because the driving intention of a manually driven vehicle is uncertain, the distance kept between a connected vehicle and a manually driven vehicle is larger than the distance kept between two connected vehicles. If the distance is too small, traffic efficiency improves, but driver anxiety may lead to collisions; conversely, a larger distance safeguards vehicle safety, but traffic efficiency is poor and vehicles from adjacent lanes can easily cut in;

$$d_{des} = \tau_h v_h + d_0$$ (Formula 22)

wherein $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;

in terms of car following safety, there are the following constraint formulas:

$$d_{min} < d < d_{max}$$

$$\Delta d = d - d_{des}$$

$$\Delta v_{min} < \Delta v < \Delta v_{max}$$

$$\Delta v = v_p - v_f$$

the adaptive cruise integrated cost function is:

$$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2$$ (Formula 23).
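The spacing policy of Formula 22, the car-following constraints and the cost of Formula 23 can be sketched together as follows. The function names are hypothetical. The source does not define $\Delta a$, so it is passed in directly here, and all numeric values are illustrative assumptions.

```python
def desired_gap(tau_h, v_h, d0):
    """Formula 22: desired distance from time headway, speed and standstill gap."""
    return tau_h * v_h + d0


def within_limits(d, delta_v, d_min, d_max, dv_min, dv_max):
    """Car-following safety constraints on distance and speed difference."""
    return d_min < d < d_max and dv_min < delta_v < dv_max


def cruise_cost(d, v_p, v_f, delta_a, d_des, w6, w7, w8):
    """Formula 23: weighted squared errors in gap, speed and acceleration.

    delta_a is not defined in the source and is treated as a given input.
    """
    delta_d = d - d_des        # gap error
    delta_v = v_p - v_f        # speed difference to the preceding vehicle
    return w6 * delta_d ** 2 + w7 * delta_v ** 2 + w8 * delta_a ** 2


# Hypothetical values: 1.5 s headway at 20 m/s with a 5 m standstill gap.
d_des = desired_gap(tau_h=1.5, v_h=20.0, d0=5.0)
cost = cruise_cost(d=37.0, v_p=20.0, v_f=18.0, delta_a=0.5,
                   d_des=d_des, w6=1.0, w7=2.0, w8=4.0)
ok = within_limits(d=37.0, delta_v=2.0, d_min=5.0, d_max=100.0,
                   dv_min=-5.0, dv_max=5.0)
```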

the boundary for z is calculated as follows:

Claims

1. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control is characterized in that:

the hybrid vehicle group is a vehicle group consisting of intelligent connected vehicles and conventional manually driven vehicles, and the control method comprises the following steps:

2. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: the vehicle at the head of each vehicle queue in the vehicle group is an intelligent connected vehicle, and the queue length is limited to no more than 8 vehicles;

In step one, the driving styles are classified as follows: the characteristic parameters extracted from a large volume of historical data from manually driven vehicles are reduced in dimension by principal component analysis, and three driving styles are then obtained by K-means clustering. The driving styles are characterized by the following parameters: average longitudinal speed, maximum longitudinal speed $v_{max}$, minimum longitudinal speed $v_{min}$, longitudinal speed standard deviation $\epsilon_\upsilon$, average longitudinal acceleration, maximum longitudinal acceleration $a_{x\,max}$, minimum longitudinal acceleration $a_{x\,min}$, longitudinal acceleration standard deviation $\sigma_x$, average lateral acceleration, maximum lateral acceleration $a_{y\,max}$, minimum lateral acceleration $a_{y\,min}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum distance headway $DHW_{min}$. The longitudinal speed standard deviation $\epsilon_\upsilon$ is calculated as follows:

$$\upsilon_{rel} = \upsilon_p - \upsilon_f$$ (Formula 4)

wherein $d_{rel}$ is the relative distance between the two vehicles, $\upsilon_p$ is the speed of the host vehicle, and $\upsilon_f$ is the speed of the preceding vehicle;
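The sketch below shows how THW and TTC are conventionally computed from the relative distance and the relative speed of Formula 4. The definitions used (distance divided by host speed, and distance divided by closing speed) are the commonly used ones and are an assumption here, since the corresponding formulas are not legible in the text. The function names are hypothetical.

```python
import math


def relative_speed(v_p, v_f):
    """Formula 4: host speed minus preceding-vehicle speed."""
    return v_p - v_f


def time_headway(d_rel, v_p):
    """THW: time for the host vehicle to cover the current gap (assumed definition)."""
    return d_rel / v_p if v_p > 0 else math.inf


def time_to_collision(d_rel, v_p, v_f):
    """TTC: time until the gap closes at the current closing speed (assumed definition)."""
    v_rel = relative_speed(v_p, v_f)
    # If the host is not closing in on the preceding vehicle, no collision is predicted.
    return d_rel / v_rel if v_rel > 0 else math.inf


thw = time_headway(d_rel=30.0, v_p=20.0)
ttc = time_to_collision(d_rel=30.0, v_p=20.0, v_f=15.0)
```

Features such as these would then be stacked per trip into the vector that is passed to principal component analysis and K-means clustering.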

3. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step two, the road on which the vehicle queue travels has three lanes, and lane changes are involved in the reorganization of the vehicle group; the connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, the information comprising vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position and intersection signal phase;

wherein: a is the length of the vehicle; b is the vehicle width;

let the collision point be $S_1$; the coordinates of $S_1$ at the collision time are expressed as:

wherein $\upsilon_f(t)$ is the speed of the preceding vehicle in the current lane at time t, and $D_1$ is the distance between the host vehicle and the preceding vehicle in the current lane. A collision with the rear vehicle in the target lane can occur when the host vehicle is slower than that vehicle: the gap between the two vehicles shrinks over time, and if the lane change is made at that moment, the left corner point of the host vehicle strikes the rear vehicle in the target lane. Let this collision point be $S_2$; based on the vehicle dimensions and the kinematics of the host vehicle, the coordinates of $S_2$ at the collision time are expressed as:

wherein $\upsilon_r(t)$ is the speed of the rear vehicle in the target lane at time t, $D_1$ is the relative distance between the host vehicle and the rear vehicle in the target lane, and $l$ is the lane width;

4. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step three, each intelligent connected vehicle acts as an agent, so that the n connected vehicles in the vehicle group are treated as n agents; the number n is limited to a computationally feasible range, and the n parallel agents control the n connected vehicles and interact with one another; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward of the whole vehicle group;
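The parameter-sharing idea in claim 4 can be illustrated with a toy example in which every agent refers to the same policy object. The linear policy stands in for the shared neural network, and summing the individual rewards into a group reward is an assumption; neither is specified by the source, and the names `SharedPolicy` and `group_reward` are hypothetical.

```python
class SharedPolicy:
    """One set of parameters used by every connected-vehicle agent."""

    def __init__(self, weights):
        self.weights = list(weights)

    def act(self, state):
        # Toy linear policy standing in for the shared neural network.
        return sum(w * s for w, s in zip(self.weights, state))


def group_reward(rewards):
    """Assumed aggregation: the group reward is the sum of agent rewards."""
    return sum(rewards)


policy = SharedPolicy([0.5, -0.25])
agents = [policy for _ in range(3)]          # n = 3 agents, one shared object
states = [[2.0, 4.0], [1.0, 0.0], [0.0, 8.0]]
actions = [agent.act(s) for agent, s in zip(agents, states)]

# Updating the shared parameters once changes the behaviour of all agents.
policy.weights[0] = 1.0
updated_actions = [agent.act(s) for agent, s in zip(agents, states)]
```

Because all agents hold the same object, experience gathered by any one vehicle updates the policy used by every vehicle.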

5. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 4, wherein: in step four, the Soft Actor-Critic reinforcement learning algorithm, namely the SAC algorithm, is an off-policy, model-free deep reinforcement learning algorithm that combines maximum entropy learning with an Actor-Critic framework; the learning content of the SAC algorithm comprises a state s, an action a, a reward r and an environment model ρ; the state comprises the fuel consumption, battery state of charge, speed, acceleration, yaw angle and inter-vehicle distance of the vehicle; the action comprises the torque and the steering angle; and the reward comprises fuel consumption, travel time, comfort and the adaptive cruise cost function; in step four, the SAC algorithm learns from sample data collected from the environment and is updated continuously until an optimal strategy is obtained, so that the intelligent connected vehicles in the hybrid vehicle group are distributed across lanes according to the driving styles of their drivers, and the vehicles in each lane form queues of different lengths;

6. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: the SAC algorithm consists of 1 actor neural network and 4 critic neural networks; the input of the actor neural network is a state, and the output is an action probability distribution parameter P (x);

the 4 critic neural networks comprise the state value estimation networks, namely the V critic network and the V critic target network, and the action-state value estimation networks, namely the Q1 and Q2 critic networks; the input of a critic neural network is a state, and its output is a value estimate; the output of the V critic neural network is V(s), an estimate of the state value; the output of the Q critic neural network is Q(s, a), an estimate of the action-state value; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward of the whole vehicle group;

Entropy in the algorithm is defined as:

the state space S of SAC is defined as:

the action space A is defined as: $A = \{T_p, \delta_p\}$ (Formula 13);

the reward function is defined as:

$$R = \{\omega_1 \cdot m_{fuel} c_{fuel} + \omega_2 \cdot P_{batt} c_{elec} + \omega_3 \cdot (t_{dri} - t_{ref}) + \omega_4 \cdot P_{rec} + \omega_5 \cdot J_{min}\}$$ (Formula 14)

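for a specific state $s_t$ and action $a_t$, the soft value function $Q_{sft}(s_t, a_t)$ of the algorithm is given by Formula 15;

wherein $\gamma \in [0, 1]$ is the discount factor;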

to avoid overestimation when maximizing Q, and further overestimation when targets are computed with the target networks, the SAC algorithm introduces two online networks $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$ respectively, together with the two networks υ and υ target, each with its own parameters; the smaller of the values output by the target networks is taken as the target value; the soft value network parameters are updated by minimizing the following loss function:

the relationship of the policy function to the soft function is expressed as:

wherein $H_0$ is a predefined minimum policy entropy threshold.

7. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 5, wherein: the SAC reinforcement learning algorithm is trained on data relating to vehicle group queue reorganization, adaptive cruising and efficient passage through intersections to obtain an optimal control strategy; SAC considers the states of the connected vehicles in the vehicle group and finds an optimal adaptive cruise control strategy, which is fed back as the torque and steering angle of each connected vehicle under adaptive cruise control in order to control the vehicle's trajectory;

$$d_{des} = \tau_h \upsilon_h + d_0$$ (Formula 22)

wherein $\tau_h$ is the nominal time headway and $d_0$ is the safe stopping distance;

in terms of car following safety, there are the following constraint formulas:

$$d_{min} < d < d_{max}$$

$$\Delta d = d - d_{des}$$

$$\Delta\upsilon_{min} < \Delta\upsilon < \Delta\upsilon_{max}$$

$$\Delta\upsilon = \upsilon_p - \upsilon_f$$

wherein $d$ is the actual distance between the host vehicle and the preceding vehicle, $d_{min}$ and $d_{max}$ are the minimum and maximum inter-vehicle distances, $\Delta\upsilon$ is the speed difference between the host vehicle and the preceding vehicle, and $\Delta\upsilon_{min}$ and $\Delta\upsilon_{max}$ are the minimum and maximum speed differences;

the adaptive cruise integrated cost function is:

$$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta\upsilon^2 + \omega_8 \Delta a^2$$ (Formula 23).

8. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 5, wherein: with the braking force distribution constrained by the ECE regulations, the following braking force distribution strategy is adopted:

when the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front axle braking force is held constant and the rear axle braking force is increased; when $z_3 < z$, the motor stops braking and the front and rear axle braking forces are distributed along the β line; throughout the braking process, if the motor braking force is insufficient, the hydraulic braking force makes up the shortfall in total braking force; the braking force distribution is formulated as follows:

the boundary for z is calculated as follows:

wherein $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the total wheelbase, $k$ is the rear wheelbase, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, $\beta$ is the braking force distribution coefficient, $\theta$ is the rotating mass correction coefficient, $r_w$ is the wheel radius, $i_t$ is the vehicle transmission ratio, and $\eta$ is the transmission efficiency.
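The four braking regimes of claim 8 can be sketched as a simple selector on the braking strength z. The thresholds are taken as inputs because their formulas are not reproduced in the text. The regime labels, the function names and the handling of the exact boundary values are assumptions made for illustration.

```python
def braking_regime(z, z1, z2, z3):
    """Select the braking force distribution regime for braking strength z.

    z1 < z2 < z3 are the regime boundaries; which regime owns an exact
    boundary value is not stated in the source and is chosen arbitrarily here.
    """
    if z < z1:
        return "front_axle_only"
    if z < z2:
        return "ece_line"
    if z < z3:
        return "front_held_rear_increasing"
    return "beta_line_motor_off"


def hydraulic_compensation(required_force, motor_force_available):
    """Hydraulic braking covers whatever the motor cannot provide."""
    return max(0.0, required_force - motor_force_available)


# Hypothetical thresholds, for illustration only.
regimes = [braking_regime(z, 0.1, 0.3, 0.7) for z in (0.05, 0.2, 0.5, 0.8)]
shortfall = hydraulic_compensation(required_force=3000.0,
                                   motor_force_available=2200.0)
```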

9. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 5, wherein: in step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each connected vehicle according to the traffic signal timing, and carries out queue reorganization and queue length planning, so that some of the vehicles form queues that pass through the intersection during the green phase while the remaining vehicles wait before the stop line, thereby reducing the energy consumption of the whole vehicle group and achieving better economy and traffic efficiency.