CN116853245A - PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control - Google Patents

Info

Publication number
CN116853245A
Authority
CN
China
Prior art keywords
vehicle
vehicles
lane
driving
network
Legal status
Pending
Application number
CN202310945210.7A
Other languages
Chinese (zh)
Inventor
林歆悠
陈显康
黄强
叶锦泽
黄家旺
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Application filed by Fuzhou University
Priority to CN202310945210.7A
Publication of CN116853245A
Legal status: Pending (current)

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/14 Adaptive cruise control
    • B60W30/16 Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B60W30/165 Automatically following the path of a preceding lead vehicle, e.g. "electronic tow-bar"
    • B60W20/00 Control systems specially adapted for hybrid vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention provides a PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, comprising the following steps: step one, clustering driving data from various operating conditions on several characteristic parameters to determine a number of driving-style types; step two, determining the vehicle's lane-change safety area from the acquired vehicle state information and surrounding environment information; step three, in adaptive cruise control, constraining vehicle spacing, speed and acceleration for safety and comfort and recovering braking energy during braking; when approaching an intersection, the queues are reorganized so that the vehicle group passes through in an orderly manner; step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm and iteratively updating according to the set loss functions, finally obtaining an optimal policy that lets the vehicle group reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes. The invention can plan vehicle queues according to the drivers' driving intentions so that the queues run efficiently.

Description

PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control
Technical Field
The invention relates to the field of intelligent vehicle driving, in particular to a PHEV hybrid vehicle group optimization control method for queue management and self-adaptive cruise control.
Background
The transportation sector is one of the main sources of emissions and fuel consumption. With the rapid growth of petroleum consumption, the scarcity of fossil energy is becoming increasingly serious, so reducing fuel consumption and emissions is important, especially in urban areas. Alternative-fuel vehicles and connected, automated driving technology are expected to be two effective approaches. Alternative-fuel vehicles such as hybrid and electric vehicles are considered a future direction of vehicle development, while advanced intelligent driving technology can not only save fuel but also assist the driver and improve driving safety and comfort.
Most current work focuses on fuel-economy control strategies tailored to a single vehicle, without regard to the impact on other vehicles. The transition to fully automated vehicles is expected to be gradual, so during the coming transition period automated vehicles will share the same road network with conventional vehicles at varying penetration rates, creating a dynamic mixed traffic environment. Partial deployment of automated driving systems affects traffic-flow characteristics differently from full deployment, so efficiently operating and controlling mixed traffic of intelligent connected and conventional vehicles is a complex challenge.
In view of the above, the present invention aims to provide a PHEV hybrid vehicle group optimization control method based on queue management and adaptive cruise control, which can reasonably plan a vehicle queue according to the driving intention of a driver, so that a fleet can efficiently run.
Disclosure of Invention
The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control provided by the invention can plan vehicle queues according to the driver's driving intention, so that the vehicle platoon runs efficiently.
The invention adopts the following technical scheme.
The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control comprises the following steps, where the hybrid vehicle group consists of intelligent network-connected vehicles and conventional manually driven vehicles;
step one, clustering driving data from various operating conditions on several characteristic parameters to determine a number of driving-style types;
step two, determining a lane change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in adaptive cruise control, constraining vehicle spacing, speed and acceleration for safety and comfort and recovering braking energy during braking; when approaching an intersection, the queues are reorganized so that the vehicle group passes through in an orderly manner;
step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm and iteratively updating according to the set loss functions, finally obtaining an optimal policy that lets the vehicle group reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.
The vehicle at the head of each queue in the vehicle group is an intelligent connected vehicle, and each queue is limited to at most 8 vehicles;
in step one, driving styles are classified as follows: the characteristic parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and K-means clustering then yields three driving-style classes. The driving styles are characterized by the following parameters: mean longitudinal speed $\bar{v}$, maximum longitudinal speed $v_{max}$, minimum longitudinal speed $v_{min}$, longitudinal speed standard deviation $\varepsilon_v$, mean longitudinal acceleration $\bar{a}_x$, maximum longitudinal acceleration $a_{xmax}$, minimum longitudinal acceleration $a_{xmin}$, longitudinal acceleration standard deviation $\sigma_x$, mean lateral acceleration $\bar{a}_y$, maximum lateral acceleration $a_{ymax}$, minimum lateral acceleration $a_{ymin}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum distance headway $DHW_{min}$. The longitudinal speed standard deviation $\varepsilon_v$ is calculated over the $N$ speed samples $v_i$ as:
$$\varepsilon_v = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^2}$$
The time headway THW is the time the front of the host vehicle needs to reach the rear of the preceding vehicle at the current speed, and the time to collision TTC is the time until the host vehicle and the preceding vehicle collide if both keep their current state (TTC is meaningful only while $v_{rel} > 0$). They are calculated as:
$$THW = \frac{d_{rel}}{v_p}$$
$$TTC = \frac{d_{rel}}{v_{rel}}$$
$$v_{rel} = v_p - v_f \quad \text{(formula four)}$$
where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the host vehicle speed, and $v_f$ is the preceding vehicle speed;
the distance headway is the distance from the front of the host vehicle to the front of the preceding vehicle in the same lane. The larger it is, the less likely the two vehicles are to collide; the smaller it is, the more likely a collision becomes. A small distance headway reflects more aggressive driving, so the minimum distance headway is selected as a driving-style characteristic parameter.
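The indicators defined above can be illustrated with a minimal sketch that computes a subset of them for one recorded trip. It assumes hypothetical equal-length arrays of host speed `v_p`, preceding-vehicle speed `v_f`, bumper-to-bumper gap `d_rel` and front-to-front distance headway `dhw`, all in SI units; `style_features` is an illustrative name, not part of the described method.

```python
import numpy as np

def style_features(v_p, v_f, d_rel, dhw):
    """Compute a subset of the driving-style indicators for one trip.

    All inputs are equal-length 1-D arrays (SI units).
    Hypothetical helper for illustration only.
    """
    v_p = np.asarray(v_p, dtype=float)
    v_f = np.asarray(v_f, dtype=float)
    d_rel = np.asarray(d_rel, dtype=float)
    dhw = np.asarray(dhw, dtype=float)

    v_rel = v_p - v_f  # formula four: closing speed
    # THW: time for the host front to reach the leader's rear at current speed
    thw = np.where(v_p > 0, d_rel / np.maximum(v_p, 1e-9), np.inf)
    # TTC is only finite while the host is closing in on the leader
    ttc = np.where(v_rel > 0, d_rel / np.maximum(v_rel, 1e-9), np.inf)

    return {
        "v_mean": v_p.mean(),
        "v_max": v_p.max(),
        "v_min": v_p.min(),
        "eps_v": v_p.std(),  # population standard deviation (1/N)
        "thw_min": thw.min(),
        "ttc_min": ttc.min(),
        "dhw_min": dhw.min(),
    }
```

For example, `style_features([20, 22, 24], [20, 20, 20], [30, 28, 24], [35, 33, 29])` gives a minimum TTC of 6 s, reached at the last sample where the host closes at 4 m/s on a 24 m gap.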
In step two, the vehicle queues travel on a three-lane road, and reorganizing the vehicle group involves lane changes. Connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, including vehicle speed, body length, torque, power, inter-vehicle distance, vehicle position, and intersection signal phase;
the method for detecting the lane change safety area used in the lane change operation is as follows:
Based on vehicle kinematics, when the host vehicle is at a given position, the coordinates $[x_{p1}(t), y_{p1}(t)]$ of its right-front vertex at time $t$ ($t_m < t < t_n$) are expressed as:
where $v_p(t)$ and $\theta_p(t)$ are the host vehicle speed and yaw angle, respectively, and $t_m$ and $t_n$ are the start and end times of the host vehicle's lane change;
similarly, the coordinates of the host vehicle's left-front vertex $[x_{p2}(t), y_{p2}(t)]$, left-rear vertex $[x_{p3}(t), y_{p3}(t)]$ and right-rear vertex $[x_{p4}(t), y_{p4}(t)]$ at time $t$ are, respectively:
where $a$ is the vehicle length and $b$ is the vehicle width;
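Because the vertex formulas themselves are not reproduced here, the following is only a sketch of the same idea under stated assumptions: the reference point is taken to be the geometric center of an $a \times b$ rectangular footprint, $x$ points along the road and $y$ points to the left, and the center is advanced from $v_p(t)$ and $\theta_p(t)$ with forward Euler integration. The function names are hypothetical.

```python
import math

def footprint_vertices(x_c, y_c, theta, a, b):
    """Return the four corners of an a-by-b rectangle centred at (x_c, y_c).

    Order: right-front (p1), left-front (p2), left-rear (p3), right-rear (p4).
    Assumes x points along the road and y points to the left.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets in the vehicle frame: +x forward, +y to the left
    local = [(a / 2, -b / 2), (a / 2, b / 2), (-a / 2, b / 2), (-a / 2, -b / 2)]
    # Rotate each offset by the yaw angle and translate to the centre
    return [(x_c + dx * c - dy * s, y_c + dx * s + dy * c) for dx, dy in local]

def integrate_centre(x0, y0, speeds, yaws, dt):
    """Forward-Euler integration of the centre position from v_p(t), theta_p(t)."""
    x, y = x0, y0
    path = [(x, y)]
    for v, th in zip(speeds, yaws):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        path.append((x, y))
    return path
```

With zero yaw, a 4 m by 2 m vehicle centered at the origin has its right-front vertex at (2, -1); the corners at each integration step trace the swept region used to look for collision points.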
during a lane change, the host vehicle obtains a reasonable lane-change safety area by analyzing the conditions under which it does not collide with surrounding vehicles;
if the host vehicle changes lanes to the left at some future time, it collides with the preceding vehicle when the host is faster than the preceding vehicle, the gap gradually shrinks, and the host's right-front vertex strikes the preceding vehicle's left-rear vertex;
let this collision point be $S_1$; at the collision time point, the coordinates of $S_1$ are expressed as:
where $v_f(t)$ is the speed of the preceding vehicle in the current lane at time $t$, and $D_1$ is the distance between the host vehicle and the preceding vehicle in the current lane;
if the host vehicle collides with the following vehicle in the target lane, the host is slower than that vehicle and the gap between them shrinks over time, so that during the lane change the host's left-rear vertex is struck by the following vehicle in the target lane. Let this collision point be $S_2$; based on the vehicle's structural dimensions and the kinematics of automated vehicle travel, the coordinates of $S_2$ at the collision time point are expressed as:
where $v_r(t)$ is the speed of the following vehicle in the target lane at time $t$, $D_2$ is the relative distance between the host vehicle and the following vehicle in the target lane, and $l$ is the lane width;
the coordinates of collision points $S_1$ and $S_2$ determine the vehicle's lane-change safety domain for avoiding collisions; lane changes to the right are handled in the same way. The connected vehicle uses the detected information on surrounding vehicles to judge whether the safety domain satisfies the lane-change conditions.
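As a simplified, one-dimensional stand-in for the vertex-based safety domain above (not the method's own $S_1$/$S_2$ formulation), a lane change can be screened by checking that neither critical gap closes during the manoeuvre. The sketch assumes constant speeds over the lane-change duration and a hypothetical minimum margin; `lane_change_is_safe` and its parameters are illustrative names.

```python
def lane_change_is_safe(v_p, v_f, v_r, d_front, d_rear, duration, margin=2.0, steps=50):
    """Constant-speed gap check over a lane change lasting `duration` seconds.

    d_front: gap from the host front to the current-lane leader's rear (m).
    d_rear: gap from the target-lane follower's front to the host rear (m).
    Returns True if both gaps stay at or above `margin` throughout.
    """
    for k in range(steps + 1):
        t = duration * k / steps
        gap_front = d_front + (v_f - v_p) * t  # shrinks if the host is faster
        gap_rear = d_rear + (v_p - v_r) * t    # shrinks if the follower is faster
        if gap_front < margin or gap_rear < margin:
            return False
    return True
```

With constant speeds the gaps are linear in time, so checking the endpoints would suffice; the sampled loop simply makes it easy to substitute time-varying speed profiles $v_f(t)$ and $v_r(t)$.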
In step three, each connected vehicle acts as an agent: the $n$ connected vehicles in the vehicle group are treated as $n$ agents, with $n$ kept within what the available computation allows, and the $n$ parallel agents control the $n$ connected vehicles and interact with one another. All $n$ agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward of the whole vehicle group;
the connected vehicles in a queue interact with vehicles in adjacent queues and also interact and cooperate with the manually driven vehicles, as follows:
Method A: on an ordinary road section, the vehicle queues are cooperatively controlled by adaptive cruise control and regenerative braking so that a reasonable spacing is kept, i.e., a safe longitudinal gap is continuously maintained between consecutive vehicles. The deviation from the safe distance (the spacing error) should be as small as possible to reduce collision risk and to exploit the low fuel consumption and high traffic throughput of platooning. To accommodate the random behavior of conventional vehicles, the gap between a connected vehicle and a conventional vehicle must be larger. When a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach an intersection with a traffic signal, the vehicle queues are split and reorganized to reduce energy consumption and travel time, so that some vehicles in a queue pass through the intersection in sequence before the green phase ends and the remaining vehicles wait behind the stop line.
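The split in Method B can be illustrated with a deliberately simple rule: a vehicle passes if it can reach the stop line before the green phase ends, and once one vehicle cannot, every vehicle behind it in the same lane waits. This is a hypothetical, constant-speed heuristic for illustration, not the SAC-learned split the method relies on.

```python
def split_at_signal(dist_to_stop, speeds, green_remaining):
    """Split a queue into vehicles that pass on green and vehicles that stop.

    dist_to_stop (m) and speeds (m/s) are ordered from the queue head backwards.
    Assumes constant speed; a vehicle that cannot reach the stop line in time
    forces every vehicle behind it to stop as well (no in-lane overtaking).
    Returns (indices that pass, indices that wait).
    """
    passing = 0
    for d, v in zip(dist_to_stop, speeds):
        if v > 0 and d / v <= green_remaining:
            passing += 1
        else:
            break
    return list(range(passing)), list(range(passing, len(dist_to_stop)))
```

For a four-vehicle queue at 10 m/s that is 20, 35, 50 and 65 m from the stop line with 5 s of green left, the first three vehicles form the passing sub-queue and the fourth waits.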
In step four, the Soft Actor-Critic (SAC) reinforcement learning algorithm is an off-policy, model-free deep reinforcement learning algorithm that combines maximum-entropy learning with the actor-critic framework. Its learning elements are the state $s$, the action $a$, the reward $r$, and the environment's state-transition distribution $\rho$. The state comprises the vehicle's fuel consumption, battery state of charge, speed, acceleration, yaw angle and distance; the actions are torque and steering angle; and the reward accounts for fuel consumption, travel time, comfort and the adaptive cruise cost function;
In step four, the SAC algorithm trains on sample data from the environment and continually updates and optimizes, finally obtaining an optimal policy under which the connected vehicles in the mixed vehicle group are distributed across lanes according to the drivers' driving styles, and the vehicles in each lane form queues of different lengths;
driving styles are classified as aggressive, moderate and cautious. When the vehicle group travels on a road and lanes are allocated to different vehicles, aggressive vehicles tend to be placed in the leftmost lane, moderate vehicles in the middle lane, and cautious vehicles in the rightmost lane; the SAC algorithm then adjusts the final lane allocation according to the resulting gain for the vehicle group.
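The initial lane preference described above, before SAC adjusts it, can be sketched as a style-to-lane lookup followed by chunking each lane into queues of at most 8 vehicles. The style labels, the lane indexing (0 = leftmost) and `initial_lane_plan` are illustrative assumptions; the sketch does not enforce the rule that each queue is led by a connected vehicle.

```python
from collections import defaultdict

# Preferred lane per driving style (0 = leftmost of three lanes)
PREFERRED_LANE = {"aggressive": 0, "moderate": 1, "cautious": 2}
MAX_QUEUE_LEN = 8  # queue length limit stated in the method

def initial_lane_plan(vehicles):
    """vehicles: list of (vehicle_id, style) in longitudinal order, front first.

    Returns {lane: [queue, ...]} where each queue holds at most 8 vehicle ids.
    """
    by_lane = defaultdict(list)
    for vid, style in vehicles:
        by_lane[PREFERRED_LANE[style]].append(vid)
    # Cut each lane's ordered list into consecutive queues of bounded length
    return {
        lane: [ids[i:i + MAX_QUEUE_LEN] for i in range(0, len(ids), MAX_QUEUE_LEN)]
        for lane, ids in sorted(by_lane.items())
    }
```

Ten cautious vehicles and one aggressive vehicle yield two right-lane queues (8 and 2 vehicles) and a single-vehicle queue in the left lane; the learned policy would then trade such a preference against the group-level reward.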
The SAC algorithm uses one actor neural network and four critic neural networks. The actor network takes the state as input and outputs the parameters of the action probability distribution $P(x)$;
the four critic networks are the state-value network $V$ and its target network $V_{target}$, and the two action-value networks $Q_1$ and $Q_2$. The $V$ critic takes the state as input and outputs $V(s)$, an estimate of the state value; the $Q$ critics take a state-action pair as input and output $Q(s, a)$, an estimate of the action-state value;
Entropy in the algorithm is defined as:
$$H(P) = \mathbb{E}_{x \sim P}\left[-\log P(x)\right]$$
where $x$ follows the distribution with probability density function $P(x)$. Maximizing entropy spreads the action outputs and prevents them from concentrating excessively, which improves the algorithm's exploration, its ability to learn new tasks, and its stability. The optimal policy in the SAC framework is expressed as:
$$\pi^* = \arg\max_{\pi} \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[r(s_t, a_t) + \alpha H\left(\pi(\cdot \mid s_t)\right)\right]$$
where $\pi$ is the policy adopted by the agent, $a$ is an action, $s$ is a state, and $r$ is the reward; $\alpha$ is the temperature parameter, which sets the relative importance of the entropy term against the reward and thus governs the randomness of the optimal policy;
the state space S of SAC is defined as:
where the components are the driving style, the battery state of charge SOC, the vehicle speed $v_p$, the vehicle acceleration $a_p$, the driving time $t_{dri}$, the yaw angle $\theta$, and the distance to the preceding vehicle $d_{des}$;
the action space $A$ is defined as:
$$A = \{T_p, \delta_p\} \quad \text{(formula thirteen)}$$
where $T_p$ is the host vehicle torque and $\delta_p$ is the host vehicle steering wheel angle;
the reward function is defined as:
$$R = \omega_1 m_{fuel} c_{fuel} + \omega_2 P_{batt} c_{elec} + \omega_3 (t_{dri} - t_{ref}) + \omega_4 P_{rec} + \omega_5 J_{min} \quad \text{(formula fourteen)}$$
where $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5$ are weighting coefficients, $m_{fuel}$ is the fuel consumption of the current connected vehicle, $c_{fuel}$ is the fuel price, $P_{batt}$ is the motor power, $c_{elec}$ is the electricity price, $t_{ref}$ is the reference travel time, $P_{rec}$ is the braking energy recovery power, and $J_{min}$ is the adaptive cruise comprehensive cost function;
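A minimal sketch of evaluating formula fourteen for one step is shown below. It treats every term as added and carries each term's sign in its weight (negative weights for fuel, electricity, delay and cruise cost; a positive weight for recovered braking power); those sign conventions and the example weights are assumptions, not values given by the method.

```python
def reward(m_fuel, c_fuel, p_batt, c_elec, t_dri, t_ref, p_rec, j_min, w):
    """Weighted reward in the form of formula fourteen.

    w = (w1, ..., w5). Signs are carried by the weights: use negative weights
    for cost terms (fuel, electricity, delay, cruise cost) and a positive
    weight for recovered braking power.
    """
    w1, w2, w3, w4, w5 = w
    return (w1 * m_fuel * c_fuel          # fuel cost
            + w2 * p_batt * c_elec        # electricity cost
            + w3 * (t_dri - t_ref)        # travel-time deviation
            + w4 * p_rec                  # braking energy recovery
            + w5 * j_min)                 # adaptive cruise cost
```

Keeping the weights as a single tuple makes it straightforward to tune the trade-off between economy, travel time and comfort without touching the reward code.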
Each connected vehicle executes its driving actions independently, and the rewards are jointly optimized by collecting every connected vehicle's control experience in a centralized replay buffer;
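The centralized replay buffer can be sketched as a single bounded pool into which every connected vehicle pushes its own transitions, while the shared-parameter learner samples mini-batches from the whole pool. The class name, tuple layout and capacity handling below are illustrative assumptions.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Centralised buffer: every connected vehicle pushes its own transitions,
    and the shared-parameter SAC learner samples mini-batches from the pool."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, agent_id, state, action, reward, next_state, done):
        self.buffer.append((agent_id, state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement across all agents' experience
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because the agents share one network, mixing their transitions in one buffer lets experience gathered by any vehicle improve the policy used by all of them.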
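for a given state $s_t$ and action $a_t$, the soft value function $Q_{soft}(s_t, a_t)$ of the algorithm is expressed as:
$$Q_{soft}(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim \rho}\left[V(s_{t+1})\right]$$
where $\gamma \in [0, 1]$ is the discount factor;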
to avoid overestimation when maximizing Q, which would be compounded when targets are computed with target networks, the SAC algorithm uses two online Q networks, $Q_1$ and $Q_2$, with parameters $e_1$ and $e_2$, together with the state-value network $V$ and its target network $V_{target}$. The smaller of the two Q-network outputs is used in forming the target value. The soft value network parameters are updated by minimizing the following loss function:
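Since the loss functions themselves are not reproduced, the sketch below follows the standard SAC formulation with a separate state-value network: the $V$ target is $\min(Q_1, Q_2)(s, \tilde a) - \alpha \log \pi(\tilde a \mid s)$ with $\tilde a$ freshly sampled from the current policy, and the $Q$ target is $r + \gamma (1 - \text{done})\, V_{target}(s')$. Inputs are assumed to be already-evaluated network outputs as NumPy arrays; the function names are hypothetical.

```python
import numpy as np

def sac_targets(q1_pi, q2_pi, log_pi, reward, done, v_target_next, alpha, gamma=0.99):
    """Regression targets for the V and Q critics (standard SAC form).

    q1_pi, q2_pi: Q1/Q2 evaluated at actions freshly sampled from the policy.
    log_pi: log-probabilities of those actions.
    v_target_next: target V network evaluated at the next states.
    """
    # Taking the smaller Q estimate curbs overestimation bias
    v_hat = np.minimum(q1_pi, q2_pi) - alpha * log_pi
    # Soft Bellman backup; terminal transitions do not bootstrap
    q_hat = reward + gamma * (1.0 - done) * v_target_next
    return v_hat, q_hat

def mse(pred, target):
    """Mean-squared-error loss used to regress each critic onto its target."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```

The $V$ critic is then trained on `mse(v_pred, v_hat)` and each $Q$ critic on `mse(q_pred, q_hat)`, with `v_hat` and `q_hat` treated as constants during the gradient step.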
the stochastic policy is represented by a Gaussian distribution: the network parameters map the state to the mean and standard deviation of a Gaussian, and actions are obtained by sampling from it. Taking the state $s_t$ as input, the network outputs a Gaussian with mean $\mu(s_t)$ and standard deviation $\sigma(s_t)$; the action $a_t$ is then obtained with the reparameterization trick:
$$a_t = \mu(s_t) + \sigma(s_t) \odot \varepsilon_t$$
where $\varepsilon_t$ is a noise signal sampled from the standard normal distribution, and $\mu(s_t)$ and $\sigma(s_t)$ are the mean and standard deviation of the Gaussian distribution;
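The reparameterized sampling step can be sketched in NumPy as follows, for a diagonal Gaussian over the two actions (torque and steering angle). The function also returns the log-density needed by the entropy term. Many SAC implementations additionally squash actions with `tanh` and correct the log-density; that is omitted here because the text describes an unsquashed Gaussian. `sample_action` is a hypothetical name.

```python
import numpy as np

def sample_action(mu, log_std, rng):
    """Reparameterised draw a = mu + sigma * eps with eps ~ N(0, I).

    Returns the action and its log-density under the diagonal Gaussian.
    """
    sigma = np.exp(log_std)
    eps = rng.standard_normal(np.shape(mu))
    action = mu + sigma * eps
    # log N(a; mu, sigma^2) summed over action dimensions; (a - mu) / sigma == eps
    log_prob = np.sum(-0.5 * eps ** 2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)
    return action, log_prob
```

Because the randomness enters only through `eps`, gradients can flow from the action back into `mu` and `log_std`; averaging `-log_prob` over many samples estimates the policy entropy $H$.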
the relationship of the policy function to the soft function is expressed as:
policy network parameters are updated by minimizing the Kullback-Leibler divergence between the policy and the distribution induced by the exponentiated soft Q function; the smaller this divergence, the closer the policy's action distribution is to that target and the better the policy converges. The policy update rule is expressed as:
$$\pi_{new} = \arg\min_{\pi' \in \Pi} D_{KL}\left(\pi'(\cdot \mid s_t) \,\middle\|\, \frac{\exp\left(Q^{\pi_{old}}(s_t, \cdot)\right)}{Z^{\pi_{old}}(s_t)}\right) \quad \text{(formula nineteen)}$$
where $Z(s_t)$ is the partition function that normalizes the distribution;
finally, the policy network parameters are updated by gradient descent, expressed as:
the temperature coefficient expresses how much weight the algorithm places on entropy, and tuning it strongly affects SAC training. Because the optimal temperature differs between reinforcement learning tasks and across training, an automatic temperature adjustment mechanism is used: a constrained optimization problem is constructed, and the temperature for each step is obtained by minimizing the objective
$$J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[-\alpha \log \pi_t(a_t \mid s_t) - \alpha H_0\right]$$
where $H_0$ is a predefined minimum policy entropy threshold.
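One step of the automatic temperature adjustment can be sketched as gradient descent on $J(\alpha)$, optimizing $\log \alpha$ so that $\alpha$ stays positive. Since $\partial J / \partial \log\alpha = \alpha\,\mathbb{E}[-\log \pi - H_0]$, the temperature rises when the policy's entropy estimate falls below $H_0$ and falls when it is above. The learning rate is an illustrative value; a common heuristic in SAC practice sets $H_0 = -\dim(A)$, i.e. $-2$ for torque and steering, but the method does not state its threshold.

```python
import numpy as np

def update_temperature(log_alpha, log_pi, target_entropy, lr=3e-4):
    """One gradient-descent step on J(alpha) = E[-alpha * (log_pi + H0)].

    Works in log-space so alpha stays positive. log_pi holds log-probabilities
    of actions sampled from the current policy; target_entropy is H0.
    """
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = alpha * E[-log_pi - H0]
    grad = alpha * np.mean(-np.asarray(log_pi) - target_entropy)
    return log_alpha - lr * grad
```

If sampled actions have log-probability 2 (entropy estimate -2) while $H_0 = 0$, the gradient is negative and `log_alpha` increases, pushing the policy toward more exploration.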
The SAC algorithm is trained on data covering vehicle-group queue reorganization, adaptive cruise, and efficient intersection passage to obtain the optimal control strategy. SAC considers the states of the connected vehicles in the group and finds an optimal adaptive cruise control policy, which is fed back as the torque and steering angle commands of each connected vehicle to control its trajectory. In the adaptive cruise control of step three, the required spacing depends on the driver's driving style, road traffic efficiency, and vehicle safety; because the driving intention of manually driven vehicles is uncertain, the gap between a connected vehicle and a manually driven vehicle is set larger than the gap between two connected vehicles. A narrow gap improves traffic efficiency but can make drivers anxious and lead to collisions; a larger gap ensures safety but lowers road efficiency and makes cut-ins from adjacent lanes more likely;
A constant time headway (CTH) policy is used for the spacing algorithm:
$$d_{des} = \tau_h v_h + d_0 \quad \text{(formula twenty-two)}$$
where $\tau_h$ is the nominal time headway, $v_h$ is the host vehicle speed, and $d_0$ is the safe stopping distance;
for car-following safety, the following constraints apply:
$$d_{min} < d < d_{max}$$
$$\Delta d = d - d_{des}$$
$$\Delta v_{min} < \Delta v < \Delta v_{max}$$
$$\Delta v = v_p - v_f$$
where $d$ is the actual distance between the host vehicle and the preceding vehicle, $d_{min}$ and $d_{max}$ are the minimum and maximum distances, $\Delta v$ is the speed difference between the host vehicle and the preceding vehicle, and $\Delta v_{min}$ and $\Delta v_{max}$ are the minimum and maximum speed differences;
the comfort constraint is expressed through the acceleration difference $\Delta a = a_p - a_f$, where $a_f$ is the acceleration of the preceding vehicle;
the adaptive cruise comprehensive cost function is:
$$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2 \quad \text{(formula twenty-three)}$$
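The spacing policy, safety constraints and cost function above can be combined in one short sketch. The bounds and weights are placeholder values because the method does not state them; a larger `tau_h` could be passed when the preceding vehicle is manually driven, in line with the larger gap described for mixed traffic. Function names are hypothetical.

```python
def cth_spacing(v_h, tau_h, d0):
    """Desired gap under the constant time headway policy (formula twenty-two)."""
    return tau_h * v_h + d0

def acc_cost(d, v_p, v_f, a_p, a_f, tau_h, d0, w=(1.0, 1.0, 1.0),
             d_bounds=(2.0, 120.0), dv_bounds=(-10.0, 10.0)):
    """Return (J, feasible) for the adaptive cruise cost of formula twenty-three.

    d_bounds and dv_bounds stand in for (d_min, d_max) and (dv_min, dv_max);
    they are placeholder values, not figures from the method.
    """
    w6, w7, w8 = w
    dd = d - cth_spacing(v_p, tau_h, d0)  # spacing error, delta d
    dv = v_p - v_f                         # speed difference, delta v
    da = a_p - a_f                         # acceleration difference, delta a
    feasible = (d_bounds[0] < d < d_bounds[1]) and (dv_bounds[0] < dv < dv_bounds[1])
    return w6 * dd ** 2 + w7 * dv ** 2 + w8 * da ** 2, feasible
```

At 20 m/s with a 1.5 s headway and 5 m standstill distance, the desired gap is 35 m; an actual gap of 40 m, a 2 m/s closing speed and a 0.5 m/s² acceleration difference give $J = 25 + 4 + 0.25 = 29.25$ with unit weights.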
When the braking force distribution is constrained by the ECE regulation, the following braking force distribution strategy is adopted:
when the braking strength $z < z_1$, braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the ECE regulation line; when $z_2 < z < z_3$, the front-axle braking force is held constant and the rear-axle braking force increases; when $z > z_3$, the motor stops braking and the front and rear axle braking forces are distributed along the $\beta$ line. Throughout braking, if the motor braking force is insufficient, hydraulic braking makes up the shortfall in total braking force. The braking force distribution is formulated as follows:
the boundary for z is calculated as follows:
where $F_{bf}$ is the front-axle braking force, $F_{br}$ is the rear-axle braking force, $F_b$ is the total required braking force, $L$ is the wheelbase, $k$ is the distance from the center of mass to the rear axle, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, and $\beta$ is the braking force distribution coefficient; the remaining terms are the rotating-mass correction coefficient, the wheel radius $r_w$, the transmission ratio $i_t$, and the transmission efficiency $\eta$.
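Because the thresholds $z_1$ to $z_3$ and the ECE line are not reproduced, this sketch covers only the pieces that follow directly from the listed variables: the largest braking force the motor can deliver at the wheels, $F_{m,max} = T_{bmax} i_t \eta / r_w$; the hydraulic top-up when the demand on the motor-driven axle exceeds it; and the fixed $\beta$-line split $F_{bf} = \beta F_b$. It assumes the motor acts on the front axle, which is consistent with front-only braking at low $z$ but is not stated explicitly; all function names are hypothetical.

```python
def motor_brake_force_max(t_bmax, i_t, eta, r_w):
    """Maximum regenerative braking force at the wheels (N)."""
    return t_bmax * i_t * eta / r_w

def split_front_axle(f_bf_demand, t_bmax, i_t, eta, r_w):
    """Share a front-axle braking demand between the motor and hydraulics.

    Assumes the motor acts on the front axle; hydraulic braking covers any
    shortfall so the total braking force is preserved.
    Returns (motor force, hydraulic force) in N.
    """
    f_motor = min(f_bf_demand, motor_brake_force_max(t_bmax, i_t, eta, r_w))
    return f_motor, f_bf_demand - f_motor

def beta_line(f_b, beta):
    """Front/rear split along the fixed distribution line: F_bf = beta * F_b."""
    return beta * f_b, (1.0 - beta) * f_b
```

For example, with a 200 N·m motor limit, an 8:1 ratio, 90 % efficiency and a 0.3 m wheel radius, the motor can supply about 4800 N, so a 6000 N front-axle demand leaves roughly 1200 N for the hydraulic brakes.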
In step four, when the vehicle group approaches a signalized intersection, the optimal policy adjusts the driving torque, steering angle and braking force of each connected vehicle according to the signal timing and carries out queue reorganization and queue-length planning, so that some vehicles form queues that pass through the intersection during the green phase while the remaining vehicles wait behind the stop line. This reduces the energy consumption of the whole vehicle group and improves economy and traffic efficiency.
Compared with the prior art, the invention has the following advantages: the SAC reinforcement learning algorithm that incorporates driving style is more stable than conventional reinforcement learning algorithms. The method also groups vehicles into queues according to driving style, implements adaptive cruise control, recovers surplus braking energy, and guides vehicle groups through signalized intersections in an orderly way, saving energy and improving overall traffic efficiency.
The method splits and reorganizes vehicle queues according to driving style with the aims of reducing energy consumption and travel time, improving driving safety and comfort, and optimizing traffic efficiency.
Drawings
The invention is described in further detail below with reference to the attached drawings and detailed description:
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of lane-change collision critical points and safety area planning for an intelligent connected vehicle according to the present invention;
FIG. 3 is a flow chart of queue combination based on the Soft Actor-Critic algorithm according to the present invention (solid dark gray: intelligent connected vehicles; half gray: manually driven vehicles).
Detailed Description
As shown in the figures, in the PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, the hybrid vehicle group consists of intelligent network-connected vehicles and conventional manually driven vehicles, and the control method comprises the following steps. The step labels serve only to distinguish the steps; in practice, the steps are not necessarily executed strictly in this order;
step one, clustering driving data from various operating conditions on several characteristic parameters to determine a number of driving-style types;
Step two, determining a lane change safety area of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in adaptive cruise control, constraining vehicle spacing, speed and acceleration for safety and comfort and recovering braking energy during braking; when approaching an intersection, the queues are reorganized so that the vehicle group passes through in an orderly manner;
step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm and iteratively updating according to the set loss functions, finally obtaining an optimal policy that lets the vehicle group reorganize its queues according to the driving styles of different drivers on roads with no more than three lanes.
The vehicle at the head of each queue in the vehicle group is an intelligent connected vehicle, and each queue is limited to at most 8 vehicles;
in step one, driving styles are classified as follows: the characteristic parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and K-means clustering then yields three driving-style classes. The driving styles are reflected in the following parameters: mean longitudinal speed $\bar{v}$, maximum longitudinal speed $v_{max}$, minimum longitudinal speed $v_{min}$, longitudinal speed standard deviation $\varepsilon_v$, mean longitudinal acceleration $\bar{a}_x$, maximum longitudinal acceleration $a_{xmax}$, minimum longitudinal acceleration $a_{xmin}$, longitudinal acceleration standard deviation $\sigma_x$, mean lateral acceleration $\bar{a}_y$, maximum lateral acceleration $a_{ymax}$, minimum lateral acceleration $a_{ymin}$, lateral acceleration standard deviation $\sigma_y$, time headway THW, time to collision TTC, and minimum distance headway $DHW_{min}$. The longitudinal speed standard deviation $\varepsilon_v$ is calculated over the $N$ speed samples $v_i$ as:
$$\varepsilon_v = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^2}$$
The time headway THW is the time the front of the host vehicle needs to reach the rear of the preceding vehicle at the current speed, and the time to collision TTC is the time until the host vehicle and the preceding vehicle collide if both keep their current state (TTC is meaningful only while $v_{rel} > 0$). They are calculated as:
$$THW = \frac{d_{rel}}{v_p}$$
$$TTC = \frac{d_{rel}}{v_{rel}}$$
$$v_{rel} = v_p - v_f \quad \text{(formula four)}$$
where $d_{rel}$ is the relative distance between the two vehicles, $v_p$ is the host vehicle speed, and $v_f$ is the preceding vehicle speed;
the distance between the two vehicles is the larger the distance between the main vehicle head and the front vehicle head of the same lane, and the smaller the collision accident possibility of the two vehicles is; conversely, the greater the possibility of accident of the two vehicles; the smaller the head space is, the more aggressive the driver drives are reflected, and therefore, the minimum head space is selected as a driving style characteristic parameter index.
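The step-one pipeline (principal component analysis followed by K-means into three styles) can be sketched in NumPy. This is a minimal sketch, not the inventors' code: the z-score standardization, the number of retained components, and the farthest-point initialization (a deterministic relative of k-means++) are assumptions, and `pca_kmeans` is a hypothetical helper name.

```python
import numpy as np

def pca_kmeans(features, n_components=2, k=3, iters=100, seed=0):
    """Standardise, project onto leading principal components, then run k-means.

    features: (n_samples, n_features) array of per-driver indicators.
    Returns (labels, centroids) in the reduced space.
    """
    x = np.asarray(features, dtype=float)
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)  # z-score each indicator
    # PCA via SVD of the centred data: rows of vt are principal directions
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    z = x @ vt[:n_components].T

    # Farthest-point initialisation: each new centroid is the sample
    # farthest from all centroids chosen so far
    rng = np.random.default_rng(seed)
    centroids = [z[rng.integers(len(z))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(z - c, axis=1) for c in centroids], axis=0)
        centroids.append(z[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(iters):
        # Assign each sample to its nearest centroid
        dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty)
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

The cluster indices are arbitrary; deciding which cluster is aggressive, moderate or cautious would still require inspecting cluster-level indicators such as mean $DHW_{min}$ or acceleration spread.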
In step two, the vehicle platoon travels on a three-lane road, and regrouping the vehicle group involves lane-change operations. The connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, including vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position, and intersection signal phase;
The lane-change safety region used in the lane-change operation is detected as follows:
As shown in FIG. 2, the method is based on vehicle kinematics. With the vehicle at a given position, the coordinates [x_p1(t), y_p1(t)] of its right-front vertex at time t (t_m < t < t_n) are expressed as:
where v_p(t) and θ_p(t) are the speed and yaw angle of the host vehicle, respectively; t_m is the time at which the host vehicle starts the lane change, and t_n is the time at which it ends;
Similarly, at time t the coordinates of the host vehicle's left-front vertex [x_p2(t), y_p2(t)], left-rear vertex [x_p3(t), y_p3(t)], and right-rear vertex [x_p4(t), y_p4(t)] are, respectively:
where a is the vehicle length and b is the vehicle width;
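The vertex relations can be illustrated with planar rigid-body geometry. This sketch assumes the x axis points along the road, y points to the vehicle's left, and the right-front vertex P1 is propagated by forward-Euler integration of v_p cos θ_p and v_p sin θ_p; it ignores the small extra corner velocity caused by yaw rate, and the function names are hypothetical rather than the patent's own equations.

```python
import math

def corner_vertices(x1, y1, theta, a, b):
    """From the right-front vertex (x1, y1), yaw theta [rad], length a and width b, return
    [P1 right-front, P2 left-front, P3 left-rear, P4 right-rear]."""
    ux, uy = math.cos(theta), math.sin(theta)    # unit vector along the heading
    lx, ly = -math.sin(theta), math.cos(theta)   # unit vector pointing to the vehicle's left
    p1 = (x1, y1)
    p2 = (x1 + b * lx, y1 + b * ly)
    p3 = (p2[0] - a * ux, p2[1] - a * uy)
    p4 = (x1 - a * ux, y1 - a * uy)
    return [p1, p2, p3, p4]

def integrate_right_front(x0, y0, speeds, yaws, dt):
    """Forward-Euler integration of x' = v cos(theta), y' = v sin(theta) for the right-front vertex."""
    xs, ys = [x0], [y0]
    for v, th in zip(speeds, yaws):
        xs.append(xs[-1] + v * math.cos(th) * dt)
        ys.append(ys[-1] + v * math.sin(th) * dt)
    return xs, ys

# Straight driving at 10 m/s for 1 s, then the four corners of a 4.5 m x 1.8 m vehicle
xs, ys = integrate_right_front(0.0, 0.0, speeds=[10.0] * 10, yaws=[0.0] * 10, dt=0.1)
print(corner_vertices(xs[-1], ys[-1], theta=0.0, a=4.5, b=1.8))
```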
During the lane change, the host vehicle derives a reasonable lane-change safety region by analyzing the conditions under which it does not collide with surrounding vehicles;
If the host vehicle changes lanes to the left at some future time, a collision with the preceding vehicle occurs when the host vehicle is faster than the preceding vehicle, so that the gap gradually shrinks until the right-front vertex of the host vehicle meets the left-rear vertex of the preceding vehicle;
Let this collision point be S_1, reached at the collision time; the coordinates of S_1 are expressed as:
where v_f(t) is the speed of the preceding vehicle in the current lane at time t, and D_1 is the distance between the host vehicle and the preceding vehicle in the current lane. As shown in FIG. 2, a collision with the following vehicle in the target lane requires the host vehicle to be slower than that following vehicle, so that the gap between them shrinks over time; if the lane change is made at this moment, the left-rear vertex of the host vehicle collides with the following vehicle in the target lane. Let this collision point be S_2, reached at the corresponding collision time; based on the vehicle dimensions and autonomous-vehicle kinematics, the coordinates of S_2 are expressed as:
where v_r(t) is the speed of the following vehicle in the target lane at time t, D_2 is the relative distance between the host vehicle and the following vehicle in the target lane, and l is the lane width;
The lane-change safety region that avoids collisions is determined from the coordinates of the collision points S_1 and S_2; a lane change to the right is handled in the same way. The intelligent connected vehicle then judges, from the detected information on surrounding vehicles, whether the safety region satisfies the lane-change conditions.
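A simplified feasibility check in the spirit of the S_1/S_2 analysis can be sketched as follows. It assumes all three vehicles hold constant speed for the lane-change duration T = t_n - t_m, that the two gaps are longitudinal bumper-to-bumper distances, and that a fixed safety margin is required; the function and the margin value are hypothetical. Under constant speeds each gap varies linearly in time, so checking the two endpoints is enough.

```python
def lane_change_is_safe(D_front, v_p, v_f, D_rear, v_r, T, margin=2.0):
    """Constant-speed check over a lane change lasting T seconds.
    D_front: gap to the preceding vehicle in the current lane [m]
    D_rear:  gap from the following vehicle in the target lane [m]"""
    front_gap_end = D_front + (v_f - v_p) * T   # shrinks if the host is faster (S1 case)
    rear_gap_end = D_rear + (v_p - v_r) * T     # shrinks if the follower is faster (S2 case)
    # Gaps are linear in time, so their minima over [0, T] lie at t = 0 or t = T.
    return min(D_front, front_gap_end) >= margin and min(D_rear, rear_gap_end) >= margin

print(lane_change_is_safe(20.0, 20.0, 18.0, 15.0, 24.0, T=3.0))  # True: gaps end at 14 m and 3 m
print(lane_change_is_safe(20.0, 20.0, 18.0, 15.0, 26.0, T=3.0))  # False: follower closes the gap
```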
In step three, each intelligent connected vehicle is treated as an agent, so the n connected vehicles in the vehicle group are regarded as n agents; n is limited to a range that remains computationally tractable, and the n parallel agents control the n connected vehicles and interact with one another. The n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward gain of the whole vehicle group;
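Parameter sharing among the n agents can be pictured as one set of weights applied to a batch of per-vehicle states. The toy linear-tanh policy below is a hypothetical stand-in for the actor network; the 7-dimensional state and 2-dimensional action (torque, steering) follow the state and action spaces defined later in this section, and the weights are random.

```python
import numpy as np

class SharedPolicy:
    """Toy policy: a single parameter set reused by every connected-vehicle agent."""
    def __init__(self, state_dim: int, action_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.b = np.zeros(action_dim)

    def act(self, states: np.ndarray) -> np.ndarray:
        # states: (n_agents, state_dim) -> actions in (-1, 1) with shape (n_agents, action_dim)
        return np.tanh(states @ self.W + self.b)

policy = SharedPolicy(state_dim=7, action_dim=2)
states = np.random.default_rng(1).normal(size=(5, 7))  # five connected vehicles
actions = policy.act(states)
```

Because every agent reads the same W and b, a gradient computed from any one vehicle's experience changes the behavior of all of them, which is the sense in which one vehicle's improvement feeds the group's reward.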
The intelligent connected vehicles in a platoon interact with vehicles in adjacent platoons and also cooperate with manually driven vehicles, using the following methods:
Method A: on an ordinary road section, the platoon is cooperatively controlled by adaptive cruise control and regenerative braking so that a reasonable spacing is kept between vehicles, i.e., a safe longitudinal gap must be maintained continuously between every two consecutive vehicles. The deviation from this safe distance (the spacing error) should be as small as possible, both to reduce collision risk and to exploit the low fuel consumption and high traffic throughput of platooning. To accommodate the random behavior of conventional vehicles, the gap between an intelligent connected vehicle and a conventional vehicle is set larger. When a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach a signalized intersection, the platoon is split and regrouped to reduce energy consumption and travel time, so that some of its vehicles pass through the intersection in sequence before the green signal ends while the remaining vehicles wait behind the stop line.
In step four, as shown in FIG. 3, the Soft Actor-Critic (SAC) reinforcement learning algorithm is an off-policy, model-free deep reinforcement learning algorithm that combines maximum-entropy learning with the Actor-Critic framework. What SAC learns involves the state s, the action a, the reward r, and the environment model ρ. The states include the vehicle's fuel consumption, battery state of charge, speed, acceleration, yaw angle, and distance; the actions are torque and steering angle; and the reward covers fuel consumption, travel time, and comfort, including the adaptive cruise cost function;
In step four, SAC is trained on sample data collected from the environment and is updated and optimized continuously until an optimal policy is obtained, under which the intelligent connected vehicles in the hybrid vehicle group are assigned to different lanes according to their drivers' driving styles and the vehicles in each lane form platoons of different lengths;
The driving styles of the vehicles are classified as aggressive, steady, and cautious. When the platoon travels on the road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, steady vehicles in the middle lane, and cautious vehicles in the rightmost lane; the SAC algorithm then adjusts the final lane allocation according to the gain each allocation brings to the vehicle group.
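The stated lane preferences can be written as a rule-based starting point, before any SAC adjustment. This sketch is hypothetical: lane 0 is taken as the leftmost lane, the style labels are illustrative, and the 8-vehicle cap is borrowed from the platoon-length limit in claim 2; vehicles that do not fit spill to the nearest lane with room. It does not model the group-level gain that the SAC policy uses to revise the allocation.

```python
from collections import defaultdict

PREFERRED_LANE = {"aggressive": 0, "steady": 1, "cautious": 2}  # 0 = leftmost lane (assumed)
MAX_PLATOON = 8                                                   # platoon-length cap from claim 2

def baseline_lane_allocation(styles, n_lanes=3, cap=MAX_PLATOON):
    """Return {lane: [vehicle indices]} by style preference, spilling to the nearest lane with room."""
    lanes = defaultdict(list)
    for vid, style in enumerate(styles):
        pref = PREFERRED_LANE[style]
        # Try lanes in order of distance from the preferred lane (ties go to the left lane).
        for lane in sorted(range(n_lanes), key=lambda l: (abs(l - pref), l)):
            if len(lanes[lane]) < cap:
                lanes[lane].append(vid)
                break
        else:
            raise ValueError("every lane already holds a full platoon")
    return dict(lanes)

allocation = baseline_lane_allocation(["aggressive"] * 9 + ["cautious"])
```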
The SAC algorithm consists of one actor neural network and four critic neural networks; the input of the actor network is the state, and its output is the parameters of the action probability distribution P(x);
The four critic networks are the state-value V critic network and its V critic target network, and the action-state value Q_1 and Q_2 critic networks. The V critic network takes the state as input and outputs V(s), an estimate of the state value; the Q critic networks take the state and action as input and output Q(s, a), an estimate of the action-state value;
The entropy in the algorithm is defined as:
$H(P) = \mathbb{E}_{x \sim P}\left[-\log P(x)\right]$
where x follows the distribution with probability density function P(x). Introducing maximum entropy makes the action output more dispersed and prevents it from concentrating excessively, which improves the algorithm's exploration, its ability to learn new tasks, and its stability. The optimal policy in the SAC framework is expressed as:
$\pi^* = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \pi}\left[r(s_t, a_t) + \alpha H\left(\pi(\cdot \mid s_t)\right)\right]$
where π is the policy adopted by the agent, a is the action, s is the state, and r is the reward; α is the temperature parameter, which sets the relative importance of the entropy term against the reward and thus governs the randomness of the optimal policy;
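For a one-dimensional Gaussian the entropy definition can be checked numerically: a Monte Carlo estimate of E[-log P(x)] should match the closed form ½ ln(2πeσ²), and a wider (more dispersed) distribution has higher entropy, which is what the maximum-entropy term rewards. This is a generic illustration, not code from the source.

```python
import numpy as np

def gaussian_entropy(sigma: float) -> float:
    """Closed-form differential entropy of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    return float(0.5 * np.log(2 * np.pi * np.e * sigma**2))

def mc_entropy(mu: float, sigma: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of H(P) = E_{x~P}[-log P(x)]."""
    x = np.random.default_rng(seed).normal(mu, sigma, size=n)
    log_p = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return float(-log_p.mean())

print(gaussian_entropy(0.5), mc_entropy(0.0, 0.5))   # the two values should nearly coincide
print(gaussian_entropy(1.0) > gaussian_entropy(0.5))  # True: wider spread, higher entropy
```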
the state space S of SAC is defined as:
where the driver's driving style is included as a state variable, soc is the battery state of charge, v_p is the vehicle speed, a_p is the vehicle acceleration, t_dri is the driving time, θ is the yaw angle, and d_des is the distance to the preceding vehicle;
The action space A is defined as: $A = \{T_p, \delta_p\}$ (formula thirteen);
where T_p is the torque of the host vehicle and δ_p is the steering wheel angle of the vehicle;
The reward function is defined as:
$R = \omega_1 m_{fuel} c_{fuel} + \omega_2 P_{batt} c_{elec} + \omega_3 (t_{dri} - t_{ref}) + \omega_4 P_{rec} + \omega_5 J_{min}$ (formula fourteen);
where ω_1, ω_2, ω_3, ω_4, and ω_5 are weighting coefficients, m_fuel is the fuel consumption of the current intelligent connected vehicle, c_fuel is the fuel price, P_batt is the motor power, c_elec is the electricity price, t_ref is the reference travel time, P_rec is the braking energy recovery power, and J_min is the adaptive cruise comprehensive cost function;
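Formula fourteen can be written as a plain function. The weight values and their signs here are assumptions: the costs (fuel, electricity, extra travel time, J_min) carry negative weights and recovered braking power a positive one, with all quantities in consistent per-step units.

```python
def reward(m_fuel, c_fuel, P_batt, c_elec, t_dri, t_ref, P_rec, J_min,
           w=(-1.0, -1.0, -0.1, 0.5, -0.5)):
    """Weighted sum of formula fourteen; the default weights are illustrative only."""
    w1, w2, w3, w4, w5 = w
    return (w1 * m_fuel * c_fuel        # fuel cost
            + w2 * P_batt * c_elec      # electricity cost
            + w3 * (t_dri - t_ref)      # travel time beyond the reference
            + w4 * P_rec                # recovered braking power
            + w5 * J_min)               # adaptive cruise cost

r = reward(m_fuel=0.5, c_fuel=2.0, P_batt=2.0, c_elec=1.0,
           t_dri=70.0, t_ref=60.0, P_rec=4.0, J_min=2.0)
print(r)  # about -3.0
```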
Each connected vehicle executes its driving actions independently, and the corresponding rewards are optimized jointly by collecting the control experience of all connected vehicles in a centralized replay buffer;
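The centralized replay buffer can be sketched as a single bounded queue that every agent writes to and from which training batches are drawn uniformly. The class name, the tuple layout, and the capacity are assumptions for illustration.

```python
import random
from collections import deque

class CentralReplayBuffer:
    """Single experience buffer fed by every connected-vehicle agent."""
    def __init__(self, capacity: int = 100_000, seed: int = 0):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, agent_id, s, a, r, s_next, done):
        self.buf.append((agent_id, s, a, r, s_next, done))

    def sample(self, batch_size: int):
        # Uniform sampling across all agents' transitions
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

buffer = CentralReplayBuffer(capacity=5)
for step in range(4):
    for agent in range(2):  # two connected vehicles
        buffer.add(agent, s=step, a=0.0, r=1.0, s_next=step + 1, done=False)
batch = buffer.sample(3)
```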
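For a given state s_t and action a_t, the soft Q-function Q_soft(s_t, a_t) of the algorithm is expressed as:
$Q_{soft}(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1}}\left[V_{soft}(s_{t+1})\right]$, with $V_{soft}(s_t) = \mathbb{E}_{a_t \sim \pi}\left[Q_{soft}(s_t, a_t) - \alpha \log \pi(a_t \mid s_t)\right]$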
where γ ∈ [0, 1] is the discount factor;
To avoid overestimation when maximizing Q, and further overestimation when the target networks are used to compute targets, the SAC algorithm introduces two online networks Q_1 and Q_2 with parameters e_1 and e_2, together with two target networks; the smaller of the values output by the target networks is taken as the target value. The soft Q-network parameters are updated by minimizing the following loss function:
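The "take the smaller target value" rule can be sketched as the widely used clipped double-Q soft target with target Q-networks; the V/V-target variant mentioned in the claims would instead read the target from the V target network. Array shapes and the γ, α values are assumptions.

```python
import numpy as np

def soft_q_target(r, done, q1_next, q2_next, log_pi_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target using the smaller of the two target-network estimates."""
    v_next = np.minimum(q1_next, q2_next) - alpha * log_pi_next  # soft state value at s'
    return r + gamma * (1.0 - done) * v_next

def critic_loss(q_pred, target):
    """Half mean-squared soft Bellman error for one online Q-network."""
    return 0.5 * np.mean((q_pred - target) ** 2)

r = np.array([1.0, 1.0])
done = np.array([0.0, 1.0])  # the second transition is terminal
y = soft_q_target(r, done, q1_next=np.array([10.0, 10.0]), q2_next=np.array([8.0, 8.0]),
                  log_pi_next=np.array([-1.0, -1.0]), gamma=0.5, alpha=0.2)
```

Each online network Q_1, Q_2 is regressed toward the same target y with its own `critic_loss`.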
The policy is represented as a Gaussian distribution, i.e., the policy parameters map the state to the mean and variance of a Gaussian, and actions are obtained by sampling from that Gaussian. Taking the state s_t as input, the network outputs the mean and standard deviation of the Gaussian; the action a_t is then obtained with the reparameterization trick:
$a_t = \mu(s_t) + \sigma(s_t) \cdot \varepsilon_t$
where ε_t is a noise signal sampled from the standard normal distribution, and μ(s_t) and σ(s_t) are the mean and standard deviation of the Gaussian distribution, respectively;
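A NumPy sketch of the reparameterized sample a_t = μ(s_t) + σ(s_t)·ε_t together with its Gaussian log-density, which the entropy term needs. Many SAC implementations additionally squash the sample with tanh to bound the actions; that step is not described here and is omitted. Parameterizing σ through its logarithm is an assumption.

```python
import numpy as np

def sample_action(mu, log_std, rng):
    """Reparameterized sample a = mu + sigma * eps and its Gaussian log-density."""
    sigma = np.exp(log_std)
    eps = rng.standard_normal(np.shape(mu))  # eps ~ N(0, I), independent of the parameters
    a = mu + sigma * eps                     # gradients can flow through mu and sigma
    log_prob = np.sum(-0.5 * eps**2 - log_std - 0.5 * np.log(2 * np.pi), axis=-1)
    return a, log_prob

rng = np.random.default_rng(0)
mu = np.tile([0.3, -0.1], (100_000, 1))  # same state repeated: (torque, steering) means
log_std = np.log(np.array([0.5, 0.2]))
a, log_prob = sample_action(mu, log_std, rng)
```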
The relationship between the policy function and the soft Q-function is expressed as:
The policy network parameters are updated by minimizing the Kullback-Leibler divergence: the smaller the divergence, the smaller the difference between the rewards corresponding to the output behavior and the better the policy converges. The update rule of the policy network is expressed as:
where Z(s_t) is the partition function that normalizes the distribution;
Finally, the policy network parameters are updated by gradient descent, as follows:
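The policy update can be illustrated on a one-dimensional toy problem where the soft Q-function is known in closed form, Q(s, a) = -(a - 2)² (hypothetical). Minimizing E[α log π(a|s) - Q(s, a)] with reparameterized samples and plain gradient descent should drive the Gaussian toward π ∝ exp(Q/α), i.e. mean 2 and variance α/2. Real implementations obtain these gradients by automatic differentiation through the actor network; here they are written out by hand.

```python
import numpy as np

def policy_loss_and_grads(mu, log_std, alpha, eps):
    """Monte Carlo loss E[alpha*log pi - Q] and its gradients via the reparameterization trick."""
    sigma = np.exp(log_std)
    a = mu + sigma * eps
    log_pi = -0.5 * eps**2 - log_std - 0.5 * np.log(2 * np.pi)
    q = -(a - 2.0) ** 2                                        # toy soft Q-function
    loss = np.mean(alpha * log_pi - q)
    g_mu = np.mean(2.0 * (a - 2.0))                            # d loss / d mu
    g_log_std = np.mean(-alpha + 2.0 * (a - 2.0) * sigma * eps)  # d loss / d log_std
    return loss, g_mu, g_log_std

rng = np.random.default_rng(0)
mu, log_std, alpha, lr = 0.0, 0.0, 0.2, 0.05
for _ in range(2000):
    _, g_mu, g_ls = policy_loss_and_grads(mu, log_std, alpha, rng.standard_normal(256))
    mu -= lr * g_mu          # gradient descent on the policy parameters
    log_std -= lr * g_ls
```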
The temperature coefficient expresses how much weight the algorithm places on entropy, and tuning it is important for SAC training; the optimal temperature differs between reinforcement learning tasks and between stages of training. An automatic temperature adjustment mechanism is therefore used: a constrained optimization problem is constructed, and the optimal temperature at each step is obtained by minimizing the following objective:
where H_0 is a predefined minimum policy entropy threshold.
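One step of the automatic temperature adjustment can be sketched under the common parameterization through log α, with the objective J(α) = E[-α(log π(a_t|s_t) + H_0)]. The learning rate and the target H_0 = -2 (minus the action dimension, a widely used heuristic for the two actions here) are assumptions. When the policy's entropy falls below H_0, α grows; when it exceeds H_0, α shrinks.

```python
import numpy as np

def temperature_step(log_alpha, log_pi, target_entropy, lr=1e-3):
    """One gradient step on J(alpha) = E[-alpha * (log pi(a|s) + H0)], taken with respect to log alpha."""
    alpha = np.exp(log_alpha)
    grad = -alpha * np.mean(log_pi + target_entropy)  # dJ / d(log alpha)
    return log_alpha - lr * grad

H0 = -2.0
print(temperature_step(0.0, np.full(64, 3.0), H0))  # entropy -3 < H0: alpha increases
print(temperature_step(0.0, np.full(64, 0.5), H0))  # entropy -0.5 > H0: alpha decreases
```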
The SAC reinforcement learning algorithm is trained on data concerning vehicle-group platoon regrouping, adaptive cruise control, and efficient intersection passage to obtain the optimal control strategy. SAC takes into account the states of the connected vehicles in the group and finds the optimal adaptive cruise control strategy, which is output as the torque and steering angle of each connected vehicle under adaptive cruise control to control its trajectory;
In the adaptive cruise control of step three, the required spacing is influenced by the driver's driving style, commuting efficiency, and vehicle safety. Because the driving intentions of manually driven vehicles are uncertain, the gap between a connected vehicle and a manually driven vehicle is larger than the gap between two connected vehicles. If the gap is too small, commuting efficiency improves, but driver anxiety may lead to collisions; conversely, a larger gap safeguards the vehicles but reduces road efficiency and makes cut-ins from adjacent lanes more likely;
The constant time headway (CTH) policy is used for the spacing algorithm:
$d_{des} = \tau_h v_h + d_0$ (formula twenty-two);
where τ_h is the nominal time headway, v_h is the host vehicle speed, and d_0 is the safe stopping distance;
In terms of car-following safety, the following constraints apply:
$d_{min} < d < d_{max}$
$\Delta d = d - d_{des}$
$\Delta v_{min} < \Delta v < \Delta v_{max}$
$\Delta v = v_p - v_f$
where d is the actual distance between the autonomous vehicle and the preceding vehicle, and d_min and d_max are the minimum and maximum spacing; Δv is the speed difference between the autonomous vehicle and the preceding vehicle, and Δv_min and Δv_max are the minimum and maximum speed differences;
The comfort constraint is formulated on the acceleration difference $\Delta a = a_p - a_f$, where a_p is the host vehicle acceleration and a_f is the acceleration of the preceding vehicle;
The adaptive cruise comprehensive cost function is:
$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta v^2 + \omega_8 \Delta a^2$ (formula twenty-three).
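Formulas twenty-two and twenty-three together with the safety constraints can be combined into one sketch. The numeric defaults (τ_h, d_0, weights, limits) are placeholders rather than values from the source, and v_h is taken to be the host speed v_p.

```python
def desired_gap(v_h, tau_h=1.5, d0=5.0):
    """Constant time headway (formula twenty-two): d_des = tau_h * v_h + d0."""
    return tau_h * v_h + d0

def acc_cost(d, v_p, v_f, a_p, a_f, w=(1.0, 0.5, 0.1), tau_h=1.5, d0=5.0,
             d_lim=(2.0, 120.0), dv_lim=(-10.0, 10.0)):
    """Return (J_min, constraints_ok) for one car-following step (formula twenty-three)."""
    w6, w7, w8 = w
    dd = d - desired_gap(v_p, tau_h, d0)  # spacing error
    dv = v_p - v_f                        # speed difference
    da = a_p - a_f                        # acceleration difference (comfort)
    ok = d_lim[0] < d < d_lim[1] and dv_lim[0] < dv < dv_lim[1]
    return w6 * dd**2 + w7 * dv**2 + w8 * da**2, ok

J, ok = acc_cost(d=37.0, v_p=20.0, v_f=19.0, a_p=0.2, a_f=0.0)
print(desired_gap(20.0), J, ok)  # 35.0, about 4.504, True
```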
When the brake force distribution is constrained by the ECE regulations, the following brake force distribution strategy is adopted:
When the braking strength z < z_1, the braking force is provided by the front axle only; when z_1 < z < z_2, the front and rear axle braking forces are distributed along the line prescribed by the ECE regulations; when z_2 < z < z_3, the front axle braking force is held constant and the rear axle braking force increases; when z > z_3, the motor stops braking and the front and rear axle braking forces are distributed along the β line. Throughout the braking process, if the motor braking force is insufficient, hydraulic braking compensates for the shortfall in total braking force. The brake force distribution is formulated as follows:
The boundary values of z are calculated as follows:
where F_bf is the front axle braking force, F_br is the rear axle braking force, F_b is the total required braking force, L is the wheelbase, k is the rear wheelbase (the distance from the center of mass to the rear axle), h_g is the height of the center of mass, T_bmax is the maximum motor braking torque, β is the braking force distribution coefficient, the rotating-mass correction coefficient accounts for the inertia of rotating parts, r_w is the wheel radius, i_t is the transmission ratio, and η is the transmission efficiency.
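A piecewise sketch of the four braking-strength regimes follows. The ECE-line split is passed in as a function because its exact form is not given in the text, the total demand is approximated as m·g·z (rotating-mass correction omitted), regenerative braking is assumed to act on the front axle, and the motor limit T_bmax·i_t·η/r_w is an assumed wheel-level conversion; all names are hypothetical, and continuity between regimes is not enforced.

```python
from dataclasses import dataclass
from typing import Callable

G = 9.81  # gravitational acceleration [m/s^2]

@dataclass
class BrakeSplit:
    front: float      # total front-axle braking force [N]
    rear: float       # total rear-axle braking force [N]
    motor: float      # regenerative part, front axle [N]
    hydraulic: float  # friction part covering the remainder [N]

def motor_force_limit(T_bmax, i_t, eta, r_w):
    """Assumed wheel-level limit of the regenerative braking force."""
    return T_bmax * i_t * eta / r_w

def split_brake_force(z, m, z1, z2, z3, beta,
                      ece_front_share: Callable[[float], float], F_motor_max):
    F_b = m * G * z                                   # total demand
    if z <= z1:                                       # front axle only
        front, rear = F_b, 0.0
    elif z <= z2:                                     # along the ECE line
        s = ece_front_share(z)
        front, rear = s * F_b, (1.0 - s) * F_b
    elif z <= z3:                                     # front held at its z2 value, rear grows
        front = ece_front_share(z2) * m * G * z2
        rear = F_b - front
    else:                                             # beta line, motor switched off
        front, rear = beta * F_b, (1.0 - beta) * F_b
        return BrakeSplit(front, rear, 0.0, F_b)
    motor = min(front, F_motor_max)                   # regen limited by the motor
    return BrakeSplit(front, rear, motor, F_b - motor)  # hydraulic covers the shortfall

F_max = motor_force_limit(T_bmax=200.0, i_t=8.0, eta=0.9, r_w=0.3)
print(split_brake_force(0.4, 1500.0, 0.1, 0.3, 0.6, 0.7, lambda z: 0.75, F_max))
```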
In step four, when the vehicle group approaches a signalized intersection, the optimal policy adjusts the driving torque, steering angle, and braking force of each connected vehicle according to the traffic-signal timing and performs platoon regrouping and platoon-length planning, so that some vehicles form a platoon that passes through the intersection during the green phase while the remaining vehicles wait behind the stop line. This reduces the energy consumption of the whole vehicle group and improves both economy and traffic efficiency.
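The split at a signalized intersection can be illustrated with a simple feasibility rule: vehicles are taken in platoon order, each is assumed to keep a constant speed v, consecutive stop-line crossings are kept at least τ_h apart, and the platoon is cut at the first vehicle that cannot cross before the green ends. This hypothetical rule only illustrates the split; the source delegates the actual decision, including energy trade-offs, to the learned SAC policy.

```python
def split_at_signal(distances, v, green_left, tau_h=1.5):
    """Return (passing, waiting) vehicle indices in platoon order.
    distances: distance of each vehicle to the stop line [m], front vehicle first
    v: assumed constant speed [m/s]; green_left: remaining green time [s]"""
    passing = []
    t_prev = -float("inf")
    for i, d in enumerate(distances):
        t_arrive = max(d / v, t_prev + tau_h)  # keep at least tau_h between crossings
        if t_arrive > green_left:
            return passing, list(range(i, len(distances)))
        passing.append(i)
        t_prev = t_arrive
    return passing, []

print(split_at_signal([10.0, 40.0, 70.0, 100.0], v=10.0, green_left=6.0))  # ([0, 1], [2, 3])
```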

Claims (9)

1. A PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control, characterized in that:
the hybrid vehicle group is a vehicle group consisting of intelligent connected vehicles and conventional manually driven vehicles, and the control method comprises the following steps:
step one, determining several driving-style types by clustering driving data under various operating conditions on the basis of multiple feature parameters;
step two, determining a lane-change safety region of the vehicle according to the acquired vehicle state information and surrounding environment information;
step three, in adaptive cruise control, constraining inter-vehicle spacing, vehicle speed, and acceleration on the basis of safety and comfort, and appropriately recovering braking energy during braking; when approaching an intersection, regrouping the platoons so that the vehicle group passes in an orderly manner;
and step four, training on sample data from the environment with the Soft Actor-Critic reinforcement learning algorithm and iteratively updating according to the set loss functions, finally obtaining an optimal policy that enables the vehicle group to regroup its platoons according to the driving styles of different drivers on roads with no more than three lanes.
2. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: the vehicle at the head of each platoon in the vehicle group is an intelligent connected vehicle, and the platoon length is limited to no more than 8 vehicles;
In step one, the driving styles are classified as follows: the feature parameters extracted from the historical data of a large number of manually driven vehicles are reduced in dimension by principal component analysis, and the reduced data are then clustered with the K-means algorithm into three driving styles. The driving styles are reflected in the following feature parameters: average longitudinal speed, maximum longitudinal speed v_max, minimum longitudinal speed v_min, longitudinal speed standard deviation ε_v, average longitudinal acceleration, maximum longitudinal acceleration a_xmax, minimum longitudinal acceleration a_xmin, longitudinal acceleration standard deviation σ_x, average lateral acceleration, maximum lateral acceleration a_ymax, minimum lateral acceleration a_ymin, lateral acceleration standard deviation σ_y, time headway THW, time to collision TTC, and minimum distance headway DHW_min. The longitudinal speed standard deviation ε_v is calculated over the N sampled longitudinal speeds v_i, with average \bar{v}, as follows:
$\varepsilon_v = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^2}$
The time headway THW is the time required for the front of the host vehicle to reach the rear position of the preceding vehicle at the current speed, and the time to collision TTC is the time until the host vehicle collides with the preceding vehicle if both keep their current state. They are calculated as follows:
$\mathrm{THW} = d_{rel} / v_p$
$\mathrm{TTC} = d_{rel} / v_{rel}$
$v_{rel} = v_p - v_f$ (formula four);
where d_rel is the relative distance between the two vehicles, v_p is the speed of the host vehicle, and v_f is the speed of the preceding vehicle;
the distance headway is the distance between the front of the host vehicle and the front of the preceding vehicle in the same lane: the larger it is, the lower the likelihood of a collision between the two vehicles, and the smaller it is, the higher that likelihood. A small distance headway reflects a more aggressive driver, so the minimum distance headway is selected as a driving-style feature parameter.
3. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step two, the vehicle platoon travels on a three-lane road, and regrouping the vehicle group involves lane-change operations; the connected vehicles acquire information on surrounding vehicles and the environment through V2V communication and on-board sensors, including vehicle speed, vehicle body length, torque, power, inter-vehicle distance, vehicle position, and intersection signal phase;
the lane-change safety region used in the lane-change operation is detected as follows:
the method is based on vehicle kinematics. With the vehicle at a given position, the coordinates [x_p1(t), y_p1(t)] of its right-front vertex at time t (t_m < t < t_n) are expressed as:
where v_p(t) and θ_p(t) are the speed and yaw angle of the host vehicle, respectively; t_m is the time at which the host vehicle starts the lane change, and t_n is the time at which it ends;
similarly, at time t the coordinates of the host vehicle's left-front vertex [x_p2(t), y_p2(t)], left-rear vertex [x_p3(t), y_p3(t)], and right-rear vertex [x_p4(t), y_p4(t)] are, respectively:
where a is the vehicle length and b is the vehicle width;
during the lane change, the host vehicle derives a reasonable lane-change safety region by analyzing the conditions under which it does not collide with surrounding vehicles;
if the host vehicle changes lanes to the left at some future time, a collision with the preceding vehicle occurs when the host vehicle is faster than the preceding vehicle, so that the gap gradually shrinks until the right-front vertex of the host vehicle meets the left-rear vertex of the preceding vehicle;
let this collision point be S_1, reached at the collision time; the coordinates of S_1 are expressed as:
where v_f(t) is the speed of the preceding vehicle in the current lane at time t, and D_1 is the distance between the host vehicle and the preceding vehicle in the current lane; a collision with the following vehicle in the target lane requires the host vehicle to be slower than that following vehicle, so that the gap between them shrinks over time; if the lane change is made at this moment, the left-rear vertex of the host vehicle collides with the following vehicle in the target lane; let this collision point be S_2, reached at the corresponding collision time; based on the vehicle dimensions and autonomous-vehicle kinematics, the coordinates of S_2 are expressed as:
where v_r(t) is the speed of the following vehicle in the target lane at time t, D_2 is the relative distance between the host vehicle and the following vehicle in the target lane, and l is the lane width;
the lane-change safety region that avoids collisions is determined from the coordinates of the collision points S_1 and S_2; a lane change to the right is handled in the same way; the intelligent connected vehicle judges, from the detected information on surrounding vehicles, whether the safety region satisfies the lane-change conditions.
4. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 1, wherein: in step three, each intelligent connected vehicle is treated as an agent, so the n connected vehicles in the vehicle group are regarded as n agents; n is limited to a range that remains computationally tractable, and the n parallel agents control the n connected vehicles and interact with one another; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward gain of the whole vehicle group;
the intelligent connected vehicles in a platoon interact with vehicles in adjacent platoons and also cooperate with manually driven vehicles, using the following methods:
Method A: on an ordinary road section, the platoon is cooperatively controlled by adaptive cruise control and regenerative braking so that a reasonable spacing is kept between vehicles, i.e., a safe longitudinal gap must be maintained continuously between every two consecutive vehicles; the deviation from this safe distance (the spacing error) should be as small as possible, both to reduce collision risk and to exploit the low fuel consumption and high traffic throughput of platooning; to accommodate the random behavior of conventional vehicles, the gap between an intelligent connected vehicle and a conventional vehicle is set larger; when a vehicle brakes, part of the braking energy is recovered by the motor;
Method B: when the vehicles approach a signalized intersection, the platoon is split and regrouped to reduce energy consumption and travel time, so that some of its vehicles pass through the intersection in sequence before the green signal ends while the remaining vehicles wait behind the stop line.
5. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 4, wherein: in step four, the Soft Actor-Critic (SAC) reinforcement learning algorithm is an off-policy, model-free deep reinforcement learning algorithm that combines maximum-entropy learning with the Actor-Critic framework; what SAC learns involves the state s, the action a, the reward r, and the environment model ρ; the states include the vehicle's fuel consumption, battery state of charge, speed, acceleration, yaw angle, and distance; the actions are torque and steering angle; and the reward covers fuel consumption, travel time, and comfort, including the adaptive cruise cost function; in step four, SAC is trained on sample data collected from the environment and is updated and optimized continuously until an optimal policy is obtained, under which the intelligent connected vehicles in the hybrid vehicle group are assigned to different lanes according to their drivers' driving styles and the vehicles in each lane form platoons of different lengths;
the driving styles of the vehicles are classified as aggressive, steady, and cautious; when the platoon travels on the road and lanes are allocated to the vehicles, aggressive vehicles tend to be placed in the leftmost lane, steady vehicles in the middle lane, and cautious vehicles in the rightmost lane; the SAC algorithm then adjusts the final lane allocation according to the gain each allocation brings to the vehicle group.
6. The PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control of claim 5, wherein: the SAC algorithm consists of one actor neural network and four critic neural networks; the input of the actor network is the state, and its output is the parameters of the action probability distribution P(x);
the four critic networks are the state-value V critic network and its V critic target network, and the action-state value Q_1 and Q_2 critic networks; the V critic network takes the state as input and outputs V(s), an estimate of the state value; the Q critic networks take the state and action as input and output Q(s, a), an estimate of the action-state value; the n agents share the same neural network and parameters; through this parameter-sharing structure, an improvement in the driving state of any connected vehicle contributes to the reward gain of the whole vehicle group;
the entropy in the algorithm is defined as:
$H(P) = \mathbb{E}_{x \sim P}\left[-\log P(x)\right]$
where x follows the distribution with probability density function P(x); introducing maximum entropy makes the action output more dispersed and prevents it from concentrating excessively, which improves the algorithm's exploration, its ability to learn new tasks, and its stability; the optimal policy in the SAC framework is expressed as:
$\pi^* = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \pi}\left[r(s_t, a_t) + \alpha H\left(\pi(\cdot \mid s_t)\right)\right]$
where π is the policy adopted by the agent, a is the action, s is the state, and r is the reward; α is the temperature parameter, which sets the relative importance of the entropy term against the reward and thus governs the randomness of the optimal policy;
the state space S of SAC is defined as:
where the driver's driving style is included as a state variable, soc is the battery state of charge, v_p is the vehicle speed, a_p is the vehicle acceleration, t_dri is the driving time, θ is the yaw angle, and d_des is the distance to the preceding vehicle;
the action space A is defined as: $A = \{T_p, \delta_p\}$ (formula thirteen);
where T_p is the torque of the host vehicle and δ_p is the steering wheel angle of the vehicle;
the reward function is defined as:
$R = \omega_1 m_{fuel} c_{fuel} + \omega_2 P_{batt} c_{elec} + \omega_3 (t_{dri} - t_{ref}) + \omega_4 P_{rec} + \omega_5 J_{min}$ (formula fourteen);
where ω_1, ω_2, ω_3, ω_4, and ω_5 are weighting coefficients, m_fuel is the fuel consumption of the current intelligent connected vehicle, c_fuel is the fuel price, P_batt is the motor power, c_elec is the electricity price, t_ref is the reference travel time, P_rec is the braking energy recovery power, and J_min is the adaptive cruise comprehensive cost function;
each connected vehicle executes its driving actions independently, and the corresponding rewards are optimized jointly by collecting the control experience of all connected vehicles in a centralized replay buffer;
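for a given state s_t and action a_t, the soft Q-function Q_soft(s_t, a_t) of the algorithm is expressed as:
$Q_{soft}(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1}}\left[V_{soft}(s_{t+1})\right]$, with $V_{soft}(s_t) = \mathbb{E}_{a_t \sim \pi}\left[Q_{soft}(s_t, a_t) - \alpha \log \pi(a_t \mid s_t)\right]$ (formula fifteen);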
where γ ∈ [0, 1] is the discount factor;
to avoid overestimation when maximizing Q, and further overestimation when the target networks are used to compute targets, the SAC algorithm introduces two online networks Q_1 and Q_2 with parameters e_1 and e_2, together with two target networks, V and V target; the smaller of the values output by the target networks is taken as the target value; the soft Q-network parameters are updated by minimizing the following loss function:
the policy is represented as a Gaussian distribution, i.e., the policy parameters map the state to the mean and variance of a Gaussian, and actions are obtained by sampling from that Gaussian; taking the state s_t as input, the network outputs the mean and standard deviation of the Gaussian; the action a_t is then obtained with the reparameterization trick:
$a_t = \mu(s_t) + \sigma(s_t) \cdot \varepsilon_t$
where $\mu(s_t)$ and $\sigma(s_t)$ are the mean and standard deviation of the Gaussian distribution, and $\varepsilon_t$ and $\tau_t$ denote noise signals sampled from a standard normal distribution;
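The reparameterization step can be sketched with the standard library: noise is drawn independently of the policy parameters and then shifted and scaled by the mean and standard deviation, so gradients can flow through $\mu$ and $\sigma$ in an autodiff framework. This is a minimal sketch assuming a diagonal Gaussian over the two action dimensions; the tanh squashing often used in SAC is not stated in the text and is omitted, and the numeric policy outputs are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_action(mu: Sequence[float], sigma: Sequence[float],
                  rng: random.Random) -> Tuple[List[float], float]:
    """Reparameterized sample a = mu + sigma * eps with eps ~ N(0, 1) per dimension.

    Returns the action and its log-density under the diagonal Gaussian policy.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    action = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    # log N(a; mu, sigma) = -0.5*eps^2 - log(sigma) - 0.5*log(2*pi), summed over dimensions
    log_prob = sum(-0.5 * e * e - math.log(s) - 0.5 * math.log(2.0 * math.pi)
                   for s, e in zip(sigma, eps))
    return action, log_prob


# Hypothetical policy output for (T_p, delta_p): mean torque 120 N*m, mean angle 0.02 rad.
action, log_prob = sample_action([120.0, 0.02], [15.0, 0.005], random.Random(42))
```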
the relationship between the policy function and the soft Q function is expressed as:
The policy network parameters are updated by minimizing the Kullback-Leibler divergence: the smaller the Kullback-Leibler divergence, the smaller the difference between the rewards corresponding to the output actions, and the better the policy converges. The update rule of the policy network is expressed as:
where $Z(s_t)$ is the partition function used to normalize the distribution;
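In the standard SAC formulation, the KL-based update and its practical surrogate take the form below (a reference sketch; $\phi$, $f_\phi$, $\mathcal{D}$ and $\alpha$ are standard SAC notation, not symbols defined in this text).

$$\pi_{new} = \arg\min_{\pi'\in\Pi} D_{KL}\left(\pi'(\cdot\mid s_t)\,\middle\|\,\frac{\exp\left(Q_{soft}(s_t,\cdot)/\alpha\right)}{Z(s_t)}\right)$$

$$J_\pi(\phi) = \mathbb{E}_{s_t\sim\mathcal{D},\,\varepsilon_t\sim\mathcal{N}(0,1)}\left[\alpha\log\pi_\phi\left(f_\phi(\varepsilon_t;s_t)\mid s_t\right) - Q_{soft}\left(s_t, f_\phi(\varepsilon_t;s_t)\right)\right]$$

Expanding the divergence gives $D_{KL} = \frac{1}{\alpha}\mathbb{E}\left[\alpha\log\pi - Q_{soft}\right] + \log Z(s_t)$. Because $Z(s_t)$ does not depend on the policy parameters, minimizing the divergence is equivalent to minimizing $J_\pi(\phi)$, which is what the gradient-descent update performs.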
Finally, the policy network parameters are updated by gradient descent, expressed as:
The temperature coefficient sets how much weight the algorithm places on entropy, and tuning it is important for the training performance of the SAC algorithm. Because the optimal temperature coefficient differs between reinforcement learning tasks and between training stages, an automatic temperature adjustment mechanism is used: a constrained optimization problem is constructed, and the optimal temperature coefficient at each step is obtained by minimizing the following objective function:
where $H_0$ is a predefined minimum policy entropy threshold.
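A minimal sketch of one automatic temperature update, assuming the usual SAC objective $J(\alpha) = \mathbb{E}[-\alpha\log\pi(a_t\mid s_t) - \alpha H_0]$ and the common practice of optimizing $\log\alpha$ so that the temperature stays positive. The function name and learning rate are hypothetical.

```python
import math
from typing import Sequence


def temperature_step(log_alpha: float, log_pis: Sequence[float], h0: float,
                     lr: float = 3e-4) -> float:
    """One gradient-descent step on J(alpha) = mean(-alpha * (log pi + H0)), in log_alpha."""
    alpha = math.exp(log_alpha)
    mean_term = sum(lp + h0 for lp in log_pis) / len(log_pis)
    # dJ/d(log_alpha) = -alpha * mean_term (chain rule through alpha = exp(log_alpha)).
    grad = -alpha * mean_term
    return log_alpha - lr * grad


# Entropy estimate -mean(log pi) = 0 is below H0 = 1, so the temperature should rise.
raised = temperature_step(0.0, [0.0, 0.0], h0=1.0)
```

When the policy's entropy falls below $H_0$, the update increases the temperature and thus the weight on exploration; when entropy exceeds $H_0$, it decreases it.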
7. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: the SAC reinforcement learning algorithm is trained on data related to vehicle group queue reorganization, adaptive cruise control and efficient passage through intersections to obtain an optimal control strategy; SAC takes into account the states of the networked vehicles in the vehicle group to find an optimal adaptive cruise control strategy, and this strategy is fed back as the torque and steering angle commands of each networked vehicle under adaptive cruise control to control the vehicle trajectory;
In the adaptive cruise control of step three, the required inter-vehicle distance is influenced by the driver's driving style, road commuting efficiency and vehicle safety. Because the driving intention of a manually driven vehicle is uncertain, the distance between a networked vehicle and a manually driven vehicle is kept larger than the distance between two networked vehicles. If the spacing is too narrow, commuting efficiency improves, but driver anxiety may lead to collisions; conversely, a larger spacing guarantees vehicle safety but lowers road commuting efficiency and makes it easy for vehicles in adjacent lanes to cut in;
the constant time headway (CTH) policy is used for the vehicle spacing algorithm, as follows:
$d_{des} = \tau_h \upsilon_h + d_0$ (formula twenty-two);
where $\tau_h$ is the nominal time headway, $\upsilon_h$ is the speed of the host vehicle, and $d_0$ is the safe stopping distance;
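The CTH spacing rule is a one-line computation. The sketch below also illustrates the larger gap kept behind a manually driven vehicle by selecting a longer headway in that case; the helper `headway_for` and both headway values are hypothetical.

```python
def desired_gap(v_host: float, tau_h: float, d0: float) -> float:
    """Constant time headway spacing, formula twenty-two: d_des = tau_h * v_host + d0.

    Units: v_host in m/s, tau_h in s, d0 and the result in m.
    """
    return tau_h * v_host + d0


def headway_for(preceding_is_networked: bool) -> float:
    # Hypothetical values: a longer headway behind a manually driven vehicle,
    # reflecting the uncertainty in its driver's intentions.
    return 1.2 if preceding_is_networked else 1.8


print(desired_gap(20.0, headway_for(True), 5.0))   # behind a networked vehicle
print(desired_gap(20.0, headway_for(False), 5.0))  # behind a manually driven vehicle
```

Because the gap grows linearly with speed, the time available to react stays constant across speeds, which is the motivation for CTH over a fixed-distance policy.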
for car-following safety, the following constraints apply:
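$d_{min} < d < d_{max}$
$\Delta d = d - d_{des}$
$\Delta\upsilon_{min} < \Delta\upsilon < \Delta\upsilon_{max}$
$\Delta\upsilon = \upsilon_p - \upsilon_f$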
where $d$ is the actual distance between the autonomous vehicle and the preceding vehicle, $d_{min}$ and $d_{max}$ are the minimum and maximum vehicle distances, $\Delta\upsilon$ is the speed difference between the autonomous vehicle and the preceding vehicle, and $\Delta\upsilon_{min}$ and $\Delta\upsilon_{max}$ are the minimum and maximum speed differences;
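The two car-following safety constraints can be checked together as strict interval tests. This is a minimal sketch; the function name and the bound values in the usage line are hypothetical.

```python
def satisfies_following_constraints(d: float, v_p: float, v_f: float,
                                    d_min: float, d_max: float,
                                    dv_min: float, dv_max: float) -> bool:
    """Check d_min < d < d_max and dv_min < v_p - v_f < dv_max (strict, as in the text)."""
    dv = v_p - v_f  # speed difference between the host and the preceding vehicle
    return d_min < d < d_max and dv_min < dv < dv_max


# Hypothetical bounds: 5 m to 100 m gap, at most 5 m/s speed difference either way.
ok = satisfies_following_constraints(30.0, 20.0, 18.0, 5.0, 100.0, -5.0, 5.0)
```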
the comfort constraint is as follows: $\Delta a = a_p - a_f$, where $a_p$ is the acceleration of the host vehicle and $a_f$ is the acceleration of the front vehicle;
the comprehensive adaptive cruise cost function is:
$J_{min} = \omega_6 \Delta d^2 + \omega_7 \Delta\upsilon^2 + \omega_8 \Delta a^2$ (formula twenty-three).
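Formula twenty-three penalizes the squared spacing error, speed difference and acceleration difference. A minimal sketch follows; the function name and the unit weights in the usage line are hypothetical.

```python
def acc_cost(d: float, d_des: float, v_p: float, v_f: float,
             a_p: float, a_f: float, w6: float, w7: float, w8: float) -> float:
    """J_min = w6*(d - d_des)^2 + w7*(v_p - v_f)^2 + w8*(a_p - a_f)^2."""
    dd = d - d_des   # spacing error relative to the CTH desired gap
    dv = v_p - v_f   # speed difference (safety term)
    da = a_p - a_f   # acceleration difference (comfort term)
    return w6 * dd ** 2 + w7 * dv ** 2 + w8 * da ** 2


cost = acc_cost(40.0, 35.0, 21.0, 20.0, 0.5, 0.0, 1.0, 1.0, 1.0)
```

The quadratic form makes the cost zero only when the host vehicle exactly tracks the desired gap, speed and acceleration of the vehicle ahead, and the weights trade these objectives off against each other.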
8. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: with the braking force distribution constrained by the ECE regulations, the following braking force distribution strategy is adopted:
When the braking strength $z < z_1$, the braking force is provided by the front axle only; when $z_1 < z < z_2$, the front and rear axle braking forces are distributed along the line specified by the ECE regulations; when $z_2 < z < z_3$, the front axle braking force is held constant while the rear axle braking force increases; when $z_3 < z$, the motor stops braking and the front and rear axle braking forces are distributed along the $\beta$ line. Throughout the braking process, if the motor braking force is insufficient, hydraulic braking force makes up the shortfall in the total braking force. The braking force distribution is formulated as follows:
the boundaries of $z$ are calculated as follows:
where $F_{bf}$ is the front axle braking force, $F_{br}$ is the rear axle braking force, $F_b$ is the total required braking force, $L$ is the wheelbase, $k$ is the distance from the center of mass to the rear axle, $h_g$ is the height of the center of mass, $T_{bmax}$ is the maximum motor braking torque, $\beta$ is the braking force distribution coefficient, $\theta$ is the rotating mass correction coefficient, $r_w$ is the wheel radius, $i_t$ is the transmission ratio of the vehicle, and $\eta$ is the transmission efficiency.
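The four-region strategy can be sketched as a piecewise function of the braking strength. This is a hypothetical sketch under stated assumptions: $z = F_b/(mg)$, the motor acts on the front axle only, the boundaries $z_1$, $z_2$, $z_3$, the ECE-limited front-axle force and the motor's maximum braking force at the wheels are passed in as parameters rather than computed, and `example_ece_line` is an invented stand-in, not the regulation curve.

```python
from typing import Callable, NamedTuple


class BrakeSplit(NamedTuple):
    front: float      # F_bf, N
    rear: float       # F_br, N
    regen: float      # part of the front force supplied by the motor, N
    hydraulic: float  # remaining friction braking force on both axles, N


def distribute_braking(z: float, mass: float, beta: float,
                       z1: float, z2: float, z3: float,
                       ece_front_force: Callable[[float], float],
                       motor_force_max: float, g: float = 9.81) -> BrakeSplit:
    """Piecewise front/rear split over the four braking-strength regions (sketch)."""
    f_b = z * mass * g  # total required braking force, assuming z = F_b / (m g)
    if z <= z1:                    # region 1: front axle only
        front = f_b
    elif z <= z2:                  # region 2: along the ECE-limited line
        front = ece_front_force(z)
    elif z <= z3:                  # region 3: front held constant, rear takes the increase
        front = ece_front_force(z2)
    else:                          # region 4: motor off, fixed beta line
        front = beta * f_b
    rear = f_b - front
    # Motor braking only up to z3 and only on the front axle (assumption);
    # hydraulic braking supplies everything the motor does not.
    regen = min(front, motor_force_max) if z <= z3 else 0.0
    hydraulic = f_b - regen
    return BrakeSplit(front, rear, regen, hydraulic)


def example_ece_line(z: float) -> float:
    # Hypothetical stand-in for the ECE-limited front-axle curve (not from the source).
    return 0.75 * z * 1500.0 * 9.81


split = distribute_braking(0.4, 1500.0, 0.65, 0.1, 0.3, 0.5, example_ece_line, 3000.0)
```

In this example the motor saturates at its assumed limit, so the hydraulic system covers the rest of the front demand plus the whole rear axle, which is the compensation behavior the claim describes.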
9. The PHEV hybrid vehicle group optimal control method for queue management and adaptive cruise control of claim 5, wherein: in step four, when the vehicle group approaches a signalized intersection, the optimal strategy adjusts the driving torque, steering angle and braking force of each networked vehicle according to the traffic light timing and performs queue reorganization and queue length planning, so that some of the vehicles form a queue and pass through the intersection during the green phase while the remaining vehicles wait before the stop line, thereby reducing the energy consumption of the whole vehicle group and achieving better economy and traffic efficiency.
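The queue split at the signal can be illustrated with a deliberately simplified rule: a vehicle joins the passing queue if, at its current speed, it reaches the stop line before the green phase ends, and once one vehicle must stop, every follower in the lane stops too. This is a hypothetical sketch assuming constant speeds and a single lane; it does not model the torque, steering or braking optimization performed by the learned strategy.

```python
from typing import List, Sequence, Tuple


def split_platoon_at_signal(distances: Sequence[float], speeds: Sequence[float],
                            green_remaining: float) -> Tuple[List[int], List[int]]:
    """Split vehicles (ordered front to back) into those that clear the stop line
    before the green ends and those that stop, assuming constant speeds."""
    passing: List[int] = []
    waiting: List[int] = []
    blocked = False
    for i, (d, v) in enumerate(zip(distances, speeds)):
        arrives_in_time = v > 0.0 and d / v <= green_remaining
        if arrives_in_time and not blocked:
            passing.append(i)
        else:
            # Once one vehicle must stop, every follower in the same lane stops behind it.
            blocked = True
            waiting.append(i)
    return passing, waiting


passing, waiting = split_platoon_at_signal([20.0, 40.0, 80.0], [10.0, 10.0, 10.0], 5.0)
```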
CN202310945210.7A 2023-07-31 2023-07-31 PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control Pending CN116853245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310945210.7A CN116853245A (en) 2023-07-31 2023-07-31 PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310945210.7A CN116853245A (en) 2023-07-31 2023-07-31 PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control

Publications (1)

Publication Number Publication Date
CN116853245A true CN116853245A (en) 2023-10-10

Family

ID=88223446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310945210.7A Pending CN116853245A (en) 2023-07-31 2023-07-31 PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control

Country Status (1)

Country Link
CN (1) CN116853245A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117711182A (en) * 2023-12-21 2024-03-15 交通运输部公路科学研究所 Intelligent network-connected vehicle track collaborative optimization method for intersection environment
CN117711182B (en) * 2023-12-21 2024-06-11 交通运输部公路科学研究所 Intelligent network-connected vehicle track collaborative optimization method for intersection environment

Similar Documents

Publication Publication Date Title
CN107117170B (en) A kind of real-time prediction cruise control system driven based on economy
CN109501799B (en) Dynamic path planning method under condition of Internet of vehicles
CN110930697B (en) Rule-based intelligent networked vehicle cooperative convergence control method
CN111746539B (en) Intelligent network-connected automobile strict and safe lane-changing enqueueing control method
CN113753026B (en) Decision-making method for preventing rollover of large commercial vehicle by considering road adhesion condition
CN109410561A (en) A kind of even heterogeneous formation travel control method of highway vehicle
CN113788021A (en) Adaptive following cruise control method combined with preceding vehicle speed prediction
CN113489793B (en) Expressway double-lane cooperative control method in mixed traffic scene
CN116853245A (en) PHEV hybrid vehicle group optimization control method for queue management and adaptive cruise control
CN111959492A (en) HEV energy management hierarchical control method considering lane change behavior in networking environment
CN113593275B (en) Intersection internet automatic driving method based on bus signal priority
CN116740945B (en) Method and system for multi-vehicle collaborative grouping intersection of expressway confluence region in mixed running environment
CN113886764A (en) Intelligent vehicle multi-scene track planning method based on Frenet coordinate system
CN115273450A (en) Lane changing method for vehicles entering formation under network connection automatic driving environment
CN115257789A (en) Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment
Zheng et al. Research on control target of truck platoon based on maximizing fuel saving rate
CN112758105A (en) Automatic driving fleet following running control method, device and system
CN113459829B (en) Intelligent energy management method for double-motor electric vehicle based on road condition prediction
CN114537420B (en) Urban bus rapid transit energy-saving driving control method based on dynamic planning
CN115712950A (en) Automatic driving decision-making method for semi-trailer
Choi et al. Coordinated steering angle and yaw moment distribution to increase vehicle regenerative energy in autonomous driving
Xia et al. Ecological cooperative adaptive cruise control of over‐actuated electric vehicles with in‐wheel motor in traffic flow
Shahram et al. Utilizing Speed Information Forecast in Energy Optimization of an Electric Vehicle with Adaptive Cruise Controller
Yan et al. Velocity Trajectory Planning of Electric Vehicles with Consideration of the Passenger's Individual Preferences
Xu et al. Look-ahead Horizon based Energy Optimization for Connected Hybrid Electric Vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination