CN115171388A

CN115171388A - Multi-intersection travel time collaborative optimization method for intelligent internet vehicle

Info

Publication number: CN115171388A
Application number: CN202210851099.0A
Authority: CN
Inventors: 许明; 高登磊
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2022-07-20
Filing date: 2022-07-20
Publication date: 2022-10-11

Abstract

The invention belongs to the field of artificial intelligence, and particularly relates to a multi-intersection travel time collaborative optimization method for an intelligent internet vehicle (ICV). The intelligent internet vehicle is combined with a reinforcement learning method and a new reward function is proposed: the reward function uses the average speed of the vehicles in the traffic system as the reward value, and applies a penalty when the deceleration of a vehicle running in the traffic system is lower than a comfort parameter (that is, when braking is harsher than the comfortable level). Using SUMO software, vehicles are configured with an IDM car-following model to simulate human-driven vehicles. The calculation of the comfort parameter in the IDM car-following model is improved, so that the parameter is calculated from the current traffic flow, the lane length and the desired speed of the vehicle. The invention demonstrates that combining the reinforcement learning method with the ICV effectively improves both the traffic flow rate and the traffic stability, and that an ICV trained by reinforcement learning effectively reduces frequent changes in vehicle acceleration.

Description

Multi-intersection travel time collaborative optimization method for intelligent internet vehicle

Technical Field

The invention belongs to the field of artificial intelligence, and particularly relates to a multi-intersection travel time collaborative optimization method for an intelligent internet vehicle.

Background

The Internet of Vehicles extends the concept of the Internet of Things to the traffic field. By means of modern electronic sensing, radio communication and control technologies, it connects three information subjects (people, vehicles and the environment) through a network and, supported by big data, realizes intelligent control of vehicles. "Intelligent internet vehicle" is the general term for all networked vehicles once Internet of Vehicles technology has developed to a mature stage. In China, the emergence of Tesla automobiles shows that intelligent internet vehicles and human-driven vehicles will coexist for a long time, which makes cooperation among intelligent internet vehicles more difficult. An intelligent internet vehicle can obtain basic state information such as the position, speed and acceleration of surrounding vehicles, and can even obtain information through a centralized processor to adjust its own state. At present, multi-agent reinforcement learning is generally adopted to solve traffic-system problems in mixed traffic scenes of intelligent internet vehicles and human-driven vehicles.

With deep reinforcement learning, a machine learns human behavior in a virtual scene and eventually yields an agent that performs well on a variety of challenging tasks; however, drawbacks of the DQN (deep Q-network) method, such as low training efficiency and long training time, have not been overcome. In traffic systems, congestion wastes a great deal of time and slows traffic, and it is one of the major challenges that traffic authorities and traffic participants have to overcome. Among the many congestion problems, intersection congestion is particularly important. It is therefore necessary to address intersection blockage in mixed traffic flow with a multi-agent reinforcement learning method.

Early solutions to the intersection blockage problem generally adopted either a centralized or a distributed processing method. The centralized method assumes a cooperative environment, directly extends a single-agent algorithm and learns the output of a joint action, but it does not specify well how each individual agent should decide, which leads to frequent changes in vehicle acceleration. In the distributed method each agent independently learns its own reward function; for each agent the other agents are part of the environment, so the non-stationarity of the environment must always be taken into account. In the prior art, vehicles use a first-in-first-out (FIFO) strategy to send intersection requests to a central processing unit one by one; the central processing unit handles the requests centrally and returns confirmations, and the vehicles, on receiving confirmation, queue to pass through the intersection. In follow-up work, this solution was extended to a network of interconnected intersections, with the aim of finding the best route to guide each vehicle through the intersections so as to minimize its delay through the network. The idea of reservation-based signal-free intersections was developed further and the FIFO queuing strategy was relaxed; relaxing FIFO gives better performance than the earlier FIFO-based reservation schemes.
The fuzzy control model is derived from the FIFO queuing strategy. When a vehicle enters an intersection, it sends a passing request to a centralized processor; the centralized processor groups the vehicles according to the vehicle information at the intersection (position, vehicle size and so on), calculates the average waiting time of each group according to fuzzy rules, and then orders the groups by rank. Although vehicles are grouped by fuzzy rules and pass once the centralized controller grants their requests, such models in effect still direct traffic the way traffic lights do, so that vehicles pass in order. As a result, they cannot fully reflect the problem of frequent acceleration changes caused by ICVs, and they also weaken the effect of the car-following model on the vehicles.

Disclosure of Invention

Aiming at the defects of the prior art, the invention designs a multi-intersection travel time collaborative optimization method for an intelligent internet vehicle.

A multi-intersection travel time collaborative optimization method for an intelligent internet vehicle specifically comprises the following steps:

step 1: establishing in SUMO a traffic intersection scene in which the main road has three lanes and the inflow and outflow merging roads each have two lanes, and setting vehicle speed limits and lane flow directions to simulate a real traffic intersection; adding an instability parameter at the lane merges, and allowing the rightmost vehicle to turn left suddenly when approaching a left-turn intersection;

step 2: setting a traffic flow parameter to control the number of vehicles entering the lanes per hour, and, in the process of simulating the traffic intersection scene, adding a car-following model and a lane-changing model to simulate real-world human-driven vehicles, so that each vehicle decides whether it needs to accelerate, decelerate or change lanes according to the state of the vehicle in front;

step 2.1: using an IDM car-following model, which takes the desired speed of the current vehicle and the distance between the current vehicle and the vehicle in front as variables to calculate the optimal acceleration required by the current vehicle, specifically as follows:

where v0 is the desired speed, s* the desired distance, T the time interval, s0 the minimum gap, δ the acceleration exponent, α the acceleration term, v the ego vehicle's speed, b the comfort parameter, s the distance to the preceding vehicle, Δv the speed difference between the ego vehicle and the preceding vehicle, and a the current vehicle acceleration;
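For reference, the symbols listed above match those of the standard Intelligent Driver Model (IDM). The textbook form of that model is sketched below; it is assumed, not confirmed, to coincide with the formula used in this method, and reading α as the maximum acceleration and b as the comfortable deceleration is likewise the textbook convention.

```latex
a = \alpha \left[ 1 - \left( \frac{v}{v_0} \right)^{\delta}
      - \left( \frac{s^{*}(v, \Delta v)}{s} \right)^{2} \right],
\qquad
s^{*}(v, \Delta v) = s_0 + v T + \frac{v \, \Delta v}{2 \sqrt{\alpha b}}
```

In this form a larger b shrinks the last term of s*, which is one way to see why different values of b lead to different desired distances.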

a comfort variable b is introduced to improve the comfort of passengers riding in the vehicle; different values of b give different desired distances s*, and the more the desired distance exceeds the safe following distance, the more the efficiency of the overall traffic system is affected; in order to optimize the acceleration of the vehicle, the calculation of the comfort parameter b is improved, with the formula as follows:

F = Vρ

where F is the traffic flow, V is the traffic flow speed, ρ is the traffic flow density, H is the lane length, h is the vehicle body length, and b is the comfort parameter; the comfort parameter is improved according to the traffic flow and the traffic flow speed;
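To make the role of b concrete, the following is a minimal Python sketch of a car-following acceleration in the standard IDM form. The function name, the default parameter values and the clamp on the dynamic part of the desired gap are illustrative assumptions, not values or details taken from this method.

```python
import math


def idm_acceleration(v, v_lead, s, v0=15.0, T=1.0, s0=2.0,
                     alpha=1.0, b=1.5, delta=4.0):
    """Acceleration of the ego vehicle under the standard IDM form.

    v      : ego speed (m/s)
    v_lead : speed of the preceding vehicle (m/s)
    s      : gap to the preceding vehicle (m), must be > 0
    All default parameter values are hypothetical.
    """
    dv = v - v_lead  # positive when the ego vehicle is closing in
    # Desired gap s*; the max(0, ...) guard is a common implementation
    # choice (an assumption here) that keeps s* from dropping below s0.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(alpha * b)))
    return alpha * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)


# A larger comfort parameter b shortens the desired gap, so the same
# closing situation produces less braking.
gentle = idm_acceleration(10.0, 5.0, 20.0, b=3.0)
harsh = idm_acceleration(10.0, 5.0, 20.0, b=1.0)
```

With these inputs `gentle` is greater than `harsh`, which is the qualitative effect described above: the choice of b changes the desired distance and therefore how strongly the vehicle reacts.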

step 2.2: judging whether the current vehicle changes lanes by means of a lane-changing model; the lane-changing model explicitly defines four different lane-change motivations: strategic lane change, cooperative lane change, tactical lane change, and obligatory lane change;

whether a lane change can be carried out is determined by calculating the lane-change demand from the vehicle speed and the desired speed, and the lane-change urgency from the vehicle speed, the desired speed, the distance to the leading vehicle and the lane occupancy;

selecting a preferred candidate lane according to the vehicle's lane-change demand; calculating the safe speed in the current lane and combining it with the speed requirement of the candidate lane; and determining whether to change lanes according to the vehicle's lane-change demand and the degree of lane-change urgency;

step 3: combining the vehicles, as multiple agents, with the reinforcement learning method PPO, thereby forming intelligent internet vehicles;

taking the vehicle speed, position, acceleration and desired speed as the state space and the acceleration as the action space; designing the PPO reward function by taking the sum of the average speeds of all vehicles as the reward value, setting the comfort parameter as a condition, and adding a penalty to the reward function if the deceleration of a vehicle is less than the comfort parameter; the specific formula is as follows:

the reward function re consists of three parts: the average speed V_average, the instantaneous braking deceleration A_real, and the collision penalty Z; w1, w2 and w3 are the weights for the average speed, the collision of two vehicles and over-rapid deceleration, respectively, and A_max is the set maximum acceleration. Under normal driving conditions, when a vehicle decelerates, the deceleration can increase gradually over a certain distance; but when the leading vehicle stops suddenly and the ego vehicle brakes hard, the resulting acceleration is taken as the instantaneous braking deceleration. As more intelligent internet vehicles enter the traffic flow, the overall average speed increases.
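The three parts of the reward can be sketched in Python as follows. The text names the terms and weights but the exact functional form is not given here, so the weighted sum below, the normalization by A_max, the signed-acceleration convention and all default values are assumptions made only for illustration.

```python
def reward(speeds, accels, collided, b=1.5, a_max=3.0,
           w1=1.0, w2=10.0, w3=0.5, z=1.0):
    """Hypothetical reward: average speed minus weighted penalties.

    speeds   : list of vehicle speeds (m/s)
    accels   : list of signed accelerations (m/s^2), braking is negative
    collided : True if any two vehicles collided in this step
    b        : comfort parameter (comfortable deceleration magnitude)
    """
    if not speeds:
        return 0.0
    v_average = sum(speeds) / len(speeds)

    # Instantaneous braking deceleration: the harshest braking observed.
    a_real = min(accels) if accels else 0.0

    # Penalize only when braking is harsher than the comfort level,
    # scaled by the set maximum acceleration (assumed normalization).
    brake_penalty = abs(a_real) / a_max if a_real < -b else 0.0

    collision_penalty = z if collided else 0.0
    return w1 * v_average - w2 * collision_penalty - w3 * brake_penalty
```

For example, two vehicles at 10 m/s with no braking yield a reward of 10.0, while the same vehicles with one braking at -3.0 m/s² yield 9.5 under these assumed weights.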

The invention has the beneficial technical effects that:

As the urbanization rate increases, traffic congestion concentrates at intersections. In order to improve the traffic flow rate and traffic stability, and to solve the congestion produced at multiple intersections by the mixed traffic flow of intelligent internet vehicles and non-networked vehicles, the invention provides a multi-intersection travel time collaborative optimization method for an intelligent internet vehicle. The intelligent internet vehicle is combined with a reinforcement learning method and a new reward function is proposed: the reward function takes the average speed of the vehicles in the traffic system as the reward value, and applies a penalty when the deceleration of a vehicle in the traffic system is lower than a comfort parameter. Using SUMO software, vehicles are configured with an IDM car-following model to simulate human-driven vehicles; the calculation of the comfort parameter in the IDM car-following model is improved, so that the parameter is calculated from the current traffic flow, the lane length and the desired speed of the vehicle. The invention demonstrates that combining the reinforcement learning method with the ICV effectively improves both the traffic flow rate and the traffic stability, and verifies that an ICV trained by reinforcement learning effectively reduces frequent changes in vehicle acceleration.

Drawings

FIG. 1 is a schematic diagram of an intersection environment structure simulated by a multi-intersection travel time collaborative optimization method of an intelligent internet vehicle in an embodiment of the invention;

FIG. 2 is a flowchart of a method for collaborative optimization of travel time at multiple intersections of an intelligent networked vehicle according to an embodiment of the present invention;

FIG. 3 illustrates a traffic scenario at a continuous T-junction in accordance with an embodiment of the present invention;

FIG. 4 is a result of an experiment of a manually driven vehicle simulated by a following model and a lane change model according to an embodiment of the present invention;

FIG. 5 shows experimental results of combining the vehicles, as multiple agents, with the reinforcement learning method PPO;

FIG. 6 is a schematic diagram illustrating a relationship between traffic flow and speed according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and examples;

The method is implemented by combining SUMO software with the FLOW framework and using an rllab reinforcement learning algorithm.

The experiment is simulated with the SUMO simulation software and the open-source FLOW framework. The experimental operation framework is shown in FIG. 1. The simulation uses a traffic scene of consecutive T-shaped intersections, shown in FIG. 3, in which a vehicle may change its driving route at the last moment when it is about to pass through an intersection. The simulation environment and parameter settings are introduced below, together with an analysis of the algorithm.

A multi-intersection travel time collaborative optimization method for an intelligent internet vehicle specifically comprises the following steps as shown in the attached figure 2:

Step 1: a traffic intersection scene in which the main road has three lanes and the inflow and outflow merging roads each have two lanes is established in SUMO, and vehicle speed limits and lane flow directions are set to simulate a real traffic intersection, for example 'from': 'edge_{}'.format(i), 'to': 'edge_{}'.format(i + 1). An instability parameter is added where vehicles leave a lane, and the rightmost vehicle is allowed to turn left suddenly when approaching a left-turn intersection;
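The 'from'/'to' pattern quoted above can be expanded into a small Python sketch that links consecutive edges lane by lane. The helper name and the fromLane/toLane keys are assumptions for illustration (they follow the attribute names SUMO uses for connections); they are not taken from the source configuration.

```python
def chain_connections(num_edges, num_lanes):
    """Connect edge_i to edge_{i+1}, lane by lane (hypothetical helper)."""
    connections = []
    for i in range(num_edges - 1):
        for lane in range(num_lanes):
            connections.append({
                "from": "edge_{}".format(i),
                "to": "edge_{}".format(i + 1),
                "fromLane": lane,  # assumed: same lane index on both sides
                "toLane": lane,
            })
    return connections


# Three consecutive edges of a three-lane main road.
main_road = chain_connections(num_edges=3, num_lanes=3)
```

Here `main_road` holds six entries: three lane-to-lane links from edge_0 to edge_1 and three from edge_1 to edge_2.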

Simulation environment and parameter setting:

In the experiment, a public interface of the FLOW framework is connected to the TraCI interface of the SUMO simulation software to generate the required traffic scene. An instability parameter of 0.1 is added where vehicles leave a lane, and the rightmost vehicle is allowed to turn left suddenly when approaching a left-turn intersection. A control strategy for the ICV is then generated by reinforcement learning: training runs for 100 iterations, each episode lasts 200 time steps, each time step is 0.2 seconds long, and an optimizer is used for optimization during training. The relevant indicators of each iteration are recorded.
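The schedule above implies some simple arithmetic, sketched here in Python. The constant names are hypothetical, and the assumption of one episode per training iteration is made only to obtain a total step count; the source does not state how many episodes are collected per iteration.

```python
SIM_STEP_SECONDS = 0.2   # length of one time step (from the text)
HORIZON_STEPS = 200      # time steps per episode (from the text)
TRAIN_ITERATIONS = 100   # training iterations (from the text)
INSTABILITY = 0.1        # instability parameter (from the text)


def episode_seconds(horizon=HORIZON_STEPS, sim_step=SIM_STEP_SECONDS):
    """Simulated time covered by one episode."""
    return horizon * sim_step


def total_steps(iterations=TRAIN_ITERATIONS, horizon=HORIZON_STEPS,
                episodes_per_iteration=1):
    """Total environment steps; episodes_per_iteration is an assumption."""
    return iterations * horizon * episodes_per_iteration
```

Under these settings one episode covers about 40 seconds of simulated traffic, and 100 iterations of a single episode each amount to 20000 environment steps.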

The hyper-parameter settings for the experiment are shown in Table 1. The significant digits of the same type of data need to be kept consistent.

Table 1 hyper-parameters for the experimental setup;

Step 2: a traffic flow parameter is set to control the number of vehicles entering the lanes per hour, and, in the process of simulating the traffic intersection scene, a car-following model and a lane-changing model are added to simulate real-world human-driven vehicles, so that each vehicle decides whether it needs to accelerate, decelerate or change lanes according to the state of the vehicle in front;
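As a rough illustration of this step, the Python sketch below builds the two SUMO definitions involved: a vehicle type that selects the IDM car-following model and the LC2013 lane-changing model, and a flow that sets the hourly inflow. The attribute names are SUMO's own (vType and flow elements); the ids, edge names and numeric values are hypothetical.

```python
import xml.etree.ElementTree as ET


def human_vtype(decel_b, accel=1.0, tau=1.0, min_gap=2.0, delta=4.0):
    """vType for a simulated human driver (values are placeholders)."""
    return ET.Element("vType", {
        "id": "human",
        "carFollowModel": "IDM",      # longitudinal control
        "laneChangeModel": "LC2013",  # lateral control
        "accel": str(accel),
        "decel": str(decel_b),        # comfort parameter b
        "tau": str(tau),              # time interval T
        "minGap": str(min_gap),       # minimum gap s0
        "delta": str(delta),          # acceleration exponent
    })


def inflow(from_edge, to_edge, vehs_per_hour, end=3600):
    """Flow element controlling how many vehicles enter per hour."""
    return ET.Element("flow", {
        "id": "flow_{}".format(from_edge),
        "type": "human",
        "begin": "0",
        "end": str(end),
        "from": from_edge,
        "to": to_edge,
        "vehsPerHour": str(vehs_per_hour),
    })


vtype_xml = ET.tostring(human_vtype(1.5), encoding="unicode")
flow_xml = ET.tostring(inflow("edge_0", "edge_2", 1000), encoding="unicode")
```

Keeping the comfort value as an argument of `human_vtype` leaves room for it to be computed from traffic conditions rather than fixed in advance.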

Step 2.1: an IDM car-following model is used, which takes the desired speed of the current vehicle and the distance between the current vehicle and the leading vehicle as variables to calculate the optimal acceleration required by the current vehicle, specifically as follows:

F = Vρ

The presence of a comfort parameter increases the stability of the traffic system to some extent, but an excessive comfort value increases the resource consumption of the traffic system. For a traffic system, the linear relationship between the traffic flow and the traffic flow speed is shown in FIG. 6: when vehicles first flow into the traffic system, the traffic density increases as the traffic flow speed increases; once the growth in traffic flow levels off, a further increase in traffic flow speed reduces the traffic flow that the system can carry. The comfort calculation is therefore improved: the optimal comfort value corresponding to a given traffic flow is calculated from the traffic flow and the traffic flow speed of the traffic system.
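A short worked example of the relation F = Vρ may help fix units. The Python sketch below is illustrative only; the function names and the numbers are assumptions, and it shows just the flow relation, not the comfort calculation.

```python
def density(num_vehicles, lane_length_m):
    """Traffic density rho in vehicles per metre."""
    return num_vehicles / lane_length_m


def traffic_flow(speed_mps, rho_veh_per_m):
    """Traffic flow F = V * rho, in vehicles per second."""
    return speed_mps * rho_veh_per_m


# Hypothetical numbers: 20 vehicles on a 500 m lane moving at 10 m/s.
rho = density(20, 500.0)             # vehicles per metre
flow_per_hour = traffic_flow(10.0, rho) * 3600.0
```

With these numbers the density is 0.04 vehicles per metre and the flow is roughly 1440 vehicles per hour.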

Step 2.2: whether the current vehicle changes lanes is judged by means of a lane-changing model; the lane-changing model explicitly defines four different lane-change motivations: strategic lane change, cooperative lane change, tactical lane change, and obligatory lane change;

Vehicle control is divided into longitudinal control and lateral control: longitudinal control uses the IDM car-following model, and lateral control uses the LC2013 lane-changing model in SUMO. In a complex multi-lane road network, most vehicles need to change lanes in the same direction while driving, which improves the efficiency of the whole traffic system and reduces frequent changes in vehicle acceleration. The speed of a vehicle is mainly determined by the leading vehicle in its lane. If a vehicle needs to change lanes, the lane change is executed only when the target lane has enough physical space, in order to prevent collision with the vehicles ahead of and behind it in the target lane.

During the simulation, a lane change that a vehicle must make in order to reach the next section of its route is called a strategic lane change. For example, in a three-lane traffic system, if the vehicle is in the second lane but needs to turn after a unit of time, it waits for the lane change even if it has to stop. A lane change made because other vehicles have informed the ego vehicle of congestion ahead is called a cooperative lane change. For example, if the leading vehicle needs to make a strategic lane change and therefore stops, the ego vehicle generates a lane-change demand from the observed change in the leader's speed and changes lanes accordingly. A tactical lane change is motivated by a slow leading vehicle that the ego vehicle wants to avoid following. An obligatory lane change is a lane change made so as not to impede other, faster vehicles.
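The four motivations and the free-space precondition can be sketched in Python as follows. The priority order (strategic first, obligatory last) and the helper names are assumptions for illustration, not a description of how SUMO's LC2013 model is actually implemented.

```python
from enum import IntEnum


class LaneChangeMotive(IntEnum):
    """Lower value means higher priority (assumed ordering)."""
    STRATEGIC = 0    # needed to follow the route
    COOPERATIVE = 1  # reacting to congestion reported ahead
    TACTICAL = 2     # avoiding a slow leading vehicle
    OBLIGATORY = 3   # making way for faster vehicles


def select_motive(active_motives, target_lane_has_gap):
    """Return the motive to act on, or None if no change is made.

    A lane change is only considered when the target lane has enough
    physical space, mirroring the precondition stated in the text.
    """
    if not target_lane_has_gap or not active_motives:
        return None
    return min(active_motives)


choice = select_motive(
    {LaneChangeMotive.TACTICAL, LaneChangeMotive.STRATEGIC},
    target_lane_has_gap=True,
)
```

In this example `choice` is the strategic motive, since it outranks the tactical one under the assumed ordering; with no gap in the target lane the function returns None regardless of motive.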

As long as the traffic volume threshold is not exceeded, lane changing is more pronounced, and the lane-changing model to some extent slows the formation of congestion and thus shortens the congestion period. The precondition for every lane change in this simulation is that the vehicle's state after the lane change is not adversely affected. This is also compatible with the comfort parameter of the car-following model calculated in this work.

Step 3: the vehicles, as multiple agents, are combined with the reinforcement learning method PPO, thereby forming intelligent internet vehicles;
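For orientation, the clipped surrogate objective commonly associated with PPO is shown below in its standard published form. The text does not state which PPO variant or which clipping coefficient ε is used, so this is background rather than a description of the exact training objective.

```latex
L^{\mathrm{CLIP}}(\theta) =
  \mathbb{E}_t \left[ \min \left( r_t(\theta)\, \hat{A}_t,\;
  \operatorname{clip}\!\left( r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon \right) \hat{A}_t \right) \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here \hat{A}_t is an advantage estimate, s_t and a_t are the state and action at step t (not the IDM gap or acceleration symbols), and the clipping limits how far a single update can move the policy.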

The vehicle speed, position, acceleration and desired speed are taken as the state space and the acceleration as the action space. The PPO reward function is designed by taking the sum of the average speeds of all vehicles as the reward value, setting the comfort parameter as a condition, and adding a penalty to the reward function if the deceleration of a vehicle is less than the comfort parameter; the specific formula is as follows:

The reward function re consists of three parts: the average speed V_average, the instantaneous braking deceleration A_real, and the collision penalty Z; w1, w2 and w3 are the weights for the average speed, the collision of two vehicles and over-rapid deceleration, respectively, and A_max is the set maximum acceleration. Under normal driving conditions, when a vehicle decelerates, the deceleration can increase gradually over a certain distance; but when the leading vehicle stops suddenly and the ego vehicle brakes hard, the resulting acceleration is taken as the instantaneous braking deceleration. As more intelligent internet vehicles enter the traffic flow, the overall average speed increases.
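The state and action spaces described for this step can be sketched in Python as below. The dictionary keys, the flat-list layout and the symmetric clipping of the commanded acceleration to ±A_max are illustrative assumptions.

```python
def build_observation(vehicles):
    """Flatten per-vehicle state into one observation vector.

    Each vehicle is a dict with the four quantities named in the text:
    speed, position, acceleration and desired speed.
    """
    obs = []
    for veh in vehicles:
        obs.extend([
            veh["speed"],
            veh["position"],
            veh["accel"],
            veh["desired_speed"],
        ])
    return obs


def clip_action(accel_cmd, a_max):
    """Bound the commanded acceleration to [-a_max, a_max] (assumed)."""
    return max(-a_max, min(a_max, accel_cmd))


fleet = [
    {"speed": 8.0, "position": 12.5, "accel": 0.3, "desired_speed": 15.0},
    {"speed": 6.0, "position": 40.0, "accel": -0.5, "desired_speed": 15.0},
]
observation = build_observation(fleet)
```

The resulting observation has four entries per vehicle, and any acceleration the policy outputs is bounded before being applied to the vehicle.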

In the experiment, vehicles controlled by the reinforcement learning algorithm are added to the traffic system in different proportions; these vehicles markedly reduce frequent changes in vehicle speed. Different reward functions are also compared.

FIG. 4 shows the results for vehicles that are not combined with the reinforcement learning method, and FIG. 5 the results for vehicles that are; the two are compared first to show the influence of the reinforcement learning algorithm on the traffic system. The figures plot the average speed of each vehicle over its whole trip through the traffic system, from start to finish. The result in FIG. 5 is clearly better than that in FIG. 4: after the vehicles are combined with the reinforcement learning method, the overall average speed of the vehicles is improved by 2.5 times. FIG. 4 shows that without the reinforcement learning method the vehicles run slowly and congestion is severe, whereas with the reinforcement learning method, as shown in FIG. 5, the traffic efficiency of vehicles in the traffic system is effectively increased.

In the experiment, the SUMO simulation software simulates the traffic state of a real scene, and the deep reinforcement learning PPO algorithm is used to address intersection blockage in mixed traffic flow, improving the stability and the traffic flow of the traffic system. The experiment also adds intelligent internet vehicles to the traffic system in different proportions, demonstrating the positive effect of the development of intelligent internet vehicles on smart cities.

Claims

1. A multi-intersection travel time collaborative optimization method for an intelligent internet vehicle is characterized by specifically comprising the following steps:

step 1: establishing in SUMO a traffic intersection scene with three main lanes and two merging lanes, and setting vehicle speed limits and lane flow directions to simulate a real traffic intersection; adding an instability parameter at the lane merges, and allowing the rightmost vehicle to turn left suddenly when approaching a left-turn intersection;

step 2: setting a traffic flow parameter to control the number of vehicles entering the lanes per hour, and, in the process of simulating the traffic intersection scene, adding a car-following model and a lane-changing model to simulate real-world human-driven vehicles, so that each vehicle decides whether it needs to accelerate, decelerate or change lanes according to the state of the vehicle in front;

step 3: combining the vehicles, as multiple agents, with the reinforcement learning method PPO, thereby forming intelligent internet vehicles; as more intelligent internet vehicles enter the traffic flow, the overall average speed increases.

2. The method for collaborative optimization of the multi-intersection travel time of the intelligent networked vehicle according to claim 1, wherein the step 2 specifically comprises:

where v0 is the desired speed, s* the desired distance, T the time interval, s0 the minimum gap, δ the acceleration exponent, α the acceleration term, v the ego vehicle's speed, b the comfort parameter, s the distance to the vehicle in front, Δv the speed difference between the ego vehicle and the vehicle in front, and a the current vehicle acceleration;

a comfort variable b is introduced to improve the comfort of passengers riding in the vehicle; different values of b give different desired distances s*, and the more the desired distance exceeds the safe following distance, the more the efficiency of the overall traffic system is affected; in order to optimize the acceleration of the vehicle, the calculation of the comfort parameter b is improved, with the formula as follows:

F = Vρ

where F is the traffic flow, V is the traffic flow speed, ρ is the traffic flow density, H is the lane length, h is the vehicle body length, and b is the comfort parameter; the comfort parameter is improved according to the traffic flow and the traffic flow speed;

selecting a preferred candidate lane according to the vehicle's lane-change demand; calculating the safe speed in the current lane and combining it with the speed requirement of the candidate lane; and determining whether to change lanes according to the vehicle's lane-change demand and the degree of lane-change urgency.

3. The method for collaborative optimization of multi-intersection travel time of the intelligent networked vehicle as claimed in claim 1, wherein combining the vehicles, as multiple agents, with the reinforcement learning method PPO in step 3 is specifically as follows:

taking the vehicle speed, position, acceleration and desired speed as the state space and the acceleration as the action space; designing the PPO reward function by taking the sum of the average speeds of all vehicles as the reward value, setting the comfort parameter as a condition, and adding a penalty to the reward function if the deceleration of a vehicle is less than the comfort parameter; the specific formula is as follows:

the reward function re consists of three parts: the average speed V_average, the instantaneous braking deceleration A_real, and the collision penalty Z; w1, w2 and w3 are the weights for the average speed, the collision of two vehicles and over-rapid deceleration, respectively, and A_max is the set maximum acceleration; under normal driving conditions, when a vehicle decelerates, the deceleration can increase gradually over a certain distance, but when the leading vehicle stops suddenly and the ego vehicle brakes hard, the resulting acceleration is taken as the instantaneous braking deceleration; as more intelligent internet vehicles enter the traffic flow, the overall average speed increases.