CN108847037B

CN108847037B - Non-global information oriented urban road network path planning method

Info

Publication number: CN108847037B
Application number: CN201810677156.1A
Authority: CN
Inventors: 胡征兵; 胡岑诺; 唐传慧; 蒋玲; 杨琳
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2020-11-17
Anticipated expiration: 2038-06-27
Also published as: CN108847037A

Abstract

The invention discloses a non-global-information-oriented urban road network path planning method that improves the efficiency of the urban traffic network by balancing traffic network flow. Existing path planning methods either can only calculate the optimal route for a single vehicle, or require more information and rely on assumptions. To address these problems, the method periodically monitors the state changes of road sections, intelligently learns and evaluates the congestion index of each road section, constructs a PST model, and uses the A*R algorithm to select paths, so that the road network reaches a flow-balanced state. A large number of experiments show that, in scenarios with a relatively large number of vehicles, the method balances traffic network flow more effectively, alleviates traffic congestion, and performs well on key indicators such as reducing the average travel time and average travel distance of vehicles. In addition, even when a large number of vehicles in the road network do not accept route planning navigation, the method still noticeably relieves traffic congestion.

Description

Non-global information oriented urban road network path planning method

Technical Field

The invention belongs to the technical field of computer science, intelligent traffic and machine learning, and relates to a non-global information-oriented urban road network path planning method.

Background

To improve the utilization of urban road section resources, relieve road congestion and reduce vehicle delay, scholars at home and abroad have proposed many methods. Current research on path optimization mainly includes the A* algorithm, the Dijkstra shortest path algorithm, the SPFA algorithm, dynamic programming algorithms and the like. These are deterministic algorithms: for a given road network state they compute a unique path. Their counterparts are non-deterministic algorithms, such as the PSO algorithm, genetic algorithms, ant colony optimization and neural network algorithms, also called intelligent algorithms, which provide an optimal or suboptimal route probabilistically. In addition, optimization algorithms based on current traffic conditions, such as TomTom and Google navigation, can provide the optimal path and several suboptimal paths under the current situation. These methods solve the path optimization problem for a single vehicle and share the following problems: the optimal route is calculated only for individual vehicles, competition from other vehicles for road section resources is not considered, and the highly dynamic nature of a modern urban traffic network is ignored; if congestion occurs, providing the same alternative route to many vehicles at the same time can cause new congestion.
Therefore, scholars have also proposed many new methods for the uncertain, dynamic multi-vehicle scenario. To balance traffic network flow, Khodadadi et al. combine the ant colony algorithm with fuzzy logic to calculate the instantaneous state of the traffic network and distribute traffic according to minimum travel time, improving traffic management as far as possible; Joger et al. predict the trajectories of moving objects with a hidden Markov model based on massive movement trajectories in a big data environment; Liang Z et al. propose active path planning based on congestion prediction; Yan et al. establish a multi-intersection path selection model that distributes traffic flow evenly over the selectable paths while respecting vehicle preferences, maximizing the utilization efficiency of urban road resources; Wang L et al. guide vehicles in selecting travel routes through pricing based on the congestion level of the route. However, these methods either cannot effectively handle high-traffic scenarios, or require more information and assumptions, such as obtaining the origin-destination (OD) set of all vehicles in advance, road impedance functions, and so on.

Reinforcement learning is widely applied in the field of intelligent transportation. Wiering et al. apply Q-learning to traffic signal control, establishing a vehicle-based utility value function with the aim of minimizing the accumulated waiting time at the signal lights passed by all vehicles from entering to leaving the city, and combining optimal vehicle path selection with minimal single-node delay; Tantawy et al. propose a traffic light adaptive algorithm based on modular Q reinforcement learning in a multi-Agent system, optimizing the phase sequence through cooperative control of adjacent traffic lights and thereby reducing average intersection delay. These methods mainly apply reinforcement learning to traffic light optimization. Scholars have also explored applying reinforcement learning to path planning: Basha N et al. use SARSA reinforcement learning to solve the dynamic traffic routing problem on a network simulated by a cell transmission model, bringing the road network to a balanced state; to avoid congestion and reduce traffic delay, Arokhlo et al. propose calculating the minimum-cost path from an origin to a destination with a multi-Agent reinforcement learning method. However, in these methods either the agents can only learn and decide in isolation, so overall coordination among agents cannot be achieved, or coordination is achieved at a high cost in space and time, which slows the convergence of the algorithm.

In summary, most algorithms existing at present can only find the optimal path for a single vehicle based on the current road network state; or the application scene of the algorithm needs a plurality of assumptions as a premise; or perform poorly in the case of a greater density of vehicles.

Disclosure of Invention

To address the three shortcomings above, the invention provides a method that evaluates congestion indexes for the road sections of a road network over an adaptive learning period, makes full use of the continuity of the macroscopic behavior of vehicles, and plans paths for all vehicles simultaneously, so that it performs well even at high vehicle density.

The technical scheme adopted by the invention is as follows: a non-global information oriented urban road network path planning method is characterized by comprising the following steps:

Step 1: preprocessing the road network information by the method of adding virtual edges, introducing a multi-Agent system, and forming, with each road section in the road network as a center, a module system consisting of a plurality of Agents, wherein each module system makes decisions independently;

Step 2: intelligently learning and evaluating the road congestion index Index by observing the state changes of the road sections in the area;

Step 3: constructing a PST model according to the road network congestion index;

Step 4: using the A*R algorithm in the PST model to select paths, so that traffic flow is distributed in a balanced manner over the whole road network;

Step 5: returning to step 2 after the preset time period has elapsed.

Compared with the prior art, the invention has the following beneficial effects. The urban traffic system is dynamic, non-deterministic and complex, and although the prior art has achieved certain results, most methods can only find the optimal path for a single vehicle based on the current road network state, or require a number of assumptions as a premise for their application scenario, or perform poorly at higher vehicle density. Reinforcement learning places low requirements on prior knowledge of the environment, so a good learning effect can be obtained in a complex system. The method balances traffic network flow more effectively even in scenarios with a relatively large number of vehicles, relieves traffic congestion, and performs well on key indicators such as reducing the average travel time and average travel distance of vehicles. In addition, even when a large number of vehicles in the road network do not accept route planning navigation, the method still noticeably relieves traffic congestion.

Drawings

FIG. 1 is a schematic block diagram of an embodiment of the present invention;

FIG. 2 is a schematic diagram of the method of adding virtual edges to a road network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the Beijing Yizhuang Economic-Technological Development Area according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the influence of F_SF on the average travel time and travel distance at different vehicle scales according to an embodiment of the present invention, wherein (a) is 2000 vehicles, (b) is 4000 vehicles, and (c) is 6000 vehicles;

FIG. 5 is a comparison of average travel times for different vehicle sizes in accordance with an embodiment of the present invention;

FIG. 6 is a comparison of average distance traveled on different vehicle scales in accordance with an embodiment of the present invention;

FIG. 7 is a graph comparing the number of road jams over time for an embodiment of the present invention;

FIG. 8 is a graph comparing average travel times at different compliance rates for embodiments of the present invention.

Detailed Description

In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.

Referring to fig. 1, the non-global information oriented urban road network path planning method provided by the present invention includes the following steps:

For the urban road network $G = (L, E)$, let the number of intersections in the road network be $m$ and let $l_x$ denote the $x$-th intersection, so that $L = \{l_1, l_2, \dots, l_m\}$, $x = 1, 2, \dots, m$. For any two adjacent intersections $l_x$ and $l_y$, if $l_y$ can be reached from $l_x$, there is a road section $(l_x, l_y) \in E$; if $l_x$ can also be reached from $l_y$, there is also a road section $(l_y, l_x) \in E$. To reflect the weight of the intersection nodes themselves, the road network information is preprocessed by the method of adding virtual edges, which converts node weights into edge weights. According to the actual structure of the intersection, each intersection node is expanded one-to-many, virtual edges are added between the expanded nodes according to the driving directions inside the intersection, and the weight of each added virtual edge represents the travel time of the corresponding turn inside the intersection. After this preprocessing there are two types of edges, real edges and virtual edges: a real edge maps to a road section in the road section set $E$, and a virtual edge maps to a driving direction inside an intersection.

FIG. 2 shows an intersection represented by the method of adding virtual edges, in which intersection $l_x$ is expanded into 8 nodes $\{x_1, x_2, \dots, x_8\}$. According to their attributes and roles at the intersection, the 8 nodes are divided into two types: intersection out-points and intersection in-points. An intersection out-point is a node from which the intersection can be left; in FIG. 2, $x_5, x_6, x_7, x_8$ are all out-points. An intersection in-point is a node through which the intersection can be entered; in FIG. 2, $x_1, x_2, x_3, x_4$ are all in-points. Let $B_x = \{x_5, x_6, x_7, x_8\}$ denote the set of out-points of intersection $l_x$, and $Y_x = \{x_1, x_2, x_3, x_4\}$ the set of in-points of intersection $l_x$.

For a road section $(l_x, l_y)$, suppose that under the method of adding virtual edges it is represented by a real edge $[x_u, y_v]$, where $x_u \in B_x$ and $y_v \in Y_y$. Let $To(y_v)$ denote the set of nodes from which the in-point $y_v$ is reached, and $From(x_u)$ the set of nodes reachable from the out-point $x_u$; they are calculated as shown in formulas (1) and (2) respectively:

To(y_v)＝{x_u} (1)

From(x_u)＝{y_v} (2)
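The preprocessing above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `expand_intersection` and `add_real_edge`, the `(intersection, index)` node labels, and the turn-time table format are hypothetical, and a four-approach intersection as in FIG. 2 is assumed.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (intersection id, local node index 1..8); hypothetical labels


def expand_intersection(name: str, turn_time: Dict[Tuple[int, int], float]):
    """Expand one intersection into in-points Y (1..4) and out-points B (5..8).

    turn_time maps (in_index, out_index) to the time cost of that turn inside
    the intersection; only permitted turns are listed (assumed input format).
    """
    in_points: List[Node] = [(name, k) for k in range(1, 5)]   # Y_x
    out_points: List[Node] = [(name, k) for k in range(5, 9)]  # B_x
    virtual_edges = {}
    for (u, v), cost in turn_time.items():
        if not (1 <= u <= 4 and 5 <= v <= 8):
            raise ValueError("a virtual edge must go from an in-point to an out-point")
        virtual_edges[((name, u), (name, v))] = cost
    return in_points, out_points, virtual_edges


def add_real_edge(to_sets: Dict[Node, Set[Node]],
                  from_sets: Dict[Node, Set[Node]],
                  x_u: Node, y_v: Node) -> None:
    """Register the real edge [x_u, y_v]: To(y_v) = {x_u}, From(x_u) = {y_v}."""
    to_sets[y_v] = {x_u}
    from_sets[x_u] = {y_v}


# Usage: intersection x allows two turns from in-point 1; road (l_x, l_y)
# is the real edge from out-point x_5 to in-point y_1.
Y_x, B_x, virtual = expand_intersection("x", {(1, 6): 3.0, (1, 7): 5.0})
to_sets, from_sets = {}, {}
add_real_edge(to_sets, from_sets, ("x", 5), ("y", 1))
```

Keeping the turn costs on virtual edges means that a later shortest-path search only has to deal with edge weights, which is the purpose of converting node weights into edge weights.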

Reinforcement learning is applied, using the MQ (modular Q-learning) algorithm, to calculate the congestion index Index of each road section of the road network from the traffic flow information of each road section acquired in real time. The Index reflects the expected traffic flow of the road section: the larger the Index, the smaller the expected traffic flow.

For any road section $i$, let $NB_i$ denote the set of neighbor road sections of road section $i$, and let $NB_i[k]$ denote the $k$-th neighbor road section of road section $i$. A modular Agent system is established with each road section as the center together with its set of neighbor road sections. Taking road section $i$ as an example, let $Agent_i$ denote the Agent on road section $i$. $Agent_i$, as the center, and its neighbors $Agent_{NB_i[k]}$ form a module; inside the module, $Agent_i$ forms a two-Agent Q reinforcement learning pair with each $Agent_{NB_i[k]}$.

The MQ algorithm is realized by the following steps:

At time $t$, the states of $Agent_i$ and $Agent_{NB_i[k]}$ are obtained from road sensors. The optimal actions selected by $Agent_i$ and $Agent_{NB_i[k]}$ at time $t-1$, and the reward value obtained, are also available.

1. The optimal response under the joint state of $Agent_i$ and $Agent_{NB_i[k]}$ is calculated as shown in equation (3), wherein:

the state $s_i$ of $Agent_i$ is expressed as the total number of vehicles on road section $i$; the action $a_i$ of $Agent_i$ is represented by a number and is used to calculate the congestion index of road section $i$; $time(i)$ denotes the travel time on road section $i$ under free-flow conditions; $A_i$ denotes the set of actions that $Agent_i$ may take under all environmental states;

the Q value is that of $Agent_i$ being in state $s_i$ and taking action $a_i$ while $Agent_{NB_i[k]}$ is in its own state and takes its own action;

the probability term is the probability that, in the joint state at time $t-1$, $Agent_{NB_i[k]}$ takes its action.

2. The Q value is updated at time $t$ as shown in equation (4), where $\alpha$ is the learning rate and $\gamma$ is the discount factor.

3. The optimal action of $Agent_i$ at time $t$ is calculated as shown in equation (5), wherein:

the state $s_i$ of $Agent_i$ is expressed as the total number of vehicles on road section $i$; the congestion index $Index(i)$ of road section $i$ is calculated from the action $a$ taken by $Agent_i$, where $time_i$ denotes the travel time on road section $i$ under free-flow conditions and $a \in A_i$. The flow on any road section $i$ should be kept at the saturation flow $SF_i$: the closer the traffic flow at the next moment is to $SF_i$, the larger the reward value; the larger its difference from $SF_i$, the smaller the reward value.
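Equations (3) to (5) are not reproduced in this text, so the following Python sketch does not claim to be the MQ formulas of the invention. It only illustrates, under assumptions, a generic pairwise Q-learning scheme of the kind described: a Q table over the joint state and joint action of $Agent_i$ and one neighbor, an empirical estimate of the neighbor's action probabilities, a learning rate `alpha` and a discount factor `gamma`. The class name `PairQ`, the reward shape in `reward`, and all numeric values are hypothetical.

```python
from collections import defaultdict


class PairQ:
    """Q-learning for one (Agent_i, neighbor Agent) pair; hypothetical helper."""

    def __init__(self, actions_i, actions_k, alpha=0.1, gamma=0.9):
        self.actions_i = list(actions_i)
        self.actions_k = list(actions_k)
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)    # Q[(s_i, s_k, a_i, a_k)]
        self.count = defaultdict(int)  # how often the neighbor took a_k in (s_i, s_k)

    def prob(self, s_i, s_k, a_k):
        """Empirical probability that the neighbor takes a_k in the joint state."""
        total = sum(self.count[(s_i, s_k, b)] for b in self.actions_k)
        if total == 0:
            return 1.0 / len(self.actions_k)  # uniform before any observation
        return self.count[(s_i, s_k, a_k)] / total

    def expected_q(self, s_i, s_k, a_i):
        """Q value of a_i averaged over the neighbor's estimated behavior."""
        return sum(self.prob(s_i, s_k, b) * self.q[(s_i, s_k, a_i, b)]
                   for b in self.actions_k)

    def best_action(self, s_i, s_k):
        """Action of Agent_i with the highest expected Q value."""
        return max(self.actions_i, key=lambda a: self.expected_q(s_i, s_k, a))

    def update(self, prev, a_i, a_k, r, cur):
        """One learning step; prev and cur are joint states (s_i, s_k)."""
        self.count[(prev[0], prev[1], a_k)] += 1
        best_next = max(self.expected_q(cur[0], cur[1], a) for a in self.actions_i)
        key = (prev[0], prev[1], a_i, a_k)
        self.q[key] += self.alpha * (r + self.gamma * best_next - self.q[key])


def reward(flow, sf):
    """Assumed reward shape: largest when the flow equals the saturation flow SF."""
    return -abs(flow - sf)
```

In a full modular system, $Agent_i$ would keep one such pair per neighbor $NB_i[k]$ and combine their values when choosing its action; how the invention combines them is not specified in this text.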

The saturated flow SF is calculated as shown in equation (6):

SF＝leng×lane×state×speed×fmy×F_SF/100 (6)

wherein leng represents the length of the road section, with value range $(0, \infty)$; lane represents the number of lanes of the road section, with value range $(0, \infty)$; state represents the flatness of the road section, where a larger value means a flatter road section, with value range $(0, 1]$; speed represents the ratio of the maximum speed limit of the road section to the maximum speed limit of the road network, with value range $(0, 1]$; fmy represents the familiarity of the road section, where a larger value means a more familiar road section, with value range $(0, 1]$; $F_{SF}$ represents the saturation flow coefficient of the road network and is used to adjust the calculation of SF: if the number of vehicles in the whole road network is small, $F_{SF}$ is set to a smaller value, and if the number of vehicles is large, $F_{SF}$ is set to a larger value.
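Formula (6) can be written directly as a small Python function. The function name `saturation_flow` and the example values are assumptions for illustration; the text does not state units for leng or typical values for $F_{SF}$.

```python
def saturation_flow(leng, lane, state, speed, fmy, f_sf):
    """Saturation flow SF of a road section according to formula (6).

    leng and lane must be positive; state, speed and fmy must lie in (0, 1].
    """
    if leng <= 0 or lane <= 0:
        raise ValueError("leng and lane must be positive")
    for name, value in (("state", state), ("speed", speed), ("fmy", fmy)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    return leng * lane * state * speed * fmy * f_sf / 100


# Hypothetical road section: length 500, 2 lanes, perfectly flat, speed limit
# half of the network maximum, fully familiar, F_SF = 10.
sf = saturation_flow(leng=500, lane=2, state=1.0, speed=0.5, fmy=1.0, f_sf=10)
```

Because every factor enters multiplicatively, doubling $F_{SF}$ doubles SF for every road section at once, which is why it acts as a single network-wide tuning parameter.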

The principle of the PST model is as follows: the whole path planning time range $T$ is divided into a number of time scales according to the time granularity $\delta$; the nodes obtained by preprocessing the road network information are mapped to the ordinate and the time scales to the abscissa; and edges are created between the nodes according to conditions. Because the signal phase scheme of the road network affects travel time, intersection phase constraints and road section capacity constraints must be considered at the same time. To meet this challenge, the PST model incorporates the signal phase scheme and combines it with the road network congestion index, so that the phase scheme and the road section capacity constraints are considered as a whole.

The specific implementation comprises the following substeps:

Step 3.1: For time $t$ (initial value 0), if the traffic light at any intersection $l_x$ gives the in-point $x_u \in Y_x$ the right of way to the out-point $x_v \in B_x$, then:

create points $(t, x_u)$ and $(t + time(x_u, x_v), x_u)$, and create edge $(t, x_u) \to (t + time(x_u, x_v), x_u)$;

$time(x_u, y_v)$ denotes the travel time from node $x_u$ to node $y_v$ under free-flow conditions; $B_x$ denotes the set of out-points of intersection $l_x$, and $Y_x$ the set of in-points of intersection $l_x$. An out-point is a node from which the intersection can be left, and an in-point is a node through which the intersection can be entered.

Step 3.2: For any node $y_z \in From(x_v)$,

create point $(t + time(x_u, x_v) + Index(x_v, y_z), y_z)$, and create edge $(t + time(x_u, x_v), x_u) \to (t + time(x_u, x_v) + Index(x_v, y_z), y_z)$;

$From(x_v)$ denotes the set of nodes that can be reached directly from the out-point $x_v$;

$Index(x_v, y_z)$ denotes the congestion index from node $x_v$ to node $y_z$.

Step 3.3: Add 1 to $t$; if $t < T$, return to step 3.1;

Step 3.4: For time $t$ (initial value 0), for any in-point $x_u \in Y_x$ of any intersection $l_x$, if the point $(t, x_u)$ exists, find the smallest time $s$ greater than $t$ such that the point $(s, x_u)$ exists, and create edge $(t, x_u) \to (s, x_u)$;

Step 3.5: Add 1 to $t$; if $t < T$, repeat step 3.4;

Step 3.6: Traverse all currently existing edges;

for any edge $(t, x_u) \to (s, y_v)$, set the edge weight of the edge $(t, x_u) \to (s, y_v)$ to $s - t$.

For any vehicle $f \in F$, the set $Passed^f$ of road sections that vehicle $f$ has already traveled is known, and for any road section $i \in Passed^f$ the total time $sum_i$ that the vehicle has spent on that road section is known. The set of road sections on the path of vehicle $f$ from the start node to node $x$ in the path planning is also known. The set of already traveled road sections contained in the path of vehicle $f$ from the start node to node $x$ is calculated as shown in formula (7).

For vehicle $f$, selecting a road section in this set again is a waste of road section resources. To avoid this waste, such a selection must be penalized, and the penalty function $e(x)$ is defined as shown in formula (8), in which the coefficient of the A*R algorithm appears; the setting of its value is explored in the experimental part.

Assuming that the parent node of node $x$ is $y = fast(x)$, the recursion of the function $e(x)$ is shown in formula (9), where $sum(y, x)$ denotes the total time that the vehicle has spent on the road section $(l_y, l_x)$.

The valuation function f (x) is shown as equation (10):

f(x)＝g(x)+h(x)+e(x) (10)

wherein $f(x)$ is the evaluation function, which estimates the shortest time of a path that passes through node $x$; $g(x)$ is the shortest time from the start node to node $x$; $h(x)$ is a heuristic estimate of the shortest time from node $x$ to the destination node, obtained by dividing the Euclidean distance from the current node to the destination node by the maximum allowed speed of the road network; and $e(x)$ is the penalty function defined above.
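The evaluation function of formula (10) can be illustrated with a small A* search in Python. Formulas (7) to (9) are not reproduced in this text, so the penalty used here is an assumption: $e(x)$ is accumulated along the path by adding the A*R coefficient times the time the vehicle has already spent on each re-selected road section. The function name `a_star_r`, the plain weighted graph used instead of the PST graph, and the input formats are likewise hypothetical; the default coefficient 2 is the value used in the experiments of this embodiment.

```python
import heapq
import math


def a_star_r(graph, coords, start, goal, v_max, passed_time, coef=2.0):
    """A* search with f = g + h + e.

    graph:       {node: {successor: travel_time}}
    coords:      {node: (x, y)} used for the Euclidean heuristic h
    passed_time: {(y, x): total time already spent on road section (y, x)}
    """
    def h(node):
        return math.dist(coords[node], coords[goal]) / v_max

    # Heap entries: (f, g, e, node, path)
    heap = [(h(start), 0.0, 0.0, start, [start])]
    best = {}
    while heap:
        f, g, e, node, path = heapq.heappop(heap)
        if node == goal:
            return path, g
        if best.get(node, math.inf) <= g + e:
            continue  # already expanded with a cost that is at least as good
        best[node] = g + e
        for nxt, w in graph.get(node, {}).items():
            g2 = g + w
            # Assumed penalty recursion: e(x) = e(y) + coef * sum(y, x)
            e2 = e + coef * passed_time.get((node, nxt), 0.0)
            heapq.heappush(heap, (g2 + h(nxt) + e2, g2, e2, nxt, path + [nxt]))
    return None, math.inf


graph = {"A": {"B": 1.0, "C": 1.5}, "B": {"D": 1.0}, "C": {"D": 1.5}}
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
fresh_path, fresh_time = a_star_r(graph, coords, "A", "D", 1.0, {})
repelled_path, repelled_time = a_star_r(graph, coords, "A", "D", 1.0, {("A", "B"): 1.0})
```

In the usage lines, a vehicle with no history takes the faster route through B, while a vehicle that has already spent time on road section (A, B) is pushed onto the route through C, which is the repulsion effect described in the text.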

Existing path selection algorithms can only optimize paths for a few vehicles and easily cause local congestion. To solve these problems, the invention provides the A*SR method, an A* repulsion algorithm based on the PST model and reinforcement learning. To verify the effectiveness of the A*SR method, the real road network of the Beijing Yizhuang Economic-Technological Development Area (see FIG. 3) is obtained from OpenStreetMap, and vanet, an open-source microscopic traffic simulation software, is used to simulate the movement of vehicles in the road network. By constructing scenarios with different vehicle scales, the A*SR method is compared with 4 different path optimization algorithms:

(1) SP (shortest path algorithm);

(2) DSP (dynamic shortest path algorithm);

(3) RkSP (random k shortest path algorithm);

(4) DTA (dynamic traffic allocation algorithm).

The road network information is configured in Extensible Markup Language (XML) and describes attributes of the road sections such as start coordinates, target coordinates, number of lanes and maximum speed limit. The vehicle information is also configured in XML and includes attributes of the vehicles such as start coordinates, target coordinates, departure time and maximum speed. The road network of the Beijing Yizhuang Economic-Technological Development Area is shown in FIG. 3: all road sections are bidirectional, different road sections have different speed limits, the same road section has the same number of lanes throughout, and every intersection of the real road network has a signal light.

In this embodiment:

1. The parameter settings of the A*SR method are explored to obtain the best settings of the road section saturation flow coefficient F_SF and the coefficient of the A*R algorithm.

2. To verify the ability of the A*SR method to reduce the number of congested road sections, an experiment is designed that compares the A*SR method with the 4 path optimization algorithms at different vehicle scales.

3. To verify the effectiveness of the A*SR method in reducing travel time and travel distance, an experiment is designed that compares the performance of the A*SR method with the 4 methods at different vehicle scales.

4. The A*SR method is compared with the other algorithms at different navigation compliance rates to further verify its effectiveness in optimizing traffic.

I. Parameter selection

Table 1 lists the relevant parameters of the A*SR method: the time period $T_c$, the coefficient of the A*R algorithm, and the road network saturation flow coefficient $F_{SF}$. These three parameters strongly affect the performance of the algorithm, and a large number of experiments were carried out to explore their optimal settings. The OD set is generated randomly on the real road network, with origins concentrated in the upper-left area of the map and destinations concentrated in the right area, and all vehicles are assumed to fully comply with the routes planned by the A*SR method. For brevity, the exploration of $T_c$ and the A*R coefficient is not described in detail in this embodiment; in the following experiments they are set to 15 s and 2 respectively, because with these values good results are achieved regardless of the vehicle scale. From the definition of the saturation flow coefficient $F_{SF}$ above, a smaller $F_{SF}$ means that a road section is allowed to accommodate fewer vehicles, and a larger $F_{SF}$ means that it is allowed to accommodate more. In this embodiment several vehicle scales are designed; for each scale, the road section saturation flow coefficient $F_{SF}$ is set to different values and the average travel time and average travel distance of the vehicles are calculated. FIG. 4 shows the results for vehicle scales of 2000, 4000 and 6000 vehicles.

TABLE 1 relevant parameters

Taking the optimal setting of $F_{SF}$ at each vehicle scale as the reference value, the simulation results in FIG. 4 show that:

(1) At a small vehicle scale, the average travel time and travel distance first decrease as $F_{SF}$ increases and then level off. This is because when $F_{SF}$ is too small the congestion condition is reached easily, and the A*SR method guides vehicles onto relatively long routes, increasing travel time and travel distance. Once $F_{SF}$ reaches its optimal value, the small vehicle scale means that the road sections in the road network cannot reach the congestion condition set by the A*SR method, so most vehicles still travel on better routes. When $F_{SF}$ exceeds the optimal value, the number of vehicles on the road sections can no longer reach the level set by $F_{SF}$, so the average travel time and travel distance level off. When the number of vehicles is 2000, the optimal value of $F_{SF}$ is […].

(2) At a medium vehicle scale, the average travel time decreases rapidly to a minimum as $F_{SF}$ increases to its optimal value, and thereafter increases slowly as $F_{SF}$ increases further. This is because once $F_{SF}$ exceeds a certain value, the threshold for judging a road to be congested is raised, while at a medium vehicle scale the traffic flow on some roads is relatively large and congestion actually occurs in the road network. When the number of vehicles is 4000, the optimal value of $F_{SF}$ is […].

(3) At a large vehicle scale, the overall traffic flow of the road network is large, so the optimal value of $F_{SF}$ must be large. Before $F_{SF}$ increases to its optimal value, the average travel time is at a higher level, because setting $F_{SF}$ too small makes roads more likely to be judged congested, which causes the road network state to oscillate. When $F_{SF}$ exceeds the optimal value, actual congestion is likely to occur, so the average travel time increases rapidly. When the number of vehicles is 6000, the optimal value of $F_{SF}$ is […].

II. Performance comparison at different vehicle scales

To verify the effectiveness of the A*SR method in reducing travel time, this section compares the A*SR method with the DSP, RkSP and DTA path optimization algorithms at different vehicle scales. FIG. 5 shows the average vehicle travel time of these algorithms under different vehicle scale scenarios. The simulation results show that the A*SR method reduces the average travel time to varying degrees in all scenarios. The DSP algorithm dynamically updates the travel path based on real-time traffic conditions, but in some cases assigns the same route to many vehicles with the same location and destination, causing new congestion; its average travel time is therefore low when there are few vehicles and increases sharply as traffic density increases. RkSP avoids this disadvantage of the DSP algorithm by randomly balancing traffic flow over different routes, but a random method alone is evidently not sufficient. As expected, DTA works better at reducing the average travel time because it brings the road network as close as possible to equilibrium. For all algorithms the average travel time tends to increase as the vehicle scale increases, with the rate of increase ordered as DSP > RkSP > DTA > A*SR. The average travel time of the A*SR method grows more slowly because, as the number of vehicles increases, the A*SR method can obtain a better strategy through reinforcement learning. As a result, when the vehicle scale reaches 6000 vehicles, the average travel times of the algorithms satisfy: DSP > RkSP > DTA > A*SR.

The average travel time and the average travel distance of vehicles are key indicators for evaluating a path planning strategy. Experiments show that, for most path planning algorithms, the two are strongly positively correlated in scenarios with few vehicles but not in scenarios with high traffic density. FIG. 6 compares the average vehicle travel distance of the A*SR algorithm with that of the SP, DSP, RkSP and DTA algorithms under different vehicle scale scenarios. The SP algorithm plans a unique shortest path for every vehicle according to the spatial distance in the road network, so its average travel distance is the smallest whatever the traffic density. As traffic flow increases, the average travel distance of the DSP algorithm grows quickly, mainly because only the current road network state is considered during path planning, which makes the road network state oscillate. The A*SR method takes into account the repulsion of a vehicle from its already traveled trajectory and avoids re-selecting road sections that have already been traveled, which greatly reduces the average travel distance. In scenarios with low traffic density, the average travel distances of the algorithms are essentially equal. As traffic density increases, the effectiveness of the algorithms in reducing the average travel distance finally satisfies: SP > A*SR > DTA > RkSP > DSP.

The experiments lead to the conclusion that, as the vehicle scale increases, the A*SR method is more effective than the other algorithms at reducing the average travel time and the average travel distance of vehicles.

III. Ability of A*SR to alleviate traffic congestion

The principle of the A*SR method is to balance traffic flow over all road sections of the road network, improve the utilization of road resources, and reduce the number of congested road sections. In this embodiment, the number of congested road sections in the road network is measured every minute. FIG. 7 compares how the A*SR algorithm and the SP, DSP, RkSP and DTA algorithms relieve road network congestion in a scenario with a vehicle scale of 5000. As can be seen from FIG. 7, because the SP algorithm cannot dynamically adjust the selected route, its number of congested road sections stays at a high level and falls very slowly. The DSP and DTA algorithms can re-plan paths and therefore perform better than the SP algorithm. Although the road section congestion rate of the A*SR method is higher than that of DTA in the early stage, in the middle and late stages it falls faster than that of DTA because of the adaptive learning capability of the method. Ranking all algorithms by their maximum number of congested road sections gives SP > DSP > A*SR ≥ RkSP > DTA, but it can be seen that, as time progresses, A*SR shows a stronger ability to reduce the number of congested road sections.

IV. Performance comparison under different navigation compliance probabilities;

In real life it is unrealistic for vehicles to comply fully with the instructions of a navigation system. The probability that a vehicle complies with navigation greatly affects the actual effect of an algorithm and is therefore an important factor that must be considered when designing a path planning strategy. The vehicle compliance rate in this embodiment means that, in each time period, each vehicle chooses with a certain probability to accept the route planned by the navigation system; otherwise the vehicle continues along its existing route. In this embodiment, the average driving time of each algorithm is calculated at different vehicle compliance rates in a scene with a vehicle scale of 5000, and the experimental results are shown in FIG. 8. As can be seen from FIG. 8, whatever the vehicle compliance rate, the A*SR, DTA, DSP and RkSP algorithms all significantly reduce the average driving time, because vehicles that follow the navigation instructions always receive better routes and thereby free up road section resources for vehicles that do not follow them. Especially when the compliance rate is low, the average driving time of the A*SR algorithm is significantly lower than that of the other algorithms. The A*SR method achieves this effect because the reinforcement learning algorithm continuously adjusts its policy according to changes in the environmental state, so that the policy finally obtained suits the current compliance rate.

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A non-global information oriented urban road network path planning method is characterized by comprising the following steps:

step 1: preprocessing the road network information by adding virtual edges, the specific implementation process being as follows: for the urban road network G = (L, E), let the number of intersections in the urban road network be m and let l_x denote the x-th intersection, with L = {l_1, l_2, …, l_m} and x = 1, 2, …, m; for any two adjacent intersections l_x and l_y, if intersection l_y can be reached from intersection l_x, there is a road section (l_x, l_y) ∈ E, and if intersection l_x can also be reached from intersection l_y, there is also a road section (l_y, l_x) ∈ E; in order to reflect the weight of the intersection nodes themselves, the road network information is preprocessed by adding virtual edges, which converts node weights into edge weights; according to the actual structure of the intersection, each intersection node is expanded one-to-many, virtual edges are added between the expanded nodes according to the driving directions inside the intersection, and the weight of an added virtual edge represents the travel time of the corresponding turn inside the intersection; the road network model obtained by this preprocessing has two types of edges, real edges and virtual edges, a real edge mapping to a road section in the road section set E and a virtual edge mapping to a driving direction inside an intersection;
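The virtual-edge preprocessing described above can be illustrated with a minimal sketch. It assumes that each directed road section carries a travel time and that the permitted turns inside each intersection are given as a table of turning times; the function name, the node encoding and the input dictionaries are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple

# An expanded node is (intersection, "in" or "out", neighboring intersection).
Node = Tuple[str, str, str]
# An edge is (tail node, head node, weight, "real" or "virtual").
Edge = Tuple[Node, Node, float, str]


def expand_road_network(
    road_sections: Dict[Tuple[str, str], float],
    turn_times: Dict[Tuple[str, str, str], float],
) -> List[Edge]:
    """Convert intersection (node) weights into edge weights.

    road_sections maps a directed road section (l_x, l_y) to its travel time.
    turn_times maps (l_from, l_x, l_to) to the time spent inside intersection
    l_x when turning from the section arriving from l_from onto the section
    leaving toward l_to. Both inputs are hypothetical illustrations.
    """
    edges: List[Edge] = []
    # Real edges: leave l_x toward l_y, arrive at l_y coming from l_x.
    for (lx, ly), travel_time in road_sections.items():
        edges.append(((lx, "out", ly), (ly, "in", lx), travel_time, "real"))
    # Virtual edges: one per permitted turn inside an intersection.
    for (l_from, lx, l_to), turn_time in turn_times.items():
        if (l_from, lx) in road_sections and (lx, l_to) in road_sections:
            edges.append(
                ((lx, "in", l_from), (lx, "out", l_to), turn_time, "virtual")
            )
    return edges


# Intersection B is reached from A and can be left toward C or D.
sections = {("A", "B"): 30.0, ("B", "C"): 40.0, ("B", "D"): 25.0}
turns = {("A", "B", "C"): 5.0, ("A", "B", "D"): 12.0}
network = expand_road_network(sections, turns)
```

In this sketch intersection B is expanded into one in-node and two out-nodes, so the cost of going straight and the cost of turning appear as separate virtual edges rather than as a single node weight.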

step 2: calculating the congestion Index of each road section of the road network by applying reinforcement learning with the Modular Q-learning algorithm, according to the traffic flow information of each road section acquired in real time, wherein the Index reflects the expected traffic flow of the road section, and the larger the Index, the smaller the expected traffic flow;

for any road section i, let NB_i denote the set of neighbor road sections of road section i, and let NB_i[k] denote the k-th neighbor road section of road section i; a modular Agent system is established by taking any road section as a center together with its set of neighbor road sections; for road section i, Agent_i denotes the Agent on that road section; with Agent_i as the center, Agent_i and its neighbor Agents form a module, and inside the module Agent_i forms a pair with each neighbor Agent for two-Agent Q reinforcement learning; the basic idea of reinforcement learning is that the Agent under training continuously takes actions with the goal of maximizing the expected Q value and obtains a return value, uses the return value to evaluate the previous action and update its knowledge, and then moves to the next state, so that the optimal policies under different environmental states are learned;

at time t, the states of Agent_i and of its neighbor Agent are obtained through road sensors, together with the best actions selected by Agent_i and its neighbor Agent at time t-1 and the return values obtained;

The specific implementation of the Modular Q-learning algorithm comprises the following sub-steps:

step 2.1: computing the optimal response under the joint state;

wherein the state s_i of Agent_i is expressed as the total number of vehicles on road section i; the action a_i of Agent_i is represented by a number, and the function of a_i is to calculate the congestion index of road section i: Index(i) = a_i · Time(i), where Time(i) denotes the travel time on road section i in the free-flow state, and A_i denotes the set of actions that Agent_i may take under all environmental states;

the Q value is that of Agent_i being in its state and taking action a_i while the neighbor Agent is in its state and takes its action;

the probability is that of the given action being taken under the joint state at time t-1;

step 2.2: updating the Q value at time t, where α is the learning rate and γ is a discount factor;

step 2.3: calculating the optimal action of Agent_i at time t;

step 2.4: calculating the congestion index of road section i according to the action taken by Agent_i;

the traffic flow of any road section i should be maintained at the saturated flow SF_i: the closer the traffic flow at the next moment is to SF_i, the larger the return value, and the larger its difference from SF_i, the smaller the return value;
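As a rough illustration of sub-steps 2.2 to 2.4 and of the return value, the sketch below uses the standard single-agent tabular Q-learning update rather than the Modular Q-learning variant of the claim. The shape of the return function, the product reading of Index(i), and all function and variable names are assumptions made only for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def reward(next_flow: float, saturated_flow: float) -> float:
    """Assumed return shape: largest when next_flow equals SF_i."""
    return -abs(next_flow - saturated_flow)


def q_update(q: QTable, state: Hashable, action: Hashable, r: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard single-agent Q-learning step (not the modular variant).

    alpha is the learning rate and gamma is the discount factor.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def best_action(q: QTable, state: Hashable,
                actions: Sequence[Hashable]) -> Hashable:
    """Greedy action for a state, i.e. the action with the largest Q value."""
    return max(actions, key=lambda a: q[(state, a)])


def congestion_index(action: float, free_flow_time: float) -> float:
    """Index(i) read here as the product a_i * Time(i); an assumption."""
    return action * free_flow_time


# States are vehicle counts on the road section; actions are numbers a_i.
q_table: QTable = defaultdict(float)
action_set = [1.0, 1.5, 2.0]
new_q = q_update(q_table, state=12, action=1.5, r=reward(18.0, 20.0),
                 next_state=14, actions=action_set, alpha=0.5, gamma=0.9)
chosen = best_action(q_table, 12, action_set)
index = congestion_index(chosen, free_flow_time=40.0)
```

Because the action that led away from the saturated flow receives a negative return, the greedy choice for the same state moves to another action on the next decision.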

the saturated flow SF is calculated in the following way:

SF＝leng×lane×state×speed×fmy×F_SF/100；

wherein leng represents the length of the road section, with value range (0, ∞); lane represents the number of lanes of the road section, with value range (0, ∞); state represents the flatness of the road section, a larger value meaning a flatter road section, with value range (0, 1]; speed represents the ratio of the highest speed limit of the road section to the highest speed limit of the road network, with value range (0, 1]; fmy represents the familiarity of the road section, a larger value meaning a more familiar road section, with value range (0, 1]; F_SF represents the saturated flow coefficient of the road network, which adjusts the calculation of SF and is set manually: if the number of vehicles in the whole road network is small, F_SF is set to a small value, and if the number of vehicles is large, F_SF is set to a larger value;
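The saturated flow formula can be checked with a short sketch. The function and argument names follow the symbols of the formula; the range checks mirror the stated value ranges, and the sample values (including the unit of leng, which the claim does not state) are hypothetical.

```python
def saturated_flow(leng: float, lane: float, state: float,
                   speed: float, fmy: float, f_sf: float) -> float:
    """SF = leng * lane * state * speed * fmy * F_SF / 100."""
    # leng and lane must lie in (0, infinity).
    if leng <= 0 or lane <= 0:
        raise ValueError("leng and lane must be positive")
    # state, speed and fmy must lie in (0, 1].
    for name, value in (("state", state), ("speed", speed), ("fmy", fmy)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must lie in (0, 1]")
    return leng * lane * state * speed * fmy * f_sf / 100


# Hypothetical road section: length 500, two lanes, moderate flatness,
# half of the network speed limit, fully familiar, coefficient F_SF = 2.
sf = saturated_flow(leng=500.0, lane=2, state=0.5, speed=0.5, fmy=1.0, f_sf=2.0)
```

Doubling F_SF doubles SF for every road section, which is how the coefficient scales the target flow to the overall number of vehicles.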

step 4: performing path selection in the PST model by using the A*R algorithm, so that traffic flow is evenly distributed over the whole road network.

2. The non-global information oriented urban road network path planning method according to claim 1, wherein in step 3 the PST model is constructed according to the road network congestion indexes: the whole path planning time range T is divided into a plurality of time scales according to the time granularity Δ, the nodes obtained by road network information preprocessing are mapped to the ordinate, the time scales are mapped to the abscissa, and edges are connected between the nodes according to conditions;

the specific implementation comprises the following substeps:

step 3.1: for time t, with initial value 0, if at any intersection l_x the traffic light gives the right of way from intersection in-point x_u ∈ Y_x to intersection out-point x_v ∈ B_x, then:

time(x_u, y_v) denotes the travel time from node x_u to node y_v in the free-flow state; B_x denotes the set of intersection out-points of intersection l_x, and Y_x denotes the set of intersection in-points of intersection l_x; an intersection out-point is a node from which the intersection can be left, and an intersection in-point is a node through which the intersection can be entered;

step 3.2: for any node y_z ∈ From(x_v),

From(x_v) denotes the set of nodes that can be reached directly from intersection out-point x_v; Index(x_v, y_z) denotes the congestion index from node x_v to node y_z;

step 3.3: adding 1 to t, and returning to step 3.1 if t is less than T;

step 3.4: for time t, with initial value 0, for any intersection in-point x_u ∈ Y_x of any intersection l_x, if the point (t, x_u) exists, finding the smallest time s greater than t such that the point (s, x_u) exists, and creating a new edge (t, x_u) → (s, x_u);

step 3.6: traversing all the edges currently existing;
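Step 3.4 above, which links each existing point (t, x_u) to the next later point at the same node, can be sketched as follows. The sketch assumes that the points created by the earlier sub-steps are available as a set of (time scale, node) pairs restricted to intersection in-points; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Point = Tuple[int, Hashable]  # (time scale, node)


def add_waiting_edges(points: Set[Point]) -> List[Tuple[Point, Point]]:
    """Link every point (t, x_u) to (s, x_u), s being the next later time.

    The caller is assumed to pass only points at intersection in-points.
    """
    times_by_node: Dict[Hashable, List[int]] = defaultdict(list)
    for t, node in points:
        times_by_node[node].append(t)

    edges: List[Tuple[Point, Point]] = []
    for node, times in times_by_node.items():
        times.sort()
        # Consecutive times give the smallest s greater than t for each t.
        for t, s in zip(times, times[1:]):
            edges.append(((t, node), (s, node)))
    return edges


existing = {(0, "x_u"), (3, "x_u"), (7, "x_u"), (2, "x_w")}
waiting_edges = sorted(add_waiting_edges(existing))
```

Sorting the times once per node yields the same edges as scanning forward from every t, without a nested search over the whole time range T.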

3. The non-global information oriented urban road network path planning method according to claim 1, wherein in step 4 path selection is performed in the PST model by using the A*R algorithm, the specific implementation process being as follows: for any vehicle f ∈ F, the set Passed^f of road sections that vehicle f has traveled is known; for any road section i ∈ Passed^f, the total time sum_i that the vehicle has spent on that road section is known; and the set of road sections from the starting node to node x in the path planning of vehicle f is known;

for vehicle f, reselecting such road sections is a waste of road section resources; in order to avoid this waste, such a selection needs to be penalized, and a penalty function e(x) is defined,

wherein F_A*R is the A*R algorithm coefficient;

assuming that the parent node of node x is y = fast(x), the e(x) function is defined recursively,

wherein sum(y, x) denotes the total time that the vehicle has spent on road section (l_y, l_x);

the valuation function f (x) is:

f(x)＝g(x)+h(x)+e(x)；
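The role of the penalty term in the valuation function can be illustrated with a small best-first search over a plain weighted graph instead of the PST model. The recursion used here, e(x) = e(y) + F_A*R × sum(y, x), is an assumption made for the example, since the penalty formula itself is not reproduced in this text; the function name, the graph and the numeric values are likewise hypothetical.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def a_star_with_penalty(
    graph: Graph,
    start: Hashable,
    goal: Hashable,
    h: Callable[[Hashable], float],
    passed_time: Dict[Tuple[Hashable, Hashable], float],
    coeff: float,
) -> Optional[List[Hashable]]:
    """Best-first search ordered by f(x) = g(x) + h(x) + e(x).

    Assumed recursion: e(x) = e(y) + coeff * sum(y, x), where y is the
    parent of x and sum(y, x) is read from passed_time (0 if never traveled).
    """
    counter = 0  # tie-breaker so the heap never compares nodes or paths
    frontier = [(h(start), counter, start, 0.0, 0.0, [start])]
    best: Dict[Hashable, float] = {}
    while frontier:
        _, _, node, g, e, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in best and best[node] <= g + e:
            continue  # already expanded with an equal or lower penalized cost
        best[node] = g + e
        for nxt, cost in graph.get(node, {}).items():
            g2 = g + cost
            e2 = e + coeff * passed_time.get((node, nxt), 0.0)
            counter += 1
            heapq.heappush(
                frontier, (g2 + h(nxt) + e2, counter, nxt, g2, e2, path + [nxt])
            )
    return None


roads: Graph = {"s": {"a": 1.0, "b": 2.0}, "a": {"t": 1.0}, "b": {"t": 1.0}, "t": {}}
zero = lambda node: 0.0  # trivial heuristic, used only for the example
plain = a_star_with_penalty(roads, "s", "t", zero, {}, coeff=0.5)
penalized = a_star_with_penalty(roads, "s", "t", zero, {("s", "a"): 10.0}, coeff=0.5)
```

With no travel history the search returns the shorter route through a; once the vehicle has already spent time on road section (s, a), the penalty makes the route through b cheaper, which matches the intent of discouraging reselection of traveled road sections.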