CN108335497B - Traffic signal self-adaptive control system and method - Google Patents
- Publication number
- CN108335497B (application number CN201810125948.8A)
- Authority
- CN
- China
- Prior art keywords
- traffic
- state
- module
- phase
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
- G08G1/08—Controlling traffic signals according to detected number or speed of vehicles
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
- Feedback Control In General (AREA)
Abstract
A traffic signal adaptive control system and method. To address the complexity of traffic control, a fuzzy technique is used to design the reward and punishment mechanism of the learning system, reflecting changes in the traffic state more accurately and reasonably. Meanwhile, a Q-learning state space is constructed through experience-based state division; by establishing a traffic parameter fusion function, the update complexity of the Q-learning state space is reduced while multi-parameter evaluation of the traffic state is retained, and a phase-based green-light timing scheme is given, finally achieving real-time responsive control of the traffic flow. The invention can effectively reduce the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the traffic efficiency of the intersection; being model-free, the method has strong adaptive capability and universality; the system simplifies the storage form of the parameter indexes in the Q table, balances the learning effect against the system's response speed to the traffic state, and reduces control complexity.
Description
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a traffic signal self-adaptive control system and method.
Background
At present, urban traffic flow is growing rapidly, and with the rapid development of traffic an effective intelligent traffic management and control technology is urgently needed. In recent years, artificial intelligence technology has advanced greatly in traffic control applications, providing new technical solutions to the problem of urban traffic control. An adaptive control system can adjust timing parameters in real time according to the manager's control objectives and the time-varying characteristics of traffic flow at the intersection; compared with fixed-time and actuated control, it makes better use of the overall capacity of the road network, effectively improves the network's traffic efficiency, and is an effective means of relieving urban traffic congestion.
To the applicant's knowledge, existing adaptive traffic signal timing methods, such as fuzzy control, neural networks, evolutionary algorithms, and expert systems, cannot to a certain extent adapt to the real-time, changeable traffic-flow characteristics of intersections, and find it difficult to adjust and control signal timing efficiently according to the real-time traffic state while effectively accounting for multiple indexes. In addition, most adaptive traffic control methods need to establish a complex traffic model, which increases the difficulty of implementation. In contrast, adaptive traffic control based on reinforcement learning is better suited to the changing traffic environment of urban intersections. Reinforcement learning needs no accurate model of the traffic environment and can effectively obtain a good traffic signal timing strategy in a stochastic traffic environment. Reinforcement learning obtains uncertain feedback on a behavior after interacting with the external environment, and updates the state-behavior value associated with that feedback, thereby obtaining the optimal control strategy. Among such methods, Q learning is the most typical: the Q learning algorithm is a model-free reinforcement learning algorithm.
However, the application of the Q-learning algorithm to traffic control still has some disadvantages. The existing Q learning intersection traffic self-adaptive control method does not fully utilize various traffic state parameters under a complex traffic state, and is difficult to give accurate and reasonable feedback to the change of the traffic flow; in addition, for the adaptive control of the Q learning with an indefinite period, the state space is too large, so that the learning efficiency is low, and effective control is difficult to form.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a traffic signal self-adaptive control system and a traffic signal self-adaptive control method.
In order to achieve this aim, the traffic signal adaptive control system comprises a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain traffic parameters, dividing the traffic parameters into corresponding state segment sets through state division, and using the traffic parameters as a basis for inquiring a timing scheme in the Q table and a parameter for updating a state space through the Q learning module;
the fuzzy evaluator is used for querying the reward and punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward and punishment value back to the Q learning module;
the Q learning module is used for taking the state segment set corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value fed back by the fuzzy evaluator as the basis for updating the Q table, and for updating the Q value of the previous state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes, and the corresponding Q values used for selection, and can be updated through the Q learning module;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
The preferred scheme of the invention is as follows: the traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
Preferably, the method for establishing the fusion function in the parameter fusion and division module is as follows:
First, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light phase traffic busyness F, the green-light phase vehicle queue length Lg, and the red-light phase vehicle queue length Lr.
A function is established from the green-light phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green-light phase and Lmax is the maximum vehicle queue length acceptable for the road.
A function is established from the red-light phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light phase vehicle queue length and k1, k2 are scale factors with k1 > k2.
The final fusion function s is then established by combining s2 with the traffic busyness F of the green-light phase.
More preferably, the scale factors k1 and k2 are determined by Tmin and Tmax, the shortest and longest effective durations of the green-light phase.
Preferably, the fuzzy evaluator takes the green-light phase traffic busyness change rate F' and the red-light phase vehicle queue length change rate Lr' as inputs; the output of the fuzzy evaluator is a reward and punishment signal value, defuzzified by the center-of-gravity method, with range (-1, 1). The input and output variables are all divided in a five-level fuzzy division mode, namely {'negative large', 'negative small', 'middle', 'positive small', 'positive large'}, representing five different levels of the red-light phase vehicle queue length change rate and the green-light phase traffic busyness change rate, denoted {NB, NS, ZO, PS, PB} respectively and represented by triangular membership functions; when the green-light phase traffic busyness change rate or the red-light phase vehicle queue length change rate exceeds half of its preset threshold, the influence produced by the timing scheme is judged to be at its maximum.
More preferably, the green-light phase traffic busyness change rate F' is computed from q, the traffic throughput of the green phase over the decision duration, and st, the saturation flow of the green phase over the decision duration.
Preferably, the Q table stores the timing strategies under different traffic states in table form, where the first column stores the state segment sets of the traffic states and the first row stores the different phase green-light timing schemes; the Q table initializes each timing strategy in advance, and in the subsequent iteration process the Q learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
Preferably, the learning update formula of the Q table is:
Q(S, a) ← Q(S, a) + α[r + γ·max_a' Q(S', a') - Q(S, a)];
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S, a) represents the selection basis under the current state set S; α is the learning efficiency, and the larger α is, the more Q(S, a) is influenced by the next state; r is the feedback, i.e. the reward and punishment value, after timing scheme a is executed; S' represents the next state set and Q(S', a') the selection strategy in the next state set; max_a' Q(S', a') represents the estimated best selection strategy for the next state set; γ represents the degree of attenuation.
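As a minimal illustration (not the patent's own code), the update formula can be sketched for a dictionary-backed Q table; the function and variable names are assumptions, while the α = 0.7 and γ = 0.9 defaults follow the embodiment's simulation settings:

```python
def q_update(q_table, s, a, r, s_next, alpha=0.7, gamma=0.9):
    """One tabular Q-learning step:
    Q(S,a) <- Q(S,a) + alpha * (r + gamma * max_a' Q(S',a') - Q(S,a))."""
    best_next = max(q_table[s_next].values())  # max_a' Q(S', a')
    q_table[s][a] += alpha * (r + gamma * best_next - q_table[s][a])
    return q_table[s][a]

# Example: two state segments, each with two candidate green durations (seconds)
q = {"(14,15)": {25: 1.0, 30: 1.0},
     "(13,14)": {25: 1.0, 30: 1.0}}
v = q_update(q, "(14,15)", a=25, r=0.5, s_next="(13,14)")
```

With these numbers the update gives 1.0 + 0.7·(0.5 + 0.9·1.0 - 1.0) = 1.28, showing how a positive reward nudges the stored value for that state-scheme pair upward.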
The invention also provides a traffic signal self-adaptive control method, which comprises the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the traffic intersection at the current moment and transmits it to the parameter fusion and division module and to the fuzzy evaluator, respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and then transmits the obtained state segment to the Q table and the Q learning module;
s3, the Q table inquires all timing schemes suitable for the current traffic state according to the received state section, transmits all the adapted timing schemes to the control decision module, and selects a suitable timing scheme through the selection strategy of the control decision module;
S4, when the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and obtains the corresponding traffic parameter and the divided state segment set through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The invention has the beneficial effects that: the invention can effectively reduce the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the traffic efficiency of the intersection; being model-free, the method has strong adaptive capability and universality; meanwhile, the storage form of the parameter indexes in the Q table is simplified, the learning effect and the system's response speed to the traffic state are both taken into account, and the control complexity is reduced.
drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic flow diagram of the process of the present invention;
FIG. 3 is a schematic diagram of a four-phase intersection of the present invention;
FIG. 4 is a schematic diagram of the fuzzy evaluator structure of the present invention.
Detailed Description
Example one
Referring to fig. 1, the present embodiment provides a traffic signal adaptive control system and method, including a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table, and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection through a sensing or image processing technology and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain a traffic parameter s, dividing s into the corresponding state segment set S through state division, which serves as the basis for querying a timing scheme in the Q table and as a parameter for updating the state space through the Q learning module;
the fuzzy evaluator is used for querying the reward and punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward and punishment value back to the Q learning module;
the Q learning module is used for taking the state section set S corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value r fed back by the fuzzy evaluator as the basis for updating the Q table by the Q learning module and updating the Q value of the previous state;
the Q table is used for storing timing strategies Q (S, a) under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each Q (S, a) in advance according to experience, and in the subsequent iteration process, the Q learning module continuously updates the Q (S, a) until the optimal timing strategies under different traffic states are obtained;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
Referring to fig. 2, the traffic signal adaptive control method based on the control system includes the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the traffic intersection at the current moment and transmits it to the parameter fusion and division module and to the fuzzy evaluator, respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and then transmits the obtained state segment to the Q table and the Q learning module;
s3, the Q table inquires all timing schemes suitable for the current traffic state according to the received state section, transmits all the adapted timing schemes to the control decision module, and selects a suitable timing scheme through the selection strategy of the control decision module;
S4, when the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and obtains the corresponding traffic parameter and the divided state segment set through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The system and the method can effectively reduce the response time of traffic jam, quickly coordinate the signal control of each phase and improve the traffic efficiency of the intersection; due to the characteristic of no model, the method has strong self-adaptive capacity and universality; meanwhile, the storage form of the parameter indexes in the Q table is simplified, the learning effect and the response speed of the system on the traffic state are considered, and the control complexity is reduced.
The intersection model of this embodiment is shown in fig. 3. The intersection traffic model consists of four phases: east-west straight, east-west left turn, north-south straight, north-south left turn. Right-turning vehicles merge into the straight movement. The traffic flows at each phase of the intersection are denoted a1, a2, a3, a4, a5, a6, a7, a8.
The traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
The method for establishing the fusion function in the parameter fusion and division module comprises the following steps:
First, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light phase traffic busyness F, the green-light phase vehicle queue length Lg, and the red-light phase vehicle queue length Lr.
A function is established from the green-light phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green-light phase and Lmax is the maximum vehicle queue length acceptable for the road.
A function is established from the red-light phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light phase vehicle queue length and k1, k2 are scale factors with k1 > k2.
The final fusion function s is established by combining s2 with the traffic busyness F of the green-light phase.
The scale factors k1 and k2 are determined by Tmin and Tmax, the shortest and longest effective durations of the green-light phase.
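The two intermediate fusion functions can be sketched as follows. This is a hedged sketch: the k1 and k2 values are illustrative placeholders (the patent derives them from Tmin and Tmax), and the final combination of s2 with the busyness F is not reproduced, since the patent gives that formula only as an image:

```python
def fuse_s1(l_g, l_max=200.0):
    # s1 = Lmax - Lg: remaining acceptable queue capacity of the green phase,
    # with Lmax = 200 m taken from the embodiment's simulation settings.
    return l_max - l_g

def fuse_s2(l_g, l_r, k1=0.6, k2=0.4, l_max=200.0):
    # s2 = k1*s1 + k2*Lr; k1 and k2 here are placeholder values chosen
    # only to satisfy the patent's constraint k1 > k2.
    assert k1 > k2, "the patent requires k1 > k2"
    return k1 * fuse_s1(l_g, l_max) + k2 * l_r

print(fuse_s1(50.0))        # spare capacity when 50 m of queue has formed
print(fuse_s2(50.0, 80.0))  # 0.6*150 + 0.4*80
```

The scale factors weight the green-phase term more heavily than the red-phase queue, consistent with k1 > k2 in the text.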
The fuzzy evaluator takes the green-light phase traffic busyness change rate F' and the red-light phase vehicle queue length change rate Lr' as inputs. Its single output is a reward and punishment signal value r; the fuzzy output is defuzzified by the center-of-gravity method, and the final reward and punishment signal value lies in the range (-1, 1). The input and output variables are all divided in a five-level fuzzy division mode, namely {'negative large', 'negative small', 'middle', 'positive small', 'positive large'}, representing five different levels of the red-light phase vehicle queue length change rate and the green-light phase traffic busyness change rate, denoted {NB, NS, ZO, PS, PB} respectively and represented by triangular membership functions. When the green-light phase traffic busyness change rate or the red-light phase vehicle queue length change rate exceeds half of its preset threshold, the influence produced by the timing scheme is judged to be at its maximum; therefore the universe of discourse of Lr' is set to (-100, 100) and that of F' to (-0.5, 0.5).
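A five-level triangular partition of this kind can be sketched in a few lines. The breakpoints below are evenly spaced assumptions over the F' universe (-0.5, 0.5), not the patent's exact membership parameters:

```python
def tri(x, a, b, c):
    # Triangular membership: rises linearly from a to the peak b, falls to c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Evenly spaced NB..PB labels over (-0.5, 0.5); an illustrative assumption.
labels = {"NB": (-0.75, -0.5, -0.25), "NS": (-0.5, -0.25, 0.0),
          "ZO": (-0.25, 0.0, 0.25), "PS": (0.0, 0.25, 0.5),
          "PB": (0.25, 0.5, 0.75)}

def fuzzify(x):
    # Membership degree of x in each of the five fuzzy levels.
    return {name: tri(x, *abc) for name, abc in labels.items()}
```

For example, an input of 0.125 belongs half to ZO and half to PS, which is the overlap behavior triangular partitions are chosen for.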
The green-light phase traffic busyness change rate F' is computed from q, the traffic throughput of the green phase over the decision duration, and st, the saturation flow of the green phase over the decision duration.
The Q table stores the timing strategies under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each timing strategy in advance, and in the subsequent iteration process, the Q learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
After the timing scheme has executed for a period, the traffic state sensing module may sense that the traffic flow of the phase has increased, its traffic busyness has risen, the vehicle queue has lengthened, and the average vehicle delay has grown. The timing scheme given by the Q learning module thus failed to achieve a good effect, so negative feedback is obtained from the fuzzy evaluator, and the timing selection for this road condition is updated via the Q value; when the same traffic state is sensed again, a more appropriate timing scheme is selected according to the learned strategy, improving the traffic efficiency of the intersection. In this way the control system continuously monitors the current traffic condition, learns in real time, and keeps updating the timing-scheme selection basis in the Q table, so that the traffic flow of the intersection tends toward the optimum. Once learning and updating of the Q table are complete and the system is relatively stable, the system can directly query the Q table with the traffic state segment S obtained after parameter fusion and state division to obtain the optimal timing scheme.
The learning update formula of the Q table is as follows:
Q(S, a) ← Q(S, a) + α[r + γ·max_a' Q(S', a') - Q(S, a)];
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S, a) represents the selection basis under the current state set S; α is the learning efficiency, and the larger α is, the more Q(S, a) is influenced by the next state; r is the feedback, i.e. the reward and punishment value, after timing scheme a is executed; S' represents the next state set and Q(S', a') the selection strategy in the next state set; max_a' Q(S', a') represents the estimated best selection strategy for the next state set; γ represents the degree of attenuation.
The specific steps of Q table and fuzzy evaluator design are as follows:
1) Designing the Q table: the parameters (F, Lg, Lr) of the controlled intersection are obtained, a traffic parameter s is computed from these parameter values, and according to the size of s the states are divided into different state segment sets S;
2) According to the timing experience of the intersection, if the traffic busyness of the current phase is low, a relatively short green-light duration should be allocated to the phase; as the busyness increases, the green duration assigned to this phase should increase accordingly. Configuring a green time that is far too long or too short for the traffic situation is unreasonable. Therefore, as shown in table 1, this embodiment empirically configures 4 different green-light timing schemes for each traffic state of the current phase. The poor options can thus be ignored outright when the Q table is updated, and learning concentrates on the few better timing schemes for each traffic state, from which the optimal phase green-light timing scheme is obtained. According to these rules, the initial Q table is designed as shown in Table 1.
TABLE 1 initial Q-table
| S \ a (s) | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 65 | 70 | 75 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (14,15) | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (13,14) | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (12,13) | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (11,12) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| (9,11) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| (7,9) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| (5,7) | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| (3,5) | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| (2,3) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| (1,2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| (0,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
Wherein, the first row of the initial Q table is a timing scheme a, and the first column is a state segment set S. In the simulation process of the embodiment, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busy degree F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The calculated traffic parameter s ranges from 0 to 15, wherein the larger s is, the better the road traffic condition is represented. And 4 different timing schemes are selected according to experience under each traffic state set S in the initial Q table for traffic control.
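The "four consecutive schemes per state" construction rule of Table 1 can be sketched as follows; the mapping of each state segment to its starting column is an assumption read directly off the table, not a rule stated by the patent:

```python
schemes = list(range(25, 80, 5))  # the 11 candidate green durations, 25..75 s

def init_row(start):
    # Enable exactly 4 consecutive timing schemes; all other schemes stay 0
    # and are never considered for this traffic state.
    return [1 if start <= i < start + 4 else 0 for i in range(len(schemes))]

# Starting columns for a few state segments, as laid out in Table 1
starts = {"(14,15)": 0, "(11,12)": 2, "(5,7)": 4, "(2,3)": 7}
initial_q = {seg: init_row(c) for seg, c in starts.items()}
```

Better traffic states (larger s) start at shorter green durations, matching the table's diagonal layout.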
3) As shown in fig. 4, a fuzzy evaluator, i.e. the reward and punishment function for the Q learning update, is established according to the parameters of the intersection and the fuzzy rules. To ensure the accuracy of the evaluation, the steps of the two fuzzy inputs were set to 0.1 (for F') and 20 (for Lr') in MATLAB, resulting in a table of size 11 × 11, shown as table 2 below.
TABLE 2 fuzzy evaluator feedback Table
| Lr' \ F' | -0.5 | -0.4 | -0.3 | -0.2 | -0.1 | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| -100 | -0.84 | -0.84 | -0.84 | -0.55 | -0.34 | -0.01 | 0.29 | 0.51 | 0.82 | 0.84 | 0.84 |
| -80 | -0.84 | -0.84 | -0.84 | -0.55 | -0.34 | -0.01 | 0.29 | 0.51 | 0.82 | 0.84 | 0.84 |
| -60 | -0.84 | -0.84 | -0.84 | -0.55 | -0.34 | -0.01 | 0.29 | 0.51 | 0.82 | 0.84 | 0.84 |
| -40 | -0.55 | -0.55 | -0.54 | -0.27 | -0.06 | 0.27 | 0.36 | 0.59 | 0.80 | 0.82 | 0.82 |
| -20 | -0.50 | -0.50 | -0.50 | -0.22 | -0.01 | 0.30 | 0.39 | 0.62 | 0.80 | 0.82 | 0.82 |
| 0 | -0.50 | -0.50 | -0.50 | -0.50 | -0.34 | 0.00 | 0.29 | 0.51 | 0.82 | 0.84 | 0.84 |
| 20 | -0.60 | -0.60 | -0.60 | -0.27 | -0.04 | 0.28 | 0.30 | 0.52 | 0.80 | 0.82 | 0.82 |
| 40 | -0.61 | -0.61 | -0.60 | -0.27 | -0.04 | 0.44 | 0.54 | 0.53 | 0.80 | 0.83 | 0.83 |
| 60 | -0.50 | -0.50 | -0.50 | -0.50 | -0.16 | 0.44 | 0.59 | 0.83 | 0.84 | 0.84 | 0.84 |
| 80 | -0.50 | -0.50 | -0.50 | -0.50 | -0.16 | 0.44 | 0.59 | 0.83 | 0.84 | 0.84 | 0.84 |
| 100 | -0.50 | -0.50 | -0.50 | -0.50 | -0.16 | 0.44 | 0.59 | 0.83 | 0.84 | 0.84 | 0.84 |
In the fuzzy evaluator feedback table, the first row is the green-light phase traffic busyness change rate F' and the first column is the red-light phase vehicle queue length change rate Lr'. Each row-column cell gives the reward and punishment value r used for the Q learning update. When the implemented timing scheme reduces the traffic busyness and the queue length, positive feedback is obtained; otherwise, corresponding negative feedback is obtained.
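Querying the feedback table then amounts to snapping each measured change rate to its grid step and indexing the precomputed 11 × 11 array. A sketch under that assumption (the grid axes follow Table 2; the table contents below are a toy stand-in, not the patent's values):

```python
F_GRID = [round(-0.5 + 0.1 * j, 1) for j in range(11)]   # F' column axis
LR_GRID = [-100 + 20 * i for i in range(11)]             # Lr' row axis

def query_feedback(table, f_prime, lr_prime):
    # Snap each input to its nearest grid point, then index the table.
    i = min(range(11), key=lambda k: abs(LR_GRID[k] - lr_prime))
    j = min(range(11), key=lambda k: abs(F_GRID[k] - f_prime))
    return table[i][j]

# Toy table whose cell (i, j) simply holds i*11 + j, to show the indexing
toy = [[i * 11 + j for j in range(11)] for i in range(11)]
```

With the real Table 2 in place of `toy`, the returned cell is the reward and punishment value r fed to the Q learning module.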
The intersection optimization control implementation process comprises the following steps:
1) The Q table provides a random timing scheme according to the current phase road condition.
2) After the traffic flow state changes, the corresponding traffic state parameters are collected, the traffic parameter is computed through the fusion function, and the traffic state segment is determined through state division for the Q learning algorithm to update the Q table.
3) Within the effective green time, the collected traffic state parameter is compared with that of the previous moment, and the difference is passed to the fuzzy evaluator to query the reward and punishment value.
4) Before the next phase begins (the green interval is set to 2 seconds), the state segment corresponding to the control table is updated according to equation 2 and the reward and punishment value.
5) After the current green-light phase timing scheme finishes, the controller switches to the next phase.
6) The state set in the Q table is queried according to the current traffic state, and after the corresponding entry is matched, a timing scheme is given according to the selection strategy.
7) Steps 2) to 6) are repeated.
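The steps above can be summarized as a per-phase control loop. Everything below is a schematic skeleton with placeholder callables standing in for the patent's modules, not its implementation:

```python
def run_phase(sense, fuse_and_divide, select_scheme, fuzzy_reward,
              update_q, q_table):
    """One green phase of the optimisation loop (steps 2-6)."""
    state = fuse_and_divide(sense())          # traffic state segment
    scheme = select_scheme(q_table, state)    # scheme via selection strategy
    # ... the green light runs for `scheme` seconds and traffic evolves ...
    next_state = fuse_and_divide(sense())     # re-sense after the change
    r = fuzzy_reward(sense())                 # reward from the fuzzy evaluator
    update_q(q_table, state, scheme, r, next_state)  # Q-table update
    return scheme                             # then switch to the next phase

# Minimal stubs, just to exercise the skeleton
q = {"S": {25: 0.0, 30: 1.0}}
chosen = run_phase(
    sense=lambda: None,
    fuse_and_divide=lambda _: "S",
    select_scheme=lambda tbl, s: max(tbl[s], key=tbl[s].get),  # greedy pick
    fuzzy_reward=lambda _: 0.5,
    update_q=lambda tbl, s, a, r, sn: tbl[s].__setitem__(a, tbl[s][a] + r),
    q_table=q,
)
```

The stubs pick the 30-second scheme (highest stored value) and add the reward to its entry, mirroring the sense-select-evaluate-update cycle of the patent's loop.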
The form of the Q table after completion of learning update is shown in table 3:
TABLE 3 updated Q-Table
| S \ a (s) | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 65 | 70 | 75 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (14,15) | 3.45 | 3.34 | 3.07 | 2.558 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (13,14) | 3.23 | 3.33 | 2.93 | 2.368 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (12,13) | 2.73 | 2.95 | 3.05 | 2.168 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (11,12) | 0 | 0 | 2.51 | 2.118 | 1.992 | 1.99 | 0 | 0 | 0 | 0 | 0 |
| (9,11) | 0 | 0 | 2 | 2.44 | 1.952 | 1.95 | 0 | 0 | 0 | 0 | 0 |
| (7,9) | 0 | 0 | 2.11 | 2.369 | 1.952 | 1.95 | 0 | 0 | 0 | 0 | 0 |
| (5,7) | 0 | 0 | 0 | 0 | 1.942 | 1.94 | 1.54 | 1.4 | 0 | 0 | 0 |
| (3,5) | 0 | 0 | 0 | 0 | 1.842 | 1.89 | 1.44 | 1.4 | 0 | 0 | 0 |
| (2,3) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.1 | 0.5 | 0.2 | 0.1 |
| (1,2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.8 | 0.3 | 0.1 | 0 |
| (0,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.8 | 0.2 | 0 | 0 |
In the embodiment, VISSIM is used as an experimental simulation platform, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busyness F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The saturated flow of each lane at the intersection is 1500 (veh/h). The attenuation factor γ in equation 2 is set to 0.9, and the learning efficiency α is set to 0.7.
A small traffic intersection is established and taken as the object of analysis for the simulation experiment. Each approach comprises four lanes: one left-turn lane, two straight lanes, and one right-turn lane. The turning probabilities of vehicles at the intersection are 30% left turn, 40% straight, and 30% right turn. The internal road sections are each 200 m long. The average vehicle speed is 40 km/h. The traffic flow of the intersection lanes is shown in table 4:
TABLE 4 intersection traffic flow
For comparison, traditional reinforcement learning was additionally used to control the traffic during the simulation, and the control results were collected with VISSIM, as shown in tables 5 and 6 below:
TABLE 5 traditional reinforcement learning control result feedback (2 bits after decimal point)
TABLE 6 control result feedback based on Q learning improvement method (2 bits after decimal point)
Analysis of the tables shows that the improved Q-learning-based traffic control method shortens overall vehicle delay by about 32% compared with traditional reinforcement-learning control; indexes such as queue length and total number of stops are likewise superior to those of traditional reinforcement-learning control.
In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.
Claims (8)
1. A traffic signal self-adaptive control system is characterized by comprising a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of intersections and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain traffic parameters, dividing the traffic parameters into corresponding state segment sets through state division to serve as a basis for inquiring a timing scheme in a Q table and a parameter for updating a state space through the Q learning module;
the fuzzy evaluator is used for inquiring a reward and punishment value feedback table according to the traffic flow state information collected by the traffic state induction module and feeding the inquired reward and punishment value back to the Q learning module;
the Q learning module is used for taking the state segment set corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value fed back by the fuzzy evaluator as the basis for updating the Q table, and for updating the Q value of the previous state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes, and the corresponding Q values for selection, and can be updated by the Q learning module;
the control decision module is used for selecting, according to the selection strategy, a corresponding timing scheme for the signal light of the current phase from the results transmitted by the Q table;
the method for establishing the fusion function in the parameter fusion and division module comprises the following steps:
firstly, a state vector set is defined as w = (Fg, Lg, Lr), whose three dimensions are the green-light-phase traffic busyness Fg, the green-light-phase vehicle queue length Lg, and the red-light-phase vehicle queue length Lr;
a function is established according to the green-light-phase vehicle queue length: s1 = Lmax − Lg, wherein Lg is the green-light-phase vehicle queue length and Lmax is the maximum vehicle queue length acceptable for the road;
a function is established according to the red-light-phase vehicle queue length: s2 = k1·s1 + k2·Lr, wherein Lr is the red-light-phase vehicle queue length, and k1, k2 are scale factors with k1 > k2;
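The two fusion sub-functions of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: Lmax = 200 m is taken from the embodiment, while the scale-factor values k1 = 0.6 and k2 = 0.4 are assumptions chosen only to satisfy the claim's constraint k1 > k2.

```python
# Sketch of the fusion sub-functions from claim 1 (k1, k2 values assumed).
L_MAX = 200.0  # maximum acceptable queue length in metres (from the embodiment)

def s1(l_green: float) -> float:
    """Green-phase term: remaining queue capacity, s1 = Lmax - Lg."""
    return L_MAX - l_green

def s2(l_green: float, l_red: float, k1: float = 0.6, k2: float = 0.4) -> float:
    """Red-phase term: s2 = k1*s1 + k2*Lr, with k1 > k2 as claim 1 requires."""
    assert k1 > k2, "claim 1 requires k1 > k2"
    return k1 * s1(l_green) + k2 * l_red

print(s1(50.0))        # remaining capacity: 200 - 50 = 150.0
print(s2(50.0, 80.0))  # 0.6*150 + 0.4*80 = 122.0
```

Weighting s1 above Lr (k1 > k2) makes the fused parameter respond first to how much green-phase capacity remains, and only secondarily to red-phase queue growth.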
2. The adaptive traffic signal control system of claim 1, wherein the traffic flow status information comprises traffic flow information, last cycle vehicle queue length, and intersection phase vehicle delay time.
4. The adaptive traffic signal control system according to claim 1, wherein the fuzzy evaluator takes the green-light-phase traffic busyness change rate F' and the red-light-phase vehicle queue length change rate Lr' as inputs; the output of the fuzzy evaluator is a reward/punishment signal value, defuzzified by the centre-of-gravity method, with range (−1, 1); the input and output variables are each divided into five fuzzy levels, namely {"negative big", "negative small", "zero", "positive small", "positive big"}, representing five levels of the red-light-phase vehicle queue length change rate and the green-light-phase traffic busyness change rate, denoted {NB, NS, ZO, PS, PB} respectively and represented by triangular membership functions; when the green-light-phase traffic busyness change rate and the red-light-phase vehicle queue length change rate exceed half of a preset threshold, the influence of the timing scheme is judged to be maximal.
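A minimal sketch of the five-level triangular fuzzy partition and centre-of-gravity defuzzification described in claim 4. The triangle breakpoints over the normalised output range [−1, 1] are assumptions for illustration, not values taken from the patent.

```python
# Assumed evenly spaced triangular partition {NB, NS, ZO, PS, PB} on [-1, 1].
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet at a, c and peak at b
    (a == b or b == c degenerates to a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

PARTITION = {
    "NB": (-1.0, -1.0, -0.5),  # negative big
    "NS": (-1.0, -0.5,  0.0),  # negative small
    "ZO": (-0.5,  0.0,  0.5),  # zero
    "PS": ( 0.0,  0.5,  1.0),  # positive small
    "PB": ( 0.5,  1.0,  1.0),  # positive big
}

def centroid(firing: dict) -> float:
    """Centre-of-gravity over a sampled output universe: each level's set is
    clipped at its rule firing strength, aggregated by max, then averaged."""
    num = den = 0.0
    for i in range(-100, 101):
        x = i / 100.0
        mu = max(min(firing.get(lbl, 0.0), tri(x, *abc))
                 for lbl, abc in PARTITION.items())
        num += mu * x
        den += mu
    return num / den if den else 0.0
```

For example, `centroid({"PS": 0.8, "ZO": 0.2})` yields a positive reward near the PS peak, and a symmetric firing such as `{"ZO": 1.0}` defuzzifies to approximately 0.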
6. The adaptive traffic signal control system according to claim 1, wherein the Q-table stores the timing strategies under different traffic states in a table form, wherein the first column stores a set of state segments of the traffic states, the first row stores different phase green light timing schemes, the Q-table initializes each timing strategy in advance, and in the subsequent iteration process, the Q-learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
7. The adaptive traffic signal control system according to claim 1 or 6, wherein the learning and updating formula of the Q table is as follows:
Q(S,a) ← Q(S,a) + α[r + γ·max_a′ Q(S′,a′) − Q(S,a)];
wherein S is the state segment set of the traffic state, and a is the traffic timing scheme; Q(S,a) represents the selection basis under the current state set S; α is the learning rate, and the larger α is, the more strongly Q(S,a) is influenced by the new estimate from the next state; r is the feedback, namely the reward/punishment value, after timing scheme a is executed; S′ represents the next state set, and Q(S′,a′) the selection strategy in the next state set; max_a′ Q(S′,a′) represents the best selection strategy estimated for the next state set; γ is the discount (attenuation) factor.
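The update rule of claim 7 can be sketched as a one-step tabular Q-learning update. This is an illustrative sketch: α = 0.7 and γ = 0.9 are taken from the embodiment, while the dict-based Q table, the state-segment labels, and the green-time scheme names are assumptions.

```python
# One-step tabular Q-learning update matching the formula in claim 7.
from collections import defaultdict

ALPHA, GAMMA = 0.7, 0.9  # learning rate and discount factor from the embodiment

def q_update(Q, s, a, r, s_next, actions):
    """Q(S,a) <- Q(S,a) + alpha * [r + gamma * max_a' Q(S',a') - Q(S,a)]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

Q = defaultdict(float)                # unvisited (state, action) pairs start at 0
actions = ["g25", "g40", "g75"]       # hypothetical green-time timing schemes
q_update(Q, "seg(5,7)", "g40", 1.0, "seg(3,5)", actions)
print(Q[("seg(5,7)", "g40")])         # 0.7 * (1.0 + 0.9*0 - 0) = 0.7
```

Because the state space is pre-divided into experience-based segments, the table stays small enough for this direct tabular update, which is the complexity reduction the patent claims.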
8. A control method of the traffic signal adaptive control system according to claim 1, comprising the steps of:
S1, the traffic state sensing module collects the traffic flow state information of the traffic intersection at the current moment and transmits it respectively to the parameter fusion and division module and the fuzzy evaluator;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain traffic parameters, divides the traffic parameters into corresponding state segment sets through state division, and then transmits the obtained state segments to the Q table and the Q learning module;
S3, the Q table queries all timing schemes suitable for the current traffic state according to the received state segment and transmits all the adapted timing schemes to the control decision module, and a suitable timing scheme is selected through the selection strategy of the control decision module;
S4, when the timing scheme has been executed and the traffic flow has changed, the traffic state sensing module collects the current traffic flow state information again, and the corresponding traffic parameters and the divided state segment set are obtained through the parameter fusion and division module;
S5, the fuzzy evaluator queries the feedback table according to the traffic flow state information to obtain a reward and punishment value and feeds it back to the Q learning module;
S6, the Q learning module updates the Q table according to the changed traffic state segment set and the reward and punishment value;
S7, after the current green-light phase ends, switching to the next phase and repeating steps S1 to S6.
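The control cycle of steps S1 to S6 can be sketched as a single function whose collaborators are injected. This is a schematic only: the sensing, fusion, lookup, selection, evaluation, and learning callables below are placeholders standing in for the patented modules, and the demo stubs (state labels, scheme names, reward value) are assumptions.

```python
# Schematic of one control cycle (steps S1-S6) for a single phase.
def control_cycle(sense, fuse, q_table, choose, evaluate, learn):
    state_info = sense()                     # S1: collect traffic-flow state
    seg = fuse(state_info)                   # S2: fuse + divide into a state segment
    schemes = q_table.lookup(seg)            # S3: candidate timing schemes
    scheme = choose(schemes)                 # S3: selection strategy picks one
    scheme.execute()                         # S4: run the green-time scheme
    new_info = sense()                       # S4: re-sense after execution
    new_seg = fuse(new_info)                 # S4: re-fuse into the new segment
    reward = evaluate(state_info, new_info)  # S5: fuzzy reward/punishment value
    learn(seg, scheme, reward, new_seg)      # S6: update the Q table
    return new_seg                           # S7: next phase repeats the cycle

# Minimal stubs to exercise the flow.
class _Scheme:
    def __init__(self, name): self.name = name
    def execute(self): pass

class _QTable:
    def lookup(self, seg): return [_Scheme("g25"), _Scheme("g40")]

log = []
seg = control_cycle(
    sense=lambda: {"queue": 50},
    fuse=lambda info: "seg(5,7)",
    q_table=_QTable(),
    choose=lambda schemes: schemes[0],
    evaluate=lambda old, new: 0.5,
    learn=lambda s, a, r, s2: log.append((s, a.name, r, s2)),
)
print(seg, log)
```

Running the cycle once records one (state, scheme, reward, next-state) learning transition, which is exactly the experience tuple the Q-table update of claim 7 consumes.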
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810125948.8A CN108335497B (en) | 2018-02-08 | 2018-02-08 | Traffic signal self-adaptive control system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108335497A CN108335497A (en) | 2018-07-27 |
CN108335497B true CN108335497B (en) | 2021-09-14 |
Family
ID=62928553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810125948.8A Active CN108335497B (en) | 2018-02-08 | 2018-02-08 | Traffic signal self-adaptive control system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108335497B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4307270A1 (en) * | 2022-07-14 | 2024-01-17 | Kapsch TrafficCom AG | Method and server for controlling traffic lights |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109035812B (en) * | 2018-09-05 | 2021-07-27 | 平安科技(深圳)有限公司 | Traffic signal lamp control method and device, computer equipment and storage medium |
CN109544913A (en) * | 2018-11-07 | 2019-03-29 | 南京邮电大学 | A kind of traffic lights dynamic timing algorithm based on depth Q e-learning |
US10964207B2 (en) * | 2018-11-19 | 2021-03-30 | Fortran Traffic Systems Limited | Systems and methods for managing traffic flow using connected vehicle data |
CN110085038B (en) * | 2019-04-26 | 2020-11-27 | 同济大学 | Intersection self-adaptive signal control method based on real-time queuing information |
CN110428615B (en) * | 2019-07-12 | 2021-06-22 | 中国科学院自动化研究所 | Single intersection traffic signal control method, system and device based on deep reinforcement learning |
CN111081035A (en) * | 2019-12-17 | 2020-04-28 | 扬州市鑫通智能信息技术有限公司 | Traffic signal control method based on Q learning |
CN111311996A (en) * | 2020-03-27 | 2020-06-19 | 湖南有色金属职业技术学院 | Online education informationization teaching system based on big data |
CN111564048A (en) * | 2020-04-28 | 2020-08-21 | 郑州大学 | Traffic signal lamp control method and device, electronic equipment and storage medium |
CN111613072A (en) * | 2020-05-08 | 2020-09-01 | 上海数道信息科技有限公司 | Intelligent signal lamp timing optimization method, device, equipment, system and medium |
CN113506450B (en) * | 2021-07-28 | 2022-05-17 | 浙江海康智联科技有限公司 | Qspare-based single-point signal timing scheme selection method |
CN113870590A (en) * | 2021-09-23 | 2021-12-31 | 福建船政交通职业学院 | Wireless control method and system for traffic flow |
CN114202935B (en) * | 2021-11-16 | 2023-04-28 | 广西中科曙光云计算有限公司 | Time distribution method and device for intersection signal lamps based on cloud network |
CN114120672B (en) * | 2021-11-19 | 2022-10-25 | 大连海事大学 | Heterogeneous intersection scene traffic signal control method based on multi-agent reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201009974D0 (en) * | 2010-06-15 | 2010-07-21 | Trinity College Dublin | Decentralised autonomic system and method for use inan urban traffic control environment |
CN104933876B (en) * | 2015-06-03 | 2018-03-13 | 浙江师范大学 | A kind of control method of adaptive smart city intelligent traffic signal |
CN105654744B (en) * | 2016-03-10 | 2018-07-06 | 同济大学 | A kind of improvement traffic signal control method based on Q study |
CN106846836B (en) * | 2017-02-28 | 2019-05-24 | 许昌学院 | A kind of Single Intersection signal timing control method and system |
CN107393319B (en) * | 2017-08-31 | 2020-06-19 | 长安大学 | Signal optimization control method for preventing single cross port queuing overflow |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108335497B (en) | Traffic signal self-adaptive control system and method | |
CN108510764B (en) | Multi-intersection self-adaptive phase difference coordination control system and method based on Q learning | |
CN111696370B (en) | Traffic light control method based on heuristic deep Q network | |
CN111785045B (en) | Distributed traffic signal lamp combined control method based on actor-critic algorithm | |
CN104766484B (en) | Traffic Control and Guidance system and method based on Evolutionary multiobjective optimization and ant group algorithm | |
CN112489464B (en) | Crossing traffic signal lamp regulation and control method with position sensing function | |
CN105118308B (en) | Urban road intersection traffic signal optimization method based on cluster intensified learning | |
CN110047278A (en) | A kind of self-adapting traffic signal control system and method based on deeply study | |
CN112700642B (en) | Method for improving traffic passing efficiency by using intelligent internet vehicle | |
CN112950963B (en) | Self-adaptive signal control optimization method for main branch intersection of city | |
CN109269516B (en) | Dynamic path induction method based on multi-target Sarsa learning | |
CN114973650B (en) | Vehicle ramp entrance confluence control method, vehicle, electronic device and storage medium | |
Kurzer et al. | Decentralized cooperative planning for automated vehicles with continuous monte carlo tree search | |
Li | Multi-agent deep deterministic policy gradient for traffic signal control on urban road network | |
CN115016537B (en) | Heterogeneous unmanned aerial vehicle configuration and task planning combined optimization method in SEAD scene | |
Hussain et al. | Optimizing traffic lights with multi-agent deep reinforcement learning and v2x communication | |
CN113724507B (en) | Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning | |
CN114384916A (en) | Adaptive decision-making method and system for off-road vehicle path planning | |
Luo et al. | Researches on intelligent traffic signal control based on deep reinforcement learning | |
CN102902824B (en) | System and method for searching traffic route | |
CN116189454A (en) | Traffic signal control method, device, electronic equipment and storage medium | |
Zhao et al. | Learning multi-agent communication with policy fingerprints for adaptive traffic signal control | |
Li et al. | Multi-intersections traffic signal intelligent control using collaborative q-learning algorithm | |
CN113487870A (en) | Method for generating anti-disturbance to intelligent single intersection based on CW (continuous wave) attack | |
Li et al. | Adaptive dynamic neuro-fuzzy system for traffic signal control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||