CN108335497B - Traffic signal self-adaptive control system and method - Google Patents


Info

Publication number: CN108335497B
Application number: CN201810125948.8A
Authority: CN (China)
Other versions: CN108335497A
Original language: Chinese (zh)
Inventors: 罗杰 (Luo Jie), 刘成建 (Liu Chengjian)
Original and current assignee: Nanjing University of Posts and Telecommunications
Legal status: Active (granted)
Prior art keywords: traffic, state, module, phase, learning

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G 1/00: Traffic control systems for road vehicles
    • G08G 1/07: Controlling traffic signals
    • G08G 1/08: Controlling traffic signals according to detected number or speed of vehicles

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

A traffic signal adaptive control system and method. To address the complexity of traffic control, a fuzzy technique is used to design the reward-and-punishment mechanism of the learning system, so that changes in the traffic state are reflected more accurately and reasonably. Meanwhile, a Q-learning state space is constructed through experience-based state division: by establishing a traffic parameter fusion function, the update complexity of the state space is reduced while multi-parameter evaluation of the traffic state is retained, and a phase-based green-light timing scheme is produced, finally achieving real-time responsive control of the traffic flow. The invention can effectively shorten the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the throughput efficiency of the intersection; being model-free, it has strong adaptive capability and generality. The system simplifies the storage form of the parameter indices in the Q table, balances the learning effect against the response speed to the traffic state, and reduces control complexity.

Description

Traffic signal self-adaptive control system and method
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a traffic signal self-adaptive control system and method.
Background
At present, urban traffic flow is growing rapidly, and with this rapid development an effective intelligent traffic management and control technology is urgently needed. In recent years, artificial intelligence has advanced greatly in traffic control applications, offering new technical approaches to the urban traffic control problem. An adaptive control system can adjust timing parameters in real time according to the manager's control objective and the time-varying characteristics of traffic flow at the intersection; compared with fixed-time and actuated control, it makes better use of the overall capacity of the road network, effectively improves network throughput, and is an effective means of relieving urban traffic congestion.
To the applicant's knowledge, existing adaptive traffic signal timing methods such as fuzzy control, neural networks, evolutionary algorithms, and expert systems cannot fully adapt to the constantly changing traffic flow characteristics of an intersection, and find it difficult to adjust signal timing efficiently according to the real-time traffic state while accounting for multiple indices. In addition, most adaptive traffic control methods require a complex traffic model, which increases the difficulty of implementation. In contrast, adaptive traffic control based on reinforcement learning is better suited to the changing traffic environment of urban intersections. Reinforcement learning needs no accurate model of the traffic environment and can effectively obtain a good signal timing strategy in a stochastic traffic environment: after interacting with the external environment, it receives uncertain feedback on its behavior and updates the associated state-action values, thereby converging on an optimal control strategy. Q learning is the most typical example; it is a model-free reinforcement learning algorithm.
However, applying the Q-learning algorithm to traffic control still has shortcomings. Existing Q-learning adaptive control methods for intersections do not make full use of the various traffic state parameters available in a complex traffic state, making it difficult to give accurate and reasonable feedback on changes in the traffic flow; moreover, for adaptive control with variable cycle length, the state space is too large, so learning is inefficient and effective control is hard to form.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a traffic signal self-adaptive control system and a traffic signal self-adaptive control method.
To achieve this aim, the traffic signal adaptive control system comprises a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection and transmitting it to the parameter fusion and division module and the fuzzy evaluator respectively;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain a traffic parameter and dividing it, through state division, into the corresponding state segment set, which serves both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying the reward-and-punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking the state section set corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value fed back by the fuzzy evaluator as the basis for updating the Q table by the Q learning module and updating the Q value of the last state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes and the corresponding selected Q values, and can be updated by the Q learning module;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
The preferred scheme of the invention is as follows: the traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
Preferably, the fusion function in the parameter fusion and division module is established as follows:
first, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light-phase traffic busy degree F, the green-light-phase vehicle queue length Lg, and the red-light-phase vehicle queue length Lr;
a function is established from the green-light-phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green light phase and Lmax is the maximum vehicle queue length acceptable for the road;
a function is then established from the red-light-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light-phase vehicle queue length and k1, k2 are scale factors with k1 > k2;
and the final fusion function is established by combining the green-light-phase traffic busy degree F. [The fusion function formula appears only as an image in the original.]
More preferably, the scale factors k1 and k2 are functions of Tmin and Tmax, the shortest and longest effective green-light durations. [Their formulas appear only as images in the original.]
Preferably, the fuzzy evaluator takes the green-light-phase traffic busy degree change rate F′ and the red-light-phase vehicle queue length change rate Lr′ as inputs; its output is a reward/penalty signal value, defuzzified by the centre-of-gravity method, with range (-1, 1). The input and output variables are all partitioned with a five-level fuzzy division, i.e. {"negative big", "negative small", "zero", "positive small", "positive big"}, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions, describing five levels of the red-light-phase vehicle queue length change rate and of the green-light-phase traffic busy degree change rate. When these change rates exceed half of a preset threshold, the influence of the timing scheme is judged to be at its maximum.
More preferably, the green-light-phase traffic busy degree change rate F′ is computed from q and st, where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration. [The formula for F′ appears only as an image in the original.]
Preferably, the Q table stores the timing strategies under different traffic states in table form: the first column stores the state segment sets of the traffic states and the first row stores the different phase green-light timing schemes. The Q table initializes each timing strategy in advance, and in subsequent iterations the Q learning module keeps updating the strategies until the optimal timing strategy for each traffic state is obtained.
Preferably, the learning update formula of the Q table is:
Q(S,a) ← Q(S,a) + α[r + γ max_{a′} Q(S′,a′) - Q(S,a)];
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S,a) is the selection basis under the current state set S; α is the learning rate: the higher α, the more Q(S,a) is influenced by the next state; r is the feedback (reward/penalty value) obtained after executing timing scheme a; S′ is the next state set and Q(S′,a′) the selection basis within it; max_{a′} Q(S′,a′) is the estimated best selection for the next state set; γ is the attenuation (discount) factor.
The invention also provides a traffic signal self-adaptive control method, which comprises the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the intersection at the current moment and transmits it to the parameter fusion and division module and the fuzzy evaluator respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and transmits the obtained state segment to the Q table and the Q learning module;
S3, the Q table queries all timing schemes suitable for the current traffic state according to the received state segment and transmits them to the control decision module, which selects a suitable timing scheme through its selection strategy;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameter and divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The invention has the following beneficial effects: it can effectively shorten the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the throughput efficiency of the intersection; being model-free, it has strong adaptive capability and generality; meanwhile, the storage form of the parameter indices in the Q table is simplified, the learning effect and the response speed to the traffic state are both taken into account, and the control complexity is reduced.
drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic flow diagram of the process of the present invention;
FIG. 3 is a schematic diagram of a four-phase intersection of the present invention;
FIG. 4 is a schematic diagram of the fuzzy evaluator structure of the present invention.
Detailed Description
Example one
Referring to fig. 1, the present embodiment provides a traffic signal adaptive control system and method, including a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table, and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection through a sensing or image processing technology and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain a traffic parameter s, dividing it into the corresponding state segment set S through state division, which serves both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying the reward-and-punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking the state section set S corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value r fed back by the fuzzy evaluator as the basis for updating the Q table by the Q learning module and updating the Q value of the previous state;
the Q table is used for storing timing strategies Q (S, a) under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each Q (S, a) in advance according to experience, and in the subsequent iteration process, the Q learning module continuously updates the Q (S, a) until the optimal timing strategies under different traffic states are obtained;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
Referring to fig. 2, the traffic signal adaptive control method based on the control system includes the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the intersection at the current moment and transmits it to the parameter fusion and division module and the fuzzy evaluator respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and transmits the obtained state segment to the Q table and the Q learning module;
S3, the Q table queries all timing schemes suitable for the current traffic state according to the received state segment and transmits them to the control decision module, which selects a suitable timing scheme through its selection strategy;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameter and divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The system and the method can effectively reduce the response time of traffic jam, quickly coordinate the signal control of each phase and improve the traffic efficiency of the intersection; due to the characteristic of no model, the method has strong self-adaptive capacity and universality; meanwhile, the storage form of the parameter indexes in the Q table is simplified, the learning effect and the response speed of the system on the traffic state are considered, and the control complexity is reduced.
The intersection model of this embodiment is shown in fig. 3; the intersection traffic model consists of four phases: east-west through, east-west left turn, north-south through, and north-south left turn. Right-turning vehicles merge with the through movement. The traffic flows of the intersection phases are denoted a1, a2, a3, a4, a5, a6, a7, a8.
The traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
The fusion function in the parameter fusion and division module is established as follows:
first, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light-phase traffic busy degree F, the green-light-phase vehicle queue length Lg, and the red-light-phase vehicle queue length Lr;
a function is established from the green-light-phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green light phase and Lmax is the maximum vehicle queue length acceptable for the road;
a function is then established from the red-light-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light-phase vehicle queue length and k1, k2 are scale factors with k1 > k2;
and the final fusion function is established by combining the green-light-phase traffic busy degree F. [The fusion function formula, and the formulas for the scale factors k1 and k2, appear only as images in the original.]
Here Tmin and Tmax, on which k1 and k2 depend, are the shortest and longest effective green-light durations.
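Since the fusion formula and the k1, k2 expressions survive only as images, the following sketch reproduces only the structure stated in the text (s1 = Lmax - Lg, then s2 = k1·s1 + k2·Lr with k1 > k2). The constant values for k1 and k2 and the way the busy degree F enters the final combination are illustrative assumptions, not the patented formulas:

```python
def fuse_state(F, Lg, Lr, Lmax=200.0, k1=0.04, k2=0.02):
    """Fuse the state vector w = (F, Lg, Lr) into one traffic parameter s.

    F  : green-phase traffic busy degree in [0, 1]
    Lg : green-phase vehicle queue length (m)
    Lr : red-phase vehicle queue length (m)

    k1, k2 are hypothetical scale factors (the patent derives them from the
    shortest/longest green times Tmin, Tmax via formulas shown only as
    images); the patent only requires k1 > k2.
    """
    s1 = Lmax - Lg           # remaining queue capacity of the green phase
    s2 = k1 * s1 + k2 * Lr   # weighted combination with the red-phase queue
    # Final fusion with the busy degree F: the published formula is an
    # image, so as a stand-in we damp s2 by (1 - F), which preserves the
    # stated property that a larger s means a freer traffic condition.
    return s2 * (1.0 - F)
```

With the default constants, an empty approach (F = 0, no queues) maps to a large s, while a congested one maps toward 0, matching the 0-15 ordering the embodiment describes for s.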
The fuzzy evaluator takes the green-light-phase traffic busy degree change rate F′ and the red-light-phase vehicle queue length change rate Lr′ as inputs; its output is a reward/penalty signal value, defuzzified by the centre-of-gravity method, with range (-1, 1). The input and output variables are all partitioned with a five-level fuzzy division, i.e. {"negative big", "negative small", "zero", "positive small", "positive big"}, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions. When the green-light-phase busy degree change rate and the red-light-phase queue length change rate exceed half of a preset threshold, the influence of the timing scheme is judged to be at its maximum; accordingly, the universe of discourse of Lr′ is set to (-100, 100) and that of F′ to (-0.5, 0.5). The fuzzy evaluator has a single output, the reward/penalty signal value r; the fuzzy output is defuzzified by the centre-of-gravity method, and the final reward/penalty value lies in (-1, 1).
The green-light-phase traffic busy degree change rate F′ is computed from q and st, where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration. [The formula for F′ appears only as an image in the original.]
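A minimal sketch of the fuzzy evaluator described above, assuming symmetric triangular membership functions for {NB, NS, ZO, PS, PB} on a normalized universe and a hypothetical rule base in which a falling busy degree and a shrinking queue earn positive feedback; defuzzification is simplified to a weighted average of output centers rather than the full centre-of-gravity computation over the output sets:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and feet at a and c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Five-level fuzzy partition {NB, NS, ZO, PS, PB}; centers on [-1, 1].
CENTERS = {"NB": -1.0, "NS": -0.5, "ZO": 0.0, "PS": 0.5, "PB": 1.0}

def memberships(x):
    """Membership degree of x (normalized to [-1, 1]) in each fuzzy set."""
    return {lab: tri(x, c - 0.5, c, c + 0.5) for lab, c in CENTERS.items()}

def evaluate(dF, dLr, F_range=0.5, L_range=100.0):
    """Reward/penalty value in (-1, 1).

    dF  : busy degree change rate, universe (-0.5, 0.5)
    dLr : red-phase queue length change rate, universe (-100, 100)

    The rule base is a hypothetical stand-in: the output center of each
    rule is the negated average of the two input centers, so improvement
    (negative change rates) yields a positive reward.
    """
    muf = memberships(dF / F_range)   # normalize inputs to [-1, 1]
    mul = memberships(dLr / L_range)
    num = den = 0.0
    for lf, wf in muf.items():
        for ll, wl in mul.items():
            w = min(wf, wl)                           # rule firing strength
            out = -(CENTERS[lf] + CENTERS[ll]) / 2.0  # hypothetical rule output
            num += w * out
            den += w
    return 0.9 * num / den if den else 0.0  # scaled to stay inside (-1, 1)
```

The sketch reproduces the qualitative shape of Table 2: strongly negative change rates (both queues shrinking) give feedback near +0.9, strongly positive ones near -0.9, and no change gives roughly zero.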
The Q table stores the timing strategies under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each timing strategy in advance, and in the subsequent iteration process, the Q learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
After the timing scheme has been executed for a period, the traffic state sensing module may sense that the traffic flow of the phase has increased: the busy degree of the phase rises, the vehicle queue lengthens, and the average vehicle delay grows. The timing scheme given by the Q learning module has then not worked well, so negative feedback is obtained from the fuzzy evaluator and the timing selection for this road condition is updated through the Q value; the next time the same traffic state is sensed, a more appropriate timing scheme is selected according to the learned strategy, improving the throughput of the intersection. In this way the control system continuously monitors the current traffic condition, learns in real time, and keeps updating the selection basis for timing schemes in the Q table, so that the traffic flow of the intersection tends toward the optimum. Once learning of the Q table has converged and the system is relatively stable, the optimal timing scheme can be obtained by directly querying the Q table with the traffic state segment S obtained after parameter fusion and state division.
The learning update formula of the Q table is:
Q(S,a) ← Q(S,a) + α[r + γ max_{a′} Q(S′,a′) - Q(S,a)];   (Equation 2)
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S,a) is the selection basis under the current state set S; α is the learning rate: the higher α, the more Q(S,a) is influenced by the next state; r is the feedback (reward/penalty value) obtained after executing timing scheme a; S′ is the next state set and Q(S′,a′) the selection basis within it; max_{a′} Q(S′,a′) is the estimated best selection for the next state set; γ is the attenuation (discount) factor.
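The tabular update above, together with a simple selection strategy, can be sketched as follows. The ε-greedy choice is an assumption, since the patent leaves the control decision module's selection strategy unspecified; α and γ default to the values used in the embodiment's simulation (0.7, 0.9):

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.7, gamma=0.9):
    """One tabular Q-learning step:
    Q(S,a) <- Q(S,a) + alpha * (r + gamma * max_a' Q(S',a') - Q(S,a)).
    Q maps state-segment set -> {timing scheme: value}."""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def choose_scheme(Q, s, epsilon=0.1):
    """Greedy selection with a small exploration rate (assumed strategy)."""
    if random.random() < epsilon:
        return random.choice(list(Q[s]))
    return max(Q[s], key=Q[s].get)
```

A single update with r = 1 and a promising next state raises the chosen scheme's Q value, which is exactly how the embodiment's Table 1 entries evolve into Table 3.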
The Q table and fuzzy evaluator are designed as follows:
1) Designing the Q table: the parameters (F, Lg, Lr) of the controlled intersection are obtained, the traffic parameter s is computed from them by the fusion function, and according to the size of s the states are divided into the different state segment sets S;
2) According to timing experience at the intersection, if the traffic busy degree of the current phase is low, a relatively short green-light duration should be allocated to the phase; as the busy degree increases, the green duration allocated to the phase should increase accordingly. Configuring a green time that is too long or too short for a traffic situation is unreasonable. Therefore, as shown in Table 1, this embodiment empirically configures 4 different green-light timing schemes for each traffic state of the current phase. Clearly poor options can thus be ignored outright when the Q table is updated, and learning concentrates on the few better timing schemes for each traffic state, from which the optimal phase green-light timing scheme is obtained. Following these rules, the initial Q table is designed as shown in Table 1.
TABLE 1 initial Q-table
            25  30  35  40  45  50  55  60  65  70  75
(14,15)      1   1   1   1   0   0   0   0   0   0   0
(13,14)      1   1   1   1   0   0   0   0   0   0   0
(12,13)      1   1   1   1   0   0   0   0   0   0   0
(11,12)      0   0   1   1   1   1   0   0   0   0   0
(9,11)       0   0   1   1   1   1   0   0   0   0   0
(7,9)        0   0   1   1   1   1   0   0   0   0   0
(5,7)        0   0   0   0   1   1   1   1   0   0   0
(3,5)        0   0   0   0   1   1   1   1   0   0   0
(2,3)        0   0   0   0   0   0   0   1   1   1   1
(1,2)        0   0   0   0   0   0   0   1   1   1   1
(0,1)        0   0   0   0   0   0   0   1   1   1   1
Wherein, the first row of the initial Q table is a timing scheme a, and the first column is a state segment set S. In the simulation process of the embodiment, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busy degree F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The calculated traffic parameter s ranges from 0 to 15, wherein the larger s is, the better the road traffic condition is represented. And 4 different timing schemes are selected according to experience under each traffic state set S in the initial Q table for traffic control.
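The structure of Table 1 can be built programmatically. The segment bounds and the four admissible schemes per segment are copied from the table; `segment_of` is an assumed helper, not named in the patent, for mapping the fused parameter s (0-15) to its state segment:

```python
SCHEMES = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]  # green times (s)

# State segments of the fused parameter s (larger s = freer traffic),
# each paired with its four admissible schemes from Table 1.
SEGMENTS = {
    (14, 15): [25, 30, 35, 40],
    (13, 14): [25, 30, 35, 40],
    (12, 13): [25, 30, 35, 40],
    (11, 12): [35, 40, 45, 50],
    (9, 11):  [35, 40, 45, 50],
    (7, 9):   [35, 40, 45, 50],
    (5, 7):   [45, 50, 55, 60],
    (3, 5):   [45, 50, 55, 60],
    (2, 3):   [60, 65, 70, 75],
    (1, 2):   [60, 65, 70, 75],
    (0, 1):   [60, 65, 70, 75],
}

def initial_q_table():
    """Initial Q table: 1 for the four admissible schemes of each state
    segment, 0 for the masked-out rest, mirroring Table 1."""
    return {seg: {a: (1 if a in ok else 0) for a in SCHEMES}
            for seg, ok in SEGMENTS.items()}

def segment_of(s):
    """Map a fused traffic parameter s in [0, 15] to its state segment."""
    for lo, hi in SEGMENTS:
        if lo <= s < hi or (hi == 15 and s == 15):
            return (lo, hi)
    raise ValueError(f"s out of range: {s}")
```

Masking poor schemes with 0 keeps the table small, which is the stated point of the experience-based state division: only the four plausible green times per segment ever get learned and compared.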
3) As shown in fig. 4, a fuzzy evaluator, i.e. the reward-and-punishment function for the Q learning update, is established from the intersection parameters and the fuzzy rules. To ensure the accuracy of the evaluation, the steps of the fuzzy inputs were set to 0.1 and 20, respectively, in MATLAB, yielding a table of size 11 × 11, shown in Table 2 below.
TABLE 2 fuzzy evaluator feedback Table
         -0.5   -0.4   -0.3   -0.2   -0.1    0     0.1    0.2    0.3    0.4    0.5
-100    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -80    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -60    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -40    -0.55  -0.55  -0.54  -0.27  -0.06   0.27   0.36   0.59   0.80   0.82   0.82
 -20    -0.50  -0.50  -0.50  -0.22  -0.01   0.30   0.39   0.62   0.80   0.82   0.82
   0    -0.50  -0.50  -0.50  -0.50  -0.34   0.00   0.29   0.51   0.82   0.84   0.84
  20    -0.60  -0.60  -0.60  -0.27  -0.04   0.28   0.30   0.52   0.80   0.82   0.82
  40    -0.61  -0.61  -0.60  -0.27  -0.04   0.44   0.54   0.53   0.80   0.83   0.83
  60    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
  80    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
 100    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
Here the first row of the fuzzy evaluator feedback table is the green-light-phase traffic busy degree change rate F′ and the first column is the red-light-phase vehicle queue length change rate Lr′; each (row, column) entry is the reward/penalty value r used for the Q learning update. When the implemented timing scheme reduces the traffic busy degree and shortens the queue, positive feedback is obtained; otherwise, corresponding negative feedback is obtained.
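At run time, reading the precomputed feedback table reduces to nearest-bin quantization of the two change rates. `lookup_reward` is an illustrative helper (not named in the patent); the grids match the 0.1 and 20 step sizes of Table 2:

```python
# Column/row universes of the 11x11 feedback table (Table 2):
# F' in steps of 0.1 over (-0.5, 0.5), Lr' in steps of 20 over (-100, 100).
F_GRID = [round(-0.5 + 0.1 * i, 1) for i in range(11)]
L_GRID = [-100 + 20 * i for i in range(11)]

def lookup_reward(table, dF, dLr):
    """Nearest-bin lookup of the precomputed reward/penalty value r.
    table[i][j] holds r for Lr' = L_GRID[i] and F' = F_GRID[j]."""
    j = min(range(11), key=lambda k: abs(F_GRID[k] - dF))
    i = min(range(11), key=lambda k: abs(L_GRID[k] - dLr))
    return table[i][j]
```

Precomputing the table offline (as the embodiment does in MATLAB) keeps the online controller to a constant-time lookup instead of a full fuzzy inference per decision.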
The intersection optimization control process is implemented as follows:
1) The Q table provides a timing scheme (initially at random) according to the road condition of the current phase.
2) After the traffic flow state has changed, the corresponding traffic state parameters are collected, the traffic parameter is computed through the fusion function, and the traffic state segment is determined through state division for the Q learning algorithm to update the Q table.
3) Within the effective green time, the collected traffic state parameters are compared with those of the previous moment, and the differences are passed to the fuzzy evaluator, which is queried to obtain the reward/penalty value.
4) Before the next phase begins (the green interval is set to 2 seconds), the Q-table entry for the corresponding state segment is updated according to Equation 2 and the reward/penalty value.
5) After the timing scheme of the current green phase finishes, control switches to the next phase.
6) The state set in the Q table is queried according to the current traffic state, and after matching the corresponding entry a timing scheme is given according to the selection strategy.
7) Steps 2) to 6) are repeated.
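The steps above can be condensed into one per-phase routine. The `sense` and `fuzzy_reward` callables are stand-ins for the sensing/fusion and fuzzy-evaluator modules, and the ε-greedy selection is an assumption in place of the unspecified selection strategy:

```python
import random

def run_phase(Q, sense, fuzzy_reward, alpha=0.7, gamma=0.9, epsilon=0.1):
    """One green phase of the control loop.

    Q            : {state segment: {scheme: value}} table
    sense()      : returns the current state segment (stand-in for sensing
                   + parameter fusion + state division)
    fuzzy_reward : (seg, scheme, next_seg) -> r, stand-in for the fuzzy
                   evaluator feedback-table query
    """
    seg = sense()                         # current traffic state segment
    if random.random() < epsilon:         # selection strategy (steps 1/6)
        a = random.choice(list(Q[seg]))
    else:
        a = max(Q[seg], key=Q[seg].get)
    seg2 = sense()                        # state after scheme a ran (step 2)
    r = fuzzy_reward(seg, a, seg2)        # reward/penalty value (step 3)
    best = max(Q[seg2].values())
    Q[seg][a] += alpha * (r + gamma * best - Q[seg][a])  # step 4, Equation 2
    return a                              # then switch phase (step 5)
```

Calling `run_phase` once per green phase, cycling through the four phases, reproduces the learn-while-controlling loop that turns the initial Q table into the converged one.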
The form of the Q table after completion of learning update is shown in table 3:
TABLE 3 updated Q-Table
            25     30     35     40     45     50     55    60    65    70    75
(14,15)    3.45   3.34   3.07   2.558  0      0      0     0     0     0     0
(13,14)    3.23   3.33   2.93   2.368  0      0      0     0     0     0     0
(12,13)    2.73   2.95   3.05   2.168  0      0      0     0     0     0     0
(11,12)    0      0      2.51   2.118  1.992  1.99   0     0     0     0     0
(9,11)     0      0      2      2.44   1.952  1.95   0     0     0     0     0
(7,9)      0      0      2.11   2.369  1.952  1.95   0     0     0     0     0
(5,7)      0      0      0      0      1.942  1.94   1.54  1.4   0     0     0
(3,5)      0      0      0      0      1.842  1.89   1.44  1.4   0     0     0
(2,3)      0      0      0      0      0      0      0     1.1   0.5   0.2   0.1
(1,2)      0      0      0      0      0      0      0     0.8   0.3   0.1   0
(0,1)      0      0      0      0      0      0      0     0.8   0.2   0     0
In the embodiment, VISSIM is used as an experimental simulation platform, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busyness F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The saturated flow of each lane at the intersection is 1500 (veh/h). The attenuation factor γ in equation 2 is set to 0.9, and the learning efficiency α is set to 0.7.
A small traffic intersection is established and taken as the object of analysis for the simulation experiment. Each approach comprises four lanes: one left-turn lane, two through lanes, and one right-turn lane. The turning probabilities at the intersection are 30% left turn, 40% straight through, and 30% right turn. The internal road sections are each 200 m long. The average vehicle speed is 40 km/h. The traffic flows of the intersection lanes are shown in Table 4:
TABLE 4 intersection traffic flow
[Table 4 is presented as an image in the original publication.]
For comparison, traditional reinforcement-learning control is additionally applied to the traffic during the simulation, with VISSIM collecting the control results; the results are shown in Tables 5 and 6 below:
TABLE 5 Feedback of traditional reinforcement-learning control results (2 decimal places)
[Table 5 is presented as an image in the original publication.]
TABLE 6 Feedback of control results of the improved Q-learning method (2 decimal places)
[Table 6 is presented as an image in the original publication.]
Analysis of the tables shows that, with the improved Q-learning-based traffic control method, the overall vehicle delay is reduced by about 32% relative to traditional reinforcement-learning control; indicators such as queue length and total number of stops are likewise better than under traditional reinforcement-learning control.
In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.

Claims (8)

1. A traffic signal self-adaptive control system is characterized by comprising a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of intersections and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information into traffic parameters and dividing the traffic parameters into the corresponding state segment sets through state division, which serve both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying a reward/penalty value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking a state section set corresponding to the traffic parameters transmitted by the parameter fusion and division module and a reward and punishment value fed back by the fuzzy evaluator as a basis for updating the Q table of the Q learning module and updating the Q value of the previous state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes, and the corresponding Q values used for selection, and can be updated by the Q learning module;
the control decision module is used for selecting, according to the selection strategy, a corresponding scheme for the signal light of the current phase from the results transmitted by the Q table;
the method for establishing the fusion function in the parameter fusion and division module comprises the following steps:
first, defining a state vector set w = (Fg, Lg, Lr) with three dimensions: green-phase traffic busyness, green-phase vehicle queue length, and red-phase vehicle queue length;
establishing a function according to the green-phase vehicle queue length: s1 = Lmax − Lg, where Lg is the vehicle queue length of the green phase and Lmax is the maximum vehicle queue length acceptable for the road;
establishing a function according to the red-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the vehicle queue length of the red phase, k1 and k2 are scale factors, and k1 > k2;
and establishing the final fusion function by combining the traffic busyness of the green phase:
[fusion function formula, presented as an image in the original publication]
wherein Fg is the traffic busyness of the green phase.
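The intermediate fusion steps s1 and s2 defined in claim 1 can be sketched directly. Note that the final fusion function combining Fg is published only as an image, so it is deliberately omitted here rather than guessed; the function names below are illustrative assumptions:

```python
def s1(l_g, l_max):
    """s1 = Lmax - Lg: remaining acceptable queue room in the green phase."""
    return l_max - l_g

def s2(s1_val, l_r, k1, k2):
    """s2 = k1*s1 + k2*Lr, where k1 > k2 weights the green phase more."""
    if not k1 > k2:
        raise ValueError("claim 1 requires k1 > k2")
    return k1 * s1_val + k2 * l_r
```

With Lmax = 200 m as in the embodiment, a 50 m green-phase queue gives s1 = 150, which s2 then blends with the red-phase queue length.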
2. The adaptive traffic signal control system of claim 1, wherein the traffic flow status information comprises traffic flow information, last cycle vehicle queue length, and intersection phase vehicle delay time.
3. The adaptive traffic signal control system according to claim 2, wherein the scale factor k1 is
[formula presented as an image in the original publication]
and the scale factor k2 is
[formula presented as an image in the original publication]
wherein Tmin and Tmax are respectively the shortest and longest effective durations of the green phase.
4. The adaptive traffic signal control system according to claim 1, wherein the fuzzy evaluator takes the green-phase traffic busyness rate F′ and the red-phase vehicle queue length change rate Lr′ as inputs; the output of the fuzzy evaluator is a reward/penalty signal value in the range (−1, 1), obtained by defuzzifying the output with the centre-of-gravity method; the input and output variables are all partitioned by a five-level fuzzy division, namely {negative big, negative small, zero, positive small, positive big}, representing five levels of the red-phase vehicle queue length change rate and of the green-phase traffic busyness rate, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions; when the green-phase traffic busyness change rate and the red-phase vehicle queue length change rate exceed half of a preset threshold, the influence produced by the timing scheme is judged to be at its maximum.
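The fuzzy machinery in claim 4, triangular membership over the five levels {NB, NS, ZO, PS, PB} followed by centre-of-gravity defuzzification, can be sketched as below. The breakpoints and the normalised [-1, 1] range are illustrative assumptions, not values from the patent:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 at a and c, peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Five fuzzy sets over an assumed normalised input range [-1, 1].
SETS = {"NB": (-1.5, -1.0, -0.5), "NS": (-1.0, -0.5, 0.0),
        "ZO": (-0.5, 0.0, 0.5), "PS": (0.0, 0.5, 1.0),
        "PB": (0.5, 1.0, 1.5)}

def centroid(memberships, centers):
    """Centre-of-gravity defuzzification over the set peaks."""
    num = sum(m * c for m, c in zip(memberships, centers))
    den = sum(memberships)
    return num / den if den else 0.0
```

Feeding the rule-base firing strengths into `centroid` yields a single reward/penalty value inside (−1, 1), as the claim requires.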
5. The adaptive traffic signal control system according to claim 4, wherein the green-phase traffic busyness rate F′ is expressed as
F′ = q / st,
where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration.
6. The adaptive traffic signal control system according to claim 1, wherein the Q-table stores the timing strategies under different traffic states in a table form, wherein the first column stores a set of state segments of the traffic states, the first row stores different phase green light timing schemes, the Q-table initializes each timing strategy in advance, and in the subsequent iteration process, the Q-learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
7. The adaptive traffic signal control system according to claim 1 or 6, wherein the learning and updating formula of the Q table is as follows:
Q(S, a) ← Q(S, a) + α[r + γ·max_a′ Q(S′, a′) − Q(S, a)];
wherein S is the state segment set of traffic state s, and a is a traffic timing scheme; Q(S, a) represents the selection basis under the current state set S; α is the learning rate, and the larger α is, the more strongly Q(S, a) is influenced by the next state; r is the feedback, i.e. the reward/penalty value, obtained after executing timing scheme a; S′ represents the next state set, and Q(S′, a′) the selection strategy under it; max_a′ Q(S′, a′) is the best selection estimated for the next state set; γ is the attenuation (discount) factor.
8. A control method of the traffic signal adaptive control system according to claim 1, comprising the steps of:
S1, the traffic state sensing module collects the traffic flow state information of the traffic intersection at the current moment and transmits it respectively to the parameter fusion and division module and the fuzzy evaluator;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain traffic parameters, divides them into the corresponding state segment sets through state division, and then transmits the obtained state segments to the Q table and the Q learning module;
s3, the Q table inquires all timing schemes suitable for the current traffic state according to the received state section, transmits all the adapted timing schemes to the control decision module, and selects a suitable timing scheme through the selection strategy of the control decision module;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameters and the divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
S7, after the current green phase ends, switching to the next phase and repeating steps S1 to S6.
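Steps S1 to S7 can be summarised as one control cycle. In the sketch below every callable is a hypothetical stand-in for the corresponding module in claim 8, not an API from the patent:

```python
def control_cycle(sense, fuse, query, select, evaluate, update):
    """One S1-S7 cycle; each argument is an illustrative callable standing
    in for the corresponding module of the claimed system."""
    info = sense()               # S1: collect traffic flow state information
    state = fuse(info)           # S2: fuse and divide into a state segment
    plan = select(query(state))  # S3: query Q table, apply selection strategy
    info2 = sense()              # S4: re-collect after the plan executes
    state2 = fuse(info2)         # S4: re-divide into the new state segment
    reward = evaluate(info2)     # S5: fuzzy evaluator reward/penalty value
    update(state, plan, reward, state2)  # S6: Q-learning update of the table
    return plan                  # S7: switch phase; the cycle then repeats
```

Any concrete sensing, fusion, selection, evaluation, and update implementations (for example the sketches given earlier) can be plugged into this cycle.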
CN201810125948.8A 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method Active CN108335497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810125948.8A CN108335497B (en) 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method


Publications (2)

Publication Number Publication Date
CN108335497A CN108335497A (en) 2018-07-27
CN108335497B true CN108335497B (en) 2021-09-14

Family

ID=62928553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810125948.8A Active CN108335497B (en) 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method

Country Status (1)

Country Link
CN (1) CN108335497B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4307270A1 (en) * 2022-07-14 2024-01-17 Kapsch TrafficCom AG Method and server for controlling traffic lights

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035812B (en) * 2018-09-05 2021-07-27 平安科技(深圳)有限公司 Traffic signal lamp control method and device, computer equipment and storage medium
CN109544913A (en) * 2018-11-07 2019-03-29 Nanjing University of Posts and Telecommunications Traffic light dynamic timing algorithm based on deep Q-network learning
US10964207B2 (en) * 2018-11-19 2021-03-30 Fortran Traffic Systems Limited Systems and methods for managing traffic flow using connected vehicle data
CN110085038B (en) * 2019-04-26 2020-11-27 同济大学 Intersection self-adaptive signal control method based on real-time queuing information
CN110428615B (en) * 2019-07-12 2021-06-22 中国科学院自动化研究所 Single intersection traffic signal control method, system and device based on deep reinforcement learning
CN111081035A (en) * 2019-12-17 2020-04-28 扬州市鑫通智能信息技术有限公司 Traffic signal control method based on Q learning
CN111311996A (en) * 2020-03-27 2020-06-19 湖南有色金属职业技术学院 Online education informationization teaching system based on big data
CN111564048A (en) * 2020-04-28 2020-08-21 郑州大学 Traffic signal lamp control method and device, electronic equipment and storage medium
CN111613072A (en) * 2020-05-08 2020-09-01 上海数道信息科技有限公司 Intelligent signal lamp timing optimization method, device, equipment, system and medium
CN113506450B (en) * 2021-07-28 2022-05-17 浙江海康智联科技有限公司 Qspare-based single-point signal timing scheme selection method
CN113870590A (en) * 2021-09-23 2021-12-31 福建船政交通职业学院 Wireless control method and system for traffic flow
CN114202935B (en) * 2021-11-16 2023-04-28 广西中科曙光云计算有限公司 Time distribution method and device for intersection signal lamps based on cloud network
CN114120672B (en) * 2021-11-19 2022-10-25 大连海事大学 Heterogeneous intersection scene traffic signal control method based on multi-agent reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201009974D0 (en) * 2010-06-15 2010-07-21 Trinity College Dublin Decentralised autonomic system and method for use inan urban traffic control environment
CN104933876B (en) * 2015-06-03 2018-03-13 浙江师范大学 A kind of control method of adaptive smart city intelligent traffic signal
CN105654744B (en) * 2016-03-10 2018-07-06 同济大学 A kind of improvement traffic signal control method based on Q study
CN106846836B (en) * 2017-02-28 2019-05-24 许昌学院 A kind of Single Intersection signal timing control method and system
CN107393319B (en) * 2017-08-31 2020-06-19 长安大学 Signal optimization control method for preventing single cross port queuing overflow


Also Published As

Publication number Publication date
CN108335497A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108335497B (en) Traffic signal self-adaptive control system and method
CN108510764B (en) Multi-intersection self-adaptive phase difference coordination control system and method based on Q learning
CN111696370B (en) Traffic light control method based on heuristic deep Q network
CN111785045B (en) Distributed traffic signal lamp combined control method based on actor-critic algorithm
CN104766484B (en) Traffic Control and Guidance system and method based on Evolutionary multiobjective optimization and ant group algorithm
CN112489464B (en) Crossing traffic signal lamp regulation and control method with position sensing function
CN105118308B (en) Urban road intersection traffic signal optimization method based on cluster intensified learning
CN110047278A (en) A kind of self-adapting traffic signal control system and method based on deeply study
CN112700642B (en) Method for improving traffic passing efficiency by using intelligent internet vehicle
CN112950963B (en) Self-adaptive signal control optimization method for main branch intersection of city
CN109269516B (en) Dynamic path induction method based on multi-target Sarsa learning
CN114973650B (en) Vehicle ramp entrance confluence control method, vehicle, electronic device and storage medium
Kurzer et al. Decentralized cooperative planning for automated vehicles with continuous monte carlo tree search
Li Multi-agent deep deterministic policy gradient for traffic signal control on urban road network
CN115016537B (en) Heterogeneous unmanned aerial vehicle configuration and task planning combined optimization method in SEAD scene
Hussain et al. Optimizing traffic lights with multi-agent deep reinforcement learning and v2x communication
CN113724507B (en) Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning
CN114384916A (en) Adaptive decision-making method and system for off-road vehicle path planning
Luo et al. Researches on intelligent traffic signal control based on deep reinforcement learning
CN102902824B (en) System and method for searching traffic route
CN116189454A (en) Traffic signal control method, device, electronic equipment and storage medium
Zhao et al. Learning multi-agent communication with policy fingerprints for adaptive traffic signal control
Li et al. Multi-intersections traffic signal intelligent control using collaborative q-learning algorithm
CN113487870A (en) Method for generating anti-disturbance to intelligent single intersection based on CW (continuous wave) attack
Li et al. Adaptive dynamic neuro-fuzzy system for traffic signal control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant