CN108335497B - Traffic signal self-adaptive control system and method - Google Patents


Info

Publication number: CN108335497B
Application number: CN201810125948.8A
Authority: CN (China)
Other versions: CN108335497A
Original language: Chinese (zh)
Inventors: 罗杰 (Luo Jie), 刘成建 (Liu Chengjian)
Original and current assignee: Nanjing University of Posts and Telecommunications
Legal status: Active (granted)
Prior art keywords: traffic, state, module, phase, learning

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G 1/00: Traffic control systems for road vehicles
    • G08G 1/07: Controlling traffic signals
    • G08G 1/08: Controlling traffic signals according to detected number or speed of vehicles

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

A traffic signal adaptive control system and method. To address the complexity of traffic control, a fuzzy technique is used to design the reward-and-punishment mechanism of the learning system, so that changes in the traffic state are reflected more accurately and reasonably. Meanwhile, a Q-learning state space is constructed through experience-based state division: by establishing a traffic parameter fusion function, the update complexity of the state space is reduced while multi-parameter evaluation of the traffic state is retained, and a phase-based green-light timing scheme is produced, finally achieving real-time responsive control of the traffic flow. The invention can effectively shorten the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the throughput efficiency of the intersection; being model-free, it has strong adaptive capability and generality. The system simplifies the storage form of the parameter indices in the Q table, balances the learning effect against the response speed to the traffic state, and reduces control complexity.

Description

Traffic signal self-adaptive control system and method
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a traffic signal self-adaptive control system and method.
Background
At present, urban traffic flow is growing rapidly, and with this rapid development an effective intelligent traffic management and control technology is urgently needed. In recent years, artificial intelligence has advanced greatly in traffic control applications, offering new technical approaches to the urban traffic control problem. An adaptive control system can adjust timing parameters in real time according to the manager's control objective and the time-varying characteristics of traffic flow at the intersection; compared with fixed-time and actuated control, it makes better use of the overall capacity of the road network, effectively improves network throughput, and is an effective means of relieving urban traffic congestion.
To the applicant's knowledge, existing adaptive traffic signal timing methods such as fuzzy control, neural networks, evolutionary algorithms, and expert systems cannot fully adapt to the constantly changing traffic flow characteristics of an intersection, and find it difficult to adjust signal timing efficiently according to the real-time traffic state while accounting for multiple indices. In addition, most adaptive traffic control methods require a complex traffic model, which increases the difficulty of implementation. In contrast, adaptive traffic control based on reinforcement learning is better suited to the changing traffic environment of urban intersections. Reinforcement learning needs no accurate model of the traffic environment and can effectively obtain a good signal timing strategy in a stochastic traffic environment: after interacting with the external environment, it receives uncertain feedback on its behavior and updates the associated state-action values, thereby converging on an optimal control strategy. Q learning is the most typical example; it is a model-free reinforcement learning algorithm.
However, applying the Q-learning algorithm to traffic control still has shortcomings. Existing Q-learning adaptive control methods for intersections do not make full use of the various traffic state parameters available in a complex traffic state, making it difficult to give accurate and reasonable feedback on changes in the traffic flow; moreover, for adaptive control with variable cycle length, the state space is too large, so learning is inefficient and effective control is hard to form.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a traffic signal self-adaptive control system and a traffic signal self-adaptive control method.
To achieve this aim, the traffic signal adaptive control system comprises a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection and transmitting it to the parameter fusion and division module and the fuzzy evaluator respectively;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain a traffic parameter and dividing it, through state division, into the corresponding state segment set, which serves both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying the reward-and-punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking the state section set corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value fed back by the fuzzy evaluator as the basis for updating the Q table by the Q learning module and updating the Q value of the last state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes and the corresponding selected Q values, and can be updated by the Q learning module;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
The preferred scheme of the invention is as follows: the traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
Preferably, the fusion function in the parameter fusion and division module is established as follows:
first, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light-phase traffic busy degree F, the green-light-phase vehicle queue length Lg, and the red-light-phase vehicle queue length Lr;
a function is established from the green-light-phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green light phase and Lmax is the maximum vehicle queue length acceptable for the road;
a function is then established from the red-light-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light-phase vehicle queue length and k1, k2 are scale factors with k1 > k2;
and the final fusion function is established by combining the green-light-phase traffic busy degree F. [The fusion function formula appears only as an image in the original.]
More preferably, the scale factors k1 and k2 are functions of Tmin and Tmax, the shortest and longest effective green-light durations. [Their formulas appear only as images in the original.]
Preferably, the fuzzy evaluator takes the green-light-phase traffic busy degree change rate F′ and the red-light-phase vehicle queue length change rate Lr′ as inputs; its output is a reward/penalty signal value, defuzzified by the centre-of-gravity method, with range (-1, 1). The input and output variables are all partitioned with a five-level fuzzy division, i.e. {"negative big", "negative small", "zero", "positive small", "positive big"}, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions, describing five levels of the red-light-phase vehicle queue length change rate and of the green-light-phase traffic busy degree change rate. When these change rates exceed half of a preset threshold, the influence of the timing scheme is judged to be at its maximum.
More preferably, the green-light-phase traffic busy degree change rate F′ is computed from q and st, where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration. [The formula for F′ appears only as an image in the original.]
Preferably, the Q table stores the timing strategies under different traffic states in table form: the first column stores the state segment sets of the traffic states and the first row stores the different phase green-light timing schemes. The Q table initializes each timing strategy in advance, and in subsequent iterations the Q learning module keeps updating the strategies until the optimal timing strategy for each traffic state is obtained.
Preferably, the learning update formula of the Q table is:
Q(S,a) ← Q(S,a) + α[r + γ max_{a′} Q(S′,a′) - Q(S,a)];
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S,a) is the selection basis under the current state set S; α is the learning rate: the higher α, the more Q(S,a) is influenced by the next state; r is the feedback (reward/penalty value) obtained after executing timing scheme a; S′ is the next state set and Q(S′,a′) the selection basis within it; max_{a′} Q(S′,a′) is the estimated best selection for the next state set; γ is the attenuation (discount) factor.
The invention also provides a traffic signal self-adaptive control method, which comprises the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the intersection at the current moment and transmits it to the parameter fusion and division module and the fuzzy evaluator respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and transmits the obtained state segment to the Q table and the Q learning module;
S3, the Q table queries all timing schemes suitable for the current traffic state according to the received state segment and transmits them to the control decision module, which selects a suitable timing scheme through its selection strategy;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameter and divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The invention has the following beneficial effects: it can effectively shorten the response time to traffic congestion, quickly coordinate the signal control of each phase, and improve the throughput efficiency of the intersection; being model-free, it has strong adaptive capability and generality; meanwhile, the storage form of the parameter indices in the Q table is simplified, the learning effect and the response speed to the traffic state are both taken into account, and the control complexity is reduced.
drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic flow diagram of the process of the present invention;
FIG. 3 is a schematic diagram of a four-phase intersection of the present invention;
FIG. 4 is a schematic diagram of the fuzzy evaluator structure of the present invention.
Detailed Description
Example one
Referring to fig. 1, the present embodiment provides a traffic signal adaptive control system and method, including a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table, and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of the intersection through a sensing or image processing technology and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information to obtain a traffic parameter s, dividing it into the corresponding state segment set S through state division, which serves both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying the reward-and-punishment value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking the state section set S corresponding to the traffic parameters transmitted by the parameter fusion and division module and the reward and punishment value r fed back by the fuzzy evaluator as the basis for updating the Q table by the Q learning module and updating the Q value of the previous state;
the Q table is used for storing timing strategies Q (S, a) under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each Q (S, a) in advance according to experience, and in the subsequent iteration process, the Q learning module continuously updates the Q (S, a) until the optimal timing strategies under different traffic states are obtained;
and the control decision module is used for providing a corresponding scheme for the signal lamp of the current phase from the result transmitted by the Q table according to the selection strategy.
Referring to fig. 2, the traffic signal adaptive control method based on the control system includes the following steps:
S1, the traffic state sensing module collects the traffic flow state information of the intersection at the current moment and transmits it to the parameter fusion and division module and the fuzzy evaluator respectively;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain a traffic parameter, divides it into the corresponding state segment set through state division, and transmits the obtained state segment to the Q table and the Q learning module;
S3, the Q table queries all timing schemes suitable for the current traffic state according to the received state segment and transmits them to the control decision module, which selects a suitable timing scheme through its selection strategy;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameter and divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
and S7, switching to the next phase after the current green lamp phase is finished, and repeating the steps S1 to S6.
The system and the method can effectively reduce the response time of traffic jam, quickly coordinate the signal control of each phase and improve the traffic efficiency of the intersection; due to the characteristic of no model, the method has strong self-adaptive capacity and universality; meanwhile, the storage form of the parameter indexes in the Q table is simplified, the learning effect and the response speed of the system on the traffic state are considered, and the control complexity is reduced.
The intersection model of this embodiment is shown in fig. 3; the intersection traffic model consists of four phases: east-west through, east-west left turn, north-south through, and north-south left turn. Right-turning vehicles merge with the through movement. The traffic flows of the intersection phases are denoted a1, a2, a3, a4, a5, a6, a7, a8.
The traffic flow state information comprises traffic flow information, the vehicle queuing length in the last period and the vehicle delay time of each phase at the intersection.
The fusion function in the parameter fusion and division module is established as follows:
first, w = (F, Lg, Lr) is defined as a state vector with three dimensions: the green-light-phase traffic busy degree F, the green-light-phase vehicle queue length Lg, and the red-light-phase vehicle queue length Lr;
a function is established from the green-light-phase vehicle queue length: s1 = Lmax - Lg, where Lg is the vehicle queue length of the green light phase and Lmax is the maximum vehicle queue length acceptable for the road;
a function is then established from the red-light-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the red-light-phase vehicle queue length and k1, k2 are scale factors with k1 > k2;
and the final fusion function is established by combining the green-light-phase traffic busy degree F. [The fusion function formula, and the formulas for the scale factors k1 and k2, appear only as images in the original.]
Here Tmin and Tmax, on which k1 and k2 depend, are the shortest and longest effective green-light durations.
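Since the fusion formula and the k1, k2 expressions survive only as images, the following sketch reproduces only the structure stated in the text (s1 = Lmax - Lg, then s2 = k1·s1 + k2·Lr with k1 > k2). The constant values for k1 and k2 and the way the busy degree F enters the final combination are illustrative assumptions, not the patented formulas:

```python
def fuse_state(F, Lg, Lr, Lmax=200.0, k1=0.04, k2=0.02):
    """Fuse the state vector w = (F, Lg, Lr) into one traffic parameter s.

    F  : green-phase traffic busy degree in [0, 1]
    Lg : green-phase vehicle queue length (m)
    Lr : red-phase vehicle queue length (m)

    k1, k2 are hypothetical scale factors (the patent derives them from the
    shortest/longest green times Tmin, Tmax via formulas shown only as
    images); the patent only requires k1 > k2.
    """
    s1 = Lmax - Lg           # remaining queue capacity of the green phase
    s2 = k1 * s1 + k2 * Lr   # weighted combination with the red-phase queue
    # Final fusion with the busy degree F: the published formula is an
    # image, so as a stand-in we damp s2 by (1 - F), which preserves the
    # stated property that a larger s means a freer traffic condition.
    return s2 * (1.0 - F)
```

With the default constants, an empty approach (F = 0, no queues) maps to a large s, while a congested one maps toward 0, matching the 0-15 ordering the embodiment describes for s.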
The fuzzy evaluator takes the green-light-phase traffic busy degree change rate F′ and the red-light-phase vehicle queue length change rate Lr′ as inputs; its output is a reward/penalty signal value, defuzzified by the centre-of-gravity method, with range (-1, 1). The input and output variables are all partitioned with a five-level fuzzy division, i.e. {"negative big", "negative small", "zero", "positive small", "positive big"}, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions. When the green-light-phase busy degree change rate and the red-light-phase queue length change rate exceed half of a preset threshold, the influence of the timing scheme is judged to be at its maximum; accordingly, the universe of discourse of Lr′ is set to (-100, 100) and that of F′ to (-0.5, 0.5). The fuzzy evaluator has a single output, the reward/penalty signal value r; the fuzzy output is defuzzified by the centre-of-gravity method, and the final reward/penalty value lies in (-1, 1).
The green-light-phase traffic busy degree change rate F′ is computed from q and st, where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration. [The formula for F′ appears only as an image in the original.]
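A minimal sketch of the fuzzy evaluator described above, assuming symmetric triangular membership functions for {NB, NS, ZO, PS, PB} on a normalized universe and a hypothetical rule base in which a falling busy degree and a shrinking queue earn positive feedback; defuzzification is simplified to a weighted average of output centers rather than the full centre-of-gravity computation over the output sets:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and feet at a and c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Five-level fuzzy partition {NB, NS, ZO, PS, PB}; centers on [-1, 1].
CENTERS = {"NB": -1.0, "NS": -0.5, "ZO": 0.0, "PS": 0.5, "PB": 1.0}

def memberships(x):
    """Membership degree of x (normalized to [-1, 1]) in each fuzzy set."""
    return {lab: tri(x, c - 0.5, c, c + 0.5) for lab, c in CENTERS.items()}

def evaluate(dF, dLr, F_range=0.5, L_range=100.0):
    """Reward/penalty value in (-1, 1).

    dF  : busy degree change rate, universe (-0.5, 0.5)
    dLr : red-phase queue length change rate, universe (-100, 100)

    The rule base is a hypothetical stand-in: the output center of each
    rule is the negated average of the two input centers, so improvement
    (negative change rates) yields a positive reward.
    """
    muf = memberships(dF / F_range)   # normalize inputs to [-1, 1]
    mul = memberships(dLr / L_range)
    num = den = 0.0
    for lf, wf in muf.items():
        for ll, wl in mul.items():
            w = min(wf, wl)                           # rule firing strength
            out = -(CENTERS[lf] + CENTERS[ll]) / 2.0  # hypothetical rule output
            num += w * out
            den += w
    return 0.9 * num / den if den else 0.0  # scaled to stay inside (-1, 1)
```

The sketch reproduces the qualitative shape of Table 2: strongly negative change rates (both queues shrinking) give feedback near +0.9, strongly positive ones near -0.9, and no change gives roughly zero.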
The Q table stores the timing strategies under different traffic states in a table form, wherein the first column stores a state section set of the traffic states, the first row stores different phase green light timing schemes, the Q table initializes each timing strategy in advance, and in the subsequent iteration process, the Q learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
After the timing scheme has been executed for a period, the traffic state sensing module may sense that the traffic flow of the phase has increased: the busy degree of the phase rises, the vehicle queue lengthens, and the average vehicle delay grows. The timing scheme given by the Q learning module has then not worked well, so negative feedback is obtained from the fuzzy evaluator and the timing selection for this road condition is updated through the Q value; the next time the same traffic state is sensed, a more appropriate timing scheme is selected according to the learned strategy, improving the throughput of the intersection. In this way the control system continuously monitors the current traffic condition, learns in real time, and keeps updating the selection basis for timing schemes in the Q table, so that the traffic flow of the intersection tends toward the optimum. Once learning of the Q table has converged and the system is relatively stable, the optimal timing scheme can be obtained by directly querying the Q table with the traffic state segment S obtained after parameter fusion and state division.
The learning update formula of the Q table is:
Q(S,a) ← Q(S,a) + α[r + γ max_{a′} Q(S′,a′) - Q(S,a)];   (Equation 2)
where S is the state segment set of traffic state s and a is a traffic timing scheme; Q(S,a) is the selection basis under the current state set S; α is the learning rate: the higher α, the more Q(S,a) is influenced by the next state; r is the feedback (reward/penalty value) obtained after executing timing scheme a; S′ is the next state set and Q(S′,a′) the selection basis within it; max_{a′} Q(S′,a′) is the estimated best selection for the next state set; γ is the attenuation (discount) factor.
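The tabular update above, together with a simple selection strategy, can be sketched as follows. The ε-greedy choice is an assumption, since the patent leaves the control decision module's selection strategy unspecified; α and γ default to the values used in the embodiment's simulation (0.7, 0.9):

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.7, gamma=0.9):
    """One tabular Q-learning step:
    Q(S,a) <- Q(S,a) + alpha * (r + gamma * max_a' Q(S',a') - Q(S,a)).
    Q maps state-segment set -> {timing scheme: value}."""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def choose_scheme(Q, s, epsilon=0.1):
    """Greedy selection with a small exploration rate (assumed strategy)."""
    if random.random() < epsilon:
        return random.choice(list(Q[s]))
    return max(Q[s], key=Q[s].get)
```

A single update with r = 1 and a promising next state raises the chosen scheme's Q value, which is exactly how the embodiment's Table 1 entries evolve into Table 3.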
The Q table and fuzzy evaluator are designed as follows:
1) Designing the Q table: the parameters (F, Lg, Lr) of the controlled intersection are obtained, the traffic parameter s is computed from them by the fusion function, and according to the size of s the states are divided into the different state segment sets S;
2) According to timing experience at the intersection, if the traffic busy degree of the current phase is low, a relatively short green-light duration should be allocated to the phase; as the busy degree increases, the green duration allocated to the phase should increase accordingly. Configuring a green time that is too long or too short for a traffic situation is unreasonable. Therefore, as shown in Table 1, this embodiment empirically configures 4 different green-light timing schemes for each traffic state of the current phase. Clearly poor options can thus be ignored outright when the Q table is updated, and learning concentrates on the few better timing schemes for each traffic state, from which the optimal phase green-light timing scheme is obtained. Following these rules, the initial Q table is designed as shown in Table 1.
TABLE 1 initial Q-table
            25  30  35  40  45  50  55  60  65  70  75
(14,15)      1   1   1   1   0   0   0   0   0   0   0
(13,14)      1   1   1   1   0   0   0   0   0   0   0
(12,13)      1   1   1   1   0   0   0   0   0   0   0
(11,12)      0   0   1   1   1   1   0   0   0   0   0
(9,11)       0   0   1   1   1   1   0   0   0   0   0
(7,9)        0   0   1   1   1   1   0   0   0   0   0
(5,7)        0   0   0   0   1   1   1   1   0   0   0
(3,5)        0   0   0   0   1   1   1   1   0   0   0
(2,3)        0   0   0   0   0   0   0   1   1   1   1
(1,2)        0   0   0   0   0   0   0   1   1   1   1
(0,1)        0   0   0   0   0   0   0   1   1   1   1
Wherein, the first row of the initial Q table is a timing scheme a, and the first column is a state segment set S. In the simulation process of the embodiment, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busy degree F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The calculated traffic parameter s ranges from 0 to 15, wherein the larger s is, the better the road traffic condition is represented. And 4 different timing schemes are selected according to experience under each traffic state set S in the initial Q table for traffic control.
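The structure of Table 1 can be built programmatically. The segment bounds and the four admissible schemes per segment are copied from the table; `segment_of` is an assumed helper, not named in the patent, for mapping the fused parameter s (0-15) to its state segment:

```python
SCHEMES = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]  # green times (s)

# State segments of the fused parameter s (larger s = freer traffic),
# each paired with its four admissible schemes from Table 1.
SEGMENTS = {
    (14, 15): [25, 30, 35, 40],
    (13, 14): [25, 30, 35, 40],
    (12, 13): [25, 30, 35, 40],
    (11, 12): [35, 40, 45, 50],
    (9, 11):  [35, 40, 45, 50],
    (7, 9):   [35, 40, 45, 50],
    (5, 7):   [45, 50, 55, 60],
    (3, 5):   [45, 50, 55, 60],
    (2, 3):   [60, 65, 70, 75],
    (1, 2):   [60, 65, 70, 75],
    (0, 1):   [60, 65, 70, 75],
}

def initial_q_table():
    """Initial Q table: 1 for the four admissible schemes of each state
    segment, 0 for the masked-out rest, mirroring Table 1."""
    return {seg: {a: (1 if a in ok else 0) for a in SCHEMES}
            for seg, ok in SEGMENTS.items()}

def segment_of(s):
    """Map a fused traffic parameter s in [0, 15] to its state segment."""
    for lo, hi in SEGMENTS:
        if lo <= s < hi or (hi == 15 and s == 15):
            return (lo, hi)
    raise ValueError(f"s out of range: {s}")
```

Masking poor schemes with 0 keeps the table small, which is the stated point of the experience-based state division: only the four plausible green times per segment ever get learned and compared.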
3) As shown in fig. 4, a fuzzy evaluator, i.e. the reward-and-punishment function for the Q learning update, is established from the intersection parameters and the fuzzy rules. To ensure the accuracy of the evaluation, the steps of the fuzzy inputs were set to 0.1 and 20, respectively, in MATLAB, yielding a table of size 11 × 11, shown in Table 2 below.
TABLE 2 fuzzy evaluator feedback Table
         -0.5   -0.4   -0.3   -0.2   -0.1    0     0.1    0.2    0.3    0.4    0.5
-100    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -80    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -60    -0.84  -0.84  -0.84  -0.55  -0.34  -0.01   0.29   0.51   0.82   0.84   0.84
 -40    -0.55  -0.55  -0.54  -0.27  -0.06   0.27   0.36   0.59   0.80   0.82   0.82
 -20    -0.50  -0.50  -0.50  -0.22  -0.01   0.30   0.39   0.62   0.80   0.82   0.82
   0    -0.50  -0.50  -0.50  -0.50  -0.34   0.00   0.29   0.51   0.82   0.84   0.84
  20    -0.60  -0.60  -0.60  -0.27  -0.04   0.28   0.30   0.52   0.80   0.82   0.82
  40    -0.61  -0.61  -0.60  -0.27  -0.04   0.44   0.54   0.53   0.80   0.83   0.83
  60    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
  80    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
 100    -0.50  -0.50  -0.50  -0.50  -0.16   0.44   0.59   0.83   0.84   0.84   0.84
Here the first row of the fuzzy evaluator feedback table is the green-light-phase traffic busy degree change rate F′ and the first column is the red-light-phase vehicle queue length change rate Lr′; each (row, column) entry is the reward/penalty value r used for the Q learning update. When the implemented timing scheme reduces the traffic busy degree and shortens the queue, positive feedback is obtained; otherwise, corresponding negative feedback is obtained.
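At run time, reading the precomputed feedback table reduces to nearest-bin quantization of the two change rates. `lookup_reward` is an illustrative helper (not named in the patent); the grids match the 0.1 and 20 step sizes of Table 2:

```python
# Column/row universes of the 11x11 feedback table (Table 2):
# F' in steps of 0.1 over (-0.5, 0.5), Lr' in steps of 20 over (-100, 100).
F_GRID = [round(-0.5 + 0.1 * i, 1) for i in range(11)]
L_GRID = [-100 + 20 * i for i in range(11)]

def lookup_reward(table, dF, dLr):
    """Nearest-bin lookup of the precomputed reward/penalty value r.
    table[i][j] holds r for Lr' = L_GRID[i] and F' = F_GRID[j]."""
    j = min(range(11), key=lambda k: abs(F_GRID[k] - dF))
    i = min(range(11), key=lambda k: abs(L_GRID[k] - dLr))
    return table[i][j]
```

Precomputing the table offline (as the embodiment does in MATLAB) keeps the online controller to a constant-time lookup instead of a full fuzzy inference per decision.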
The intersection optimization control process is implemented as follows:
1) The Q table provides a timing scheme (initially at random) according to the road condition of the current phase.
2) After the traffic flow state has changed, the corresponding traffic state parameters are collected, the traffic parameter is computed through the fusion function, and the traffic state segment is determined through state division for the Q learning algorithm to update the Q table.
3) Within the effective green time, the collected traffic state parameters are compared with those of the previous moment, and the differences are passed to the fuzzy evaluator, which is queried to obtain the reward/penalty value.
4) Before the next phase begins (the green interval is set to 2 seconds), the Q-table entry for the corresponding state segment is updated according to Equation 2 and the reward/penalty value.
5) After the timing scheme of the current green phase finishes, control switches to the next phase.
6) The state set in the Q table is queried according to the current traffic state, and after matching the corresponding entry a timing scheme is given according to the selection strategy.
7) Steps 2) to 6) are repeated.
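The steps above can be condensed into one per-phase routine. The `sense` and `fuzzy_reward` callables are stand-ins for the sensing/fusion and fuzzy-evaluator modules, and the ε-greedy selection is an assumption in place of the unspecified selection strategy:

```python
import random

def run_phase(Q, sense, fuzzy_reward, alpha=0.7, gamma=0.9, epsilon=0.1):
    """One green phase of the control loop.

    Q            : {state segment: {scheme: value}} table
    sense()      : returns the current state segment (stand-in for sensing
                   + parameter fusion + state division)
    fuzzy_reward : (seg, scheme, next_seg) -> r, stand-in for the fuzzy
                   evaluator feedback-table query
    """
    seg = sense()                         # current traffic state segment
    if random.random() < epsilon:         # selection strategy (steps 1/6)
        a = random.choice(list(Q[seg]))
    else:
        a = max(Q[seg], key=Q[seg].get)
    seg2 = sense()                        # state after scheme a ran (step 2)
    r = fuzzy_reward(seg, a, seg2)        # reward/penalty value (step 3)
    best = max(Q[seg2].values())
    Q[seg][a] += alpha * (r + gamma * best - Q[seg][a])  # step 4, Equation 2
    return a                              # then switch phase (step 5)
```

Calling `run_phase` once per green phase, cycling through the four phases, reproduces the learn-while-controlling loop that turns the initial Q table into the converged one.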
The form of the Q table after completion of learning update is shown in table 3:
TABLE 3 updated Q-Table
            25     30     35     40     45     50     55    60    65    70    75
(14,15)    3.45   3.34   3.07   2.558  0      0      0     0     0     0     0
(13,14)    3.23   3.33   2.93   2.368  0      0      0     0     0     0     0
(12,13)    2.73   2.95   3.05   2.168  0      0      0     0     0     0     0
(11,12)    0      0      2.51   2.118  1.992  1.99   0     0     0     0     0
(9,11)     0      0      2      2.44   1.952  1.95   0     0     0     0     0
(7,9)      0      0      2.11   2.369  1.952  1.95   0     0     0     0     0
(5,7)      0      0      0      0      1.942  1.94   1.54  1.4   0     0     0
(3,5)      0      0      0      0      1.842  1.89   1.44  1.4   0     0     0
(2,3)      0      0      0      0      0      0      0     1.1   0.5   0.2   0.1
(1,2)      0      0      0      0      0      0      0     0.8   0.3   0.1   0
(0,1)      0      0      0      0      0      0      0     0.8   0.2   0     0
In the embodiment, VISSIM is used as an experimental simulation platform, the maximum queuing length of the vehicles is set to be 200m, and the range of the vehicle traffic busyness F is 0-1. The minimum active duration of the green light is set to 25 seconds and at most 75 seconds. The saturated flow of each lane at the intersection is 1500 (veh/h). The attenuation factor γ in equation 2 is set to 0.9, and the learning efficiency α is set to 0.7.
A small traffic intersection is established and taken as the object of analysis for the simulation experiment. Each approach comprises four lanes: one left-turn lane, two through lanes, and one right-turn lane. The turning probabilities at the intersection are 30% left turn, 40% straight through, and 30% right turn. The internal road sections are each 200 m long. The average vehicle speed is 40 km/h. The traffic flows of the intersection lanes are shown in Table 4:
TABLE 4 intersection traffic flow
[Table 4 is presented as an image in the original publication.]
For comparison, traditional reinforcement-learning control is additionally applied to the traffic during the simulation, with VISSIM collecting the control results; the results are shown in Tables 5 and 6 below:
TABLE 5 Feedback of traditional reinforcement-learning control results (2 decimal places)
[Table 5 is presented as an image in the original publication.]
TABLE 6 Feedback of control results of the improved Q-learning method (2 decimal places)
[Table 6 is presented as an image in the original publication.]
Analysis of the tables shows that, with the improved Q-learning-based traffic control method, the overall vehicle delay is reduced by about 32% relative to traditional reinforcement-learning control; indicators such as queue length and total number of stops are likewise better than under traditional reinforcement-learning control.
In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.

Claims (8)

1. A traffic signal self-adaptive control system is characterized by comprising a traffic state sensing module, a parameter fusion and division module, a Q learning module, a control decision module, a Q table and a fuzzy evaluator;
the traffic state sensing module is used for acquiring traffic flow state information of intersections and respectively transmitting the traffic flow state information to the parameter fusion and division module and the fuzzy evaluator;
the parameter fusion and division module is used for fusing the received traffic flow state information into traffic parameters and dividing the traffic parameters into the corresponding state segment sets through state division, which serve both as the basis for querying a timing scheme in the Q table and as the parameter with which the Q learning module updates the state space;
the fuzzy evaluator is used for querying a reward/penalty value feedback table according to the traffic flow state information collected by the traffic state sensing module and feeding the queried reward/penalty value back to the Q learning module;
the Q learning module is used for taking a state section set corresponding to the traffic parameters transmitted by the parameter fusion and division module and a reward and punishment value fed back by the fuzzy evaluator as a basis for updating the Q table of the Q learning module and updating the Q value of the previous state;
the Q table is used for storing each traffic parameter state segment set, the timing schemes, and the corresponding Q values used for selection, and can be updated by the Q learning module;
the control decision module is used for selecting, according to the selection strategy, a corresponding scheme for the signal light of the current phase from the results transmitted by the Q table;
the method for establishing the fusion function in the parameter fusion and division module comprises the following steps:
first, defining a state vector set w = (Fg, Lg, Lr) with three dimensions: green-phase traffic busyness, green-phase vehicle queue length, and red-phase vehicle queue length;
establishing a function according to the green-phase vehicle queue length: s1 = Lmax − Lg, where Lg is the vehicle queue length of the green phase and Lmax is the maximum vehicle queue length acceptable for the road;
establishing a function according to the red-phase vehicle queue length: s2 = k1·s1 + k2·Lr, where Lr is the vehicle queue length of the red phase, k1 and k2 are scale factors, and k1 > k2;
and establishing the final fusion function by combining the traffic busyness of the green phase:
[fusion function formula, presented as an image in the original publication]
wherein Fg is the traffic busyness of the green phase.
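The intermediate fusion steps s1 and s2 defined in claim 1 can be sketched directly. Note that the final fusion function combining Fg is published only as an image, so it is deliberately omitted here rather than guessed; the function names below are illustrative assumptions:

```python
def s1(l_g, l_max):
    """s1 = Lmax - Lg: remaining acceptable queue room in the green phase."""
    return l_max - l_g

def s2(s1_val, l_r, k1, k2):
    """s2 = k1*s1 + k2*Lr, where k1 > k2 weights the green phase more."""
    if not k1 > k2:
        raise ValueError("claim 1 requires k1 > k2")
    return k1 * s1_val + k2 * l_r
```

With Lmax = 200 m as in the embodiment, a 50 m green-phase queue gives s1 = 150, which s2 then blends with the red-phase queue length.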
2. The adaptive traffic signal control system of claim 1, wherein the traffic flow status information comprises traffic flow information, last cycle vehicle queue length, and intersection phase vehicle delay time.
3. The adaptive traffic signal control system according to claim 2, wherein the scale factor k1 is
[formula presented as an image in the original publication]
and the scale factor k2 is
[formula presented as an image in the original publication]
wherein Tmin and Tmax are respectively the shortest and longest effective durations of the green phase.
4. The adaptive traffic signal control system according to claim 1, wherein the fuzzy evaluator takes the green-phase traffic busyness rate F′ and the red-phase vehicle queue length change rate Lr′ as inputs; the output of the fuzzy evaluator is a reward/penalty signal value in the range (−1, 1), obtained by defuzzifying the output with the centre-of-gravity method; the input and output variables are all partitioned by a five-level fuzzy division, namely {negative big, negative small, zero, positive small, positive big}, representing five levels of the red-phase vehicle queue length change rate and of the green-phase traffic busyness rate, denoted {NB, NS, ZO, PS, PB} and represented by triangular membership functions; when the green-phase traffic busyness change rate and the red-phase vehicle queue length change rate exceed half of a preset threshold, the influence produced by the timing scheme is judged to be at its maximum.
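The fuzzy machinery in claim 4, triangular membership over the five levels {NB, NS, ZO, PS, PB} followed by centre-of-gravity defuzzification, can be sketched as below. The breakpoints and the normalised [-1, 1] range are illustrative assumptions, not values from the patent:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 at a and c, peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Five fuzzy sets over an assumed normalised input range [-1, 1].
SETS = {"NB": (-1.5, -1.0, -0.5), "NS": (-1.0, -0.5, 0.0),
        "ZO": (-0.5, 0.0, 0.5), "PS": (0.0, 0.5, 1.0),
        "PB": (0.5, 1.0, 1.5)}

def centroid(memberships, centers):
    """Centre-of-gravity defuzzification over the set peaks."""
    num = sum(m * c for m, c in zip(memberships, centers))
    den = sum(memberships)
    return num / den if den else 0.0
```

Feeding the rule-base firing strengths into `centroid` yields a single reward/penalty value inside (−1, 1), as the claim requires.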
5. The adaptive traffic signal control system according to claim 4, wherein the green-phase traffic busyness rate F′ is expressed as
F′ = q / st,
where q is the traffic throughput of the green phase over the decision duration and st is the saturation flow of the green phase over the decision duration.
6. The adaptive traffic signal control system according to claim 1, wherein the Q-table stores the timing strategies under different traffic states in a table form, wherein the first column stores a set of state segments of the traffic states, the first row stores different phase green light timing schemes, the Q-table initializes each timing strategy in advance, and in the subsequent iteration process, the Q-learning module continuously updates the timing strategies until the optimal timing strategies under different traffic states are obtained.
7. The adaptive traffic signal control system according to claim 1 or 6, wherein the learning and updating formula of the Q table is as follows:
Q(S, a) ← Q(S, a) + α[r + γ·max_a′ Q(S′, a′) − Q(S, a)];
wherein S is the state segment set of traffic state s, and a is a traffic timing scheme; Q(S, a) represents the selection basis under the current state set S; α is the learning rate, and the larger α is, the more strongly Q(S, a) is influenced by the next state; r is the feedback, i.e. the reward/penalty value, obtained after executing timing scheme a; S′ represents the next state set, and Q(S′, a′) the selection strategy under it; max_a′ Q(S′, a′) is the best selection estimated for the next state set; γ is the attenuation (discount) factor.
8. A control method of the traffic signal adaptive control system according to claim 1, comprising the steps of:
S1, the traffic state sensing module collects the traffic flow state information of the traffic intersection at the current moment and transmits it respectively to the parameter fusion and division module and the fuzzy evaluator;
S2, the parameter fusion and division module fuses the received traffic flow state information to obtain traffic parameters, divides them into the corresponding state segment sets through state division, and then transmits the obtained state segments to the Q table and the Q learning module;
s3, the Q table inquires all timing schemes suitable for the current traffic state according to the received state section, transmits all the adapted timing schemes to the control decision module, and selects a suitable timing scheme through the selection strategy of the control decision module;
S4, after the timing scheme has been executed and the traffic flow has changed, the sensing module collects the current traffic flow state information again, and the corresponding traffic parameters and the divided state segment set are obtained through the parameter fusion and division module;
s5, the fuzzy evaluator inquires a feedback table according to the traffic flow state information to obtain a reward and punishment value, and feeds the reward and punishment value back to the Q learning module;
s6, the Q learning module updates a Q table according to the changed traffic state section set and the reward and punishment values;
S7, after the current green phase ends, switching to the next phase and repeating steps S1 to S6.
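Steps S1 to S7 can be summarised as one control cycle. In the sketch below every callable is a hypothetical stand-in for the corresponding module in claim 8, not an API from the patent:

```python
def control_cycle(sense, fuse, query, select, evaluate, update):
    """One S1-S7 cycle; each argument is an illustrative callable standing
    in for the corresponding module of the claimed system."""
    info = sense()               # S1: collect traffic flow state information
    state = fuse(info)           # S2: fuse and divide into a state segment
    plan = select(query(state))  # S3: query Q table, apply selection strategy
    info2 = sense()              # S4: re-collect after the plan executes
    state2 = fuse(info2)         # S4: re-divide into the new state segment
    reward = evaluate(info2)     # S5: fuzzy evaluator reward/penalty value
    update(state, plan, reward, state2)  # S6: Q-learning update of the table
    return plan                  # S7: switch phase; the cycle then repeats
```

Any concrete sensing, fusion, selection, evaluation, and update implementations (for example the sketches given earlier) can be plugged into this cycle.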
CN201810125948.8A 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method Active CN108335497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810125948.8A CN108335497B (en) 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method


Publications (2)

Publication Number Publication Date
CN108335497A CN108335497A (en) 2018-07-27
CN108335497B true CN108335497B (en) 2021-09-14

Family

ID=62928553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810125948.8A Active CN108335497B (en) 2018-02-08 2018-02-08 Traffic signal self-adaptive control system and method

Country Status (1)

Country Link
CN (1) CN108335497B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4307270A1 (en) * 2022-07-14 2024-01-17 Kapsch TrafficCom AG Method and server for controlling traffic lights

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035812B (en) * 2018-09-05 2021-07-27 平安科技(深圳)有限公司 Traffic signal lamp control method and device, computer equipment and storage medium
CN109544913A (en) * 2018-11-07 2019-03-29 Nanjing University of Posts and Telecommunications Traffic light dynamic timing algorithm based on deep Q-network learning
US10964207B2 (en) * 2018-11-19 2021-03-30 Fortran Traffic Systems Limited Systems and methods for managing traffic flow using connected vehicle data
CN110085038B (en) * 2019-04-26 2020-11-27 同济大学 Intersection self-adaptive signal control method based on real-time queuing information
CN110428615B (en) * 2019-07-12 2021-06-22 中国科学院自动化研究所 Single intersection traffic signal control method, system and device based on deep reinforcement learning
CN111081035A (en) * 2019-12-17 2020-04-28 扬州市鑫通智能信息技术有限公司 Traffic signal control method based on Q learning
CN111311996A (en) * 2020-03-27 2020-06-19 湖南有色金属职业技术学院 Online education informationization teaching system based on big data
CN111564048A (en) * 2020-04-28 2020-08-21 郑州大学 Traffic signal lamp control method and device, electronic equipment and storage medium
CN111613072A (en) * 2020-05-08 2020-09-01 上海数道信息科技有限公司 Intelligent signal lamp timing optimization method, device, equipment, system and medium
CN113506450B (en) * 2021-07-28 2022-05-17 浙江海康智联科技有限公司 Qspare-based single-point signal timing scheme selection method
CN113870590A (en) * 2021-09-23 2021-12-31 福建船政交通职业学院 Wireless control method and system for traffic flow
CN114202935B (en) * 2021-11-16 2023-04-28 广西中科曙光云计算有限公司 Time distribution method and device for intersection signal lamps based on cloud network
CN114120672B (en) * 2021-11-19 2022-10-25 大连海事大学 Heterogeneous intersection scene traffic signal control method based on multi-agent reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201009974D0 (en) * 2010-06-15 2010-07-21 Trinity College Dublin Decentralised autonomic system and method for use inan urban traffic control environment
CN104933876B (en) * 2015-06-03 2018-03-13 浙江师范大学 A kind of control method of adaptive smart city intelligent traffic signal
CN105654744B (en) * 2016-03-10 2018-07-06 同济大学 A kind of improvement traffic signal control method based on Q study
CN106846836B (en) * 2017-02-28 2019-05-24 许昌学院 A kind of Single Intersection signal timing control method and system
CN107393319B (en) * 2017-08-31 2020-06-19 长安大学 Signal optimization control method for preventing single cross port queuing overflow


Also Published As

Publication number Publication date
CN108335497A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108335497B (en) Traffic signal self-adaptive control system and method
CN108510764B (en) Multi-intersection self-adaptive phase difference coordination control system and method based on Q learning
CN111696370B (en) Traffic light control method based on heuristic deep Q network
CN111785045B (en) Distributed traffic signal lamp combined control method based on actor-critic algorithm
CN104766484B (en) Traffic Control and Guidance system and method based on Evolutionary multiobjective optimization and ant group algorithm
CN112489464B (en) Crossing traffic signal lamp regulation and control method with position sensing function
CN105118308B (en) Urban road intersection traffic signal optimization method based on cluster intensified learning
CN110047278A (en) A kind of self-adapting traffic signal control system and method based on deeply study
CN112700642B (en) Method for improving traffic passing efficiency by using intelligent internet vehicle
CN112950963B (en) Self-adaptive signal control optimization method for main branch intersection of city
CN109269516B (en) Dynamic path induction method based on multi-target Sarsa learning
CN114973650B (en) Vehicle ramp entrance confluence control method, vehicle, electronic device and storage medium
Kurzer et al. Decentralized cooperative planning for automated vehicles with continuous monte carlo tree search
Li Multi-agent deep deterministic policy gradient for traffic signal control on urban road network
CN115016537B (en) Heterogeneous unmanned aerial vehicle configuration and task planning combined optimization method in SEAD scene
Hussain et al. Optimizing traffic lights with multi-agent deep reinforcement learning and v2x communication
CN113724507B (en) Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning
CN114384916A (en) Adaptive decision-making method and system for off-road vehicle path planning
Luo et al. Researches on intelligent traffic signal control based on deep reinforcement learning
CN102902824B (en) System and method for searching traffic route
CN116189454A (en) Traffic signal control method, device, electronic equipment and storage medium
Zhao et al. Learning multi-agent communication with policy fingerprints for adaptive traffic signal control
Li et al. Multi-intersections traffic signal intelligent control using collaborative q-learning algorithm
CN113487870A (en) Method for generating anti-disturbance to intelligent single intersection based on CW (continuous wave) attack
Li et al. Adaptive dynamic neuro-fuzzy system for traffic signal control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant