CN113838296B

CN113838296B - Traffic signal control method, device, equipment and storage medium

Info

Publication number: CN113838296B
Application number: CN202111094984.0A
Authority: CN
Inventors: 钟任新; 方炽霖; 李昕岸; 金啸; 苏子诚
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2022-08-30
Anticipated expiration: 2041-09-17
Also published as: CN113838296A

Abstract

The invention discloses a traffic signal control method, device, equipment and storage medium. The method comprises: dividing a road network, and determining a protection area, a buffer area and boundary control signal information in the road network, as well as adaptive signal control information in the protection area; calibrating the traffic flow in the road network according to the division result and pre-configuring the boundary control signal information; building upper-layer signal control intelligent agents by deep reinforcement learning according to the pre-configured content; building bottom-layer signal control intelligent agents by deep reinforcement learning according to the pre-configured content, wherein each bottom-layer signal control intelligent agent represents the signal lamp of a signalized intersection in the protection area; and, after traffic information is collected, performing double-layer signal control on the target road network according to the upper-layer and bottom-layer signal control intelligent agents. The invention has a wide range of application, can improve the operating efficiency of the road network, and can be widely applied in the technical field of intelligent traffic.

Description

Traffic signal control method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of intelligent traffic, in particular to a traffic signal control method, a traffic signal control device, traffic signal control equipment and a storage medium.

Background

With the development of urbanization, the traffic demand of urban road networks is continuously increased, and the problem of unbalanced traffic supply and demand is becoming more serious in many big cities. With the arrival of the big data era, smart cities are rapidly developed, a large amount of traffic monitoring data are brought, and conditions are provided for improving the current urban traffic real-time control system and remarkably improving the performance of the existing traffic system. Under limited road space, adaptive signal control is a beneficial means for improving the operational capability and reliability of a road network so as to achieve sustainable development.

Adaptive signal control means that a traffic signal control system adaptively adjusts its signal control parameters according to traffic flow data monitored in real time, so as to control the traffic flow optimally: maximizing vehicle throughput over a period of time, or minimizing other traffic control evaluation indexes such as delay and number of stops. Coordinating the signal controllers of multiple intersections along an arterial road or within an area allows the operating scheme to adapt to changes in traffic flow, thereby improving the traffic efficiency of the whole arterial road or area.

The traditional traffic signal control has two main defects:

① It operates only at the road-segment level. Research shows that under oversaturated traffic, segment-based signal control can cause vehicle queues to spill back onto upstream road segments and can even produce deadlock; the spatial dimension problem also poses a significant challenge to adaptive real-time traffic signal control strategies at the segment level.

② It relies on model-driven methods. Model-driven methods offer better causal analysis and generalization, but suffer from model uncertainty; in addition, solving the model's dynamic optimization problem requires large-scale computation and leads to the curse of dimensionality, especially once large amounts of traffic flow data are introduced and a larger state space and solution space must be handled.

Therefore, adopting a data-driven method can make the traffic control algorithm robust to uncertainties such as modeling error and adaptive to external information.

Reinforcement learning (RL) is learning by an agent through trial and error: the agent's behavior is guided by rewards obtained through interaction with the environment, with the goal of maximizing the agent's reward. It differs from supervised learning mainly in the reinforcement signal: the reinforcement signal provided by the environment is an evaluation of how good the generated action is, rather than an instruction telling the reinforcement learning system how to generate the correct action. Since the external environment provides very little information, the agent must learn from its own experience. In this way, the system gains knowledge in an action-evaluation loop and improves its action scheme to adapt to the environment.

In the conventional traffic signal control, the following problems mainly exist:

Existing data-driven traffic signal control methods mainly construct a learning network and optimize model parameters for a single intersection node or a small-scale area, and lack dynamic cooperation and unified consideration at the road-network area level. Because of algorithmic complexity and the curse of dimensionality, most traffic signal control methods cannot optimize the performance of the whole road network.

In existing traffic signal control strategies, the control objective considers only the minimum delay or maximum output of a single intersection node or a small-scale area, and cannot evaluate the overall operating efficiency of the road network. Optimal performance at a single intersection node does not guarantee optimal efficiency of the whole road network. For example, when each intersection maximizes its own objective function, the whole road network may deadlock because of excessive traffic.

Therefore, existing methods cannot coordinate the performance of a single intersection with that of the overall road network, so the efficiency of the road network cannot be maximized and the system optimum cannot be reached.

Most traffic signal control strategies rely on modeling of intersections and road network traffic. However, model calibration and model error affect the performance of the control strategy. Moreover, a physical model cannot be guaranteed to suit all traffic scenarios, so traditional traffic signal control strategies cannot be deployed at large scale.

Disclosure of Invention

In view of this, embodiments of the present invention provide a traffic signal control method, apparatus, device and storage medium with a wide application range, so as to improve the operation efficiency of a road network.

One aspect of the present invention provides a traffic signal control method, including:

dividing a road network, and determining a protection area, a buffer area and boundary control signal information in the road network and adaptive signal control information in the protection area;

calibrating the traffic flow in the road network according to the division result of the road network, and pre-configuring the boundary control signal information;

according to the content of the pre-configuration, adopting deep reinforcement learning to build upper-layer signal control intelligent agents, wherein each upper-layer signal control intelligent agent represents a boundary intersection signal lamp;

according to the content of the pre-configuration, building bottom-layer signal control intelligent agents by adopting deep reinforcement learning, wherein each bottom-layer signal control intelligent agent represents a signal lamp of a signal control intersection in the protection area;

and performing double-layer signal control on the target road network after the traffic information is collected according to the upper-layer signal control intelligent agent and the bottom-layer signal control intelligent agent.

Optionally, the dividing the road network, and determining a protection area, a buffer area, boundary control signal information in the road network, and adaptive signal control information in the protection area includes:

determining a protection area in a road network, wherein the protection area is a target area in a city where the road network is located;

determining a buffer area in a road network, wherein the buffer area is a must-pass area entering the protection area in a city where the road network is located;

determining the position of a boundary control signal, wherein the boundary control signal is used for controlling the inflow and outflow of traffic inside and outside the protection area;

determining control information for an adaptive signal, wherein the adaptive signal is used to control a single intersection within the protected zone.

Optionally, the calibrating traffic flow in the road network and pre-configuring the boundary control signal information according to the division result of the road network includes:

collecting historical traffic data or simulated traffic data of a road network;

calibrating the historical traffic data or the simulated traffic data;

configuring a control period of the boundary control signal;

and configuring the safety signal duration of the boundary control signal and the self-adaptive signal.

Optionally, the building an upper signal control agent by deep reinforcement learning according to the preconfigured content includes:

defining the state of the upper layer signal control agent, and combining the total vehicle number of the road network and the total vehicle number of the buffer area into state information;

defining the action of the upper layer signal control agent, and determining each signal period;

defining environment rewards of the upper layer signal control intelligent agents, and determining the total trip completion number in the road network;

constructing a policy network and a value network of the upper layer signal control agent, wherein the policy network is used for calculating the optimal action according to the state of the upper layer signal control agent, and the value network is used for evaluating the quality of the action;

and training the policy network by stochastic gradient descent, and training the value network by the temporal-difference error method.

Optionally, the building of a bottom-layer signal control agent by deep reinforcement learning according to the preconfigured content includes:

defining the state of the bottom layer signal control agent, and combining the queuing length and the road width of the intersection where the agent is located into state information;

defining the action of the bottom layer signal control agent, and determining each action as the phase sequence number of the next decision step;

defining the environment reward of the bottom layer signal control agent, and determining the total delay of the approach lanes of the bottom layer signal control agent after the phase is executed;

and constructing a value network of the bottom signal control agent, wherein the value network is used for evaluating the quality of actions.

Another aspect of an embodiment of the present invention provides a traffic signal control apparatus, including:

a first module, configured to divide a road network and determine a protection area, a buffer area and boundary control signal information in the road network and adaptive signal control information in the protection area;

a second module, configured to calibrate traffic flow in the road network according to the division result of the road network, and pre-configure the boundary control signal information;

the third module is used for building upper-layer signal control intelligent agents by adopting deep reinforcement learning according to the preconfigured content, wherein each upper-layer signal control intelligent agent represents a boundary intersection signal lamp;

the fourth module is used for building bottom-layer signal control intelligent agents by adopting deep reinforcement learning according to the preconfigured content, wherein each bottom-layer signal control intelligent agent represents a signal lamp of a signal control intersection in the protection area;

and the fifth module is used for performing double-layer signal control on the target road network, after the traffic information is collected, according to the upper-layer signal control intelligent agent and the bottom-layer signal control intelligent agent.

Optionally, the first module comprises:

a first unit, configured to determine a protection area in a road network, wherein the protection area is a target area in a city where the road network is located;

the second unit is used for determining a buffer area in a road network, wherein the buffer area is a necessary passing area entering the protection area in a city where the road network is located;

a third unit, configured to determine a position of a boundary control signal, where the boundary control signal is used to control inflow and outflow of traffic inside and outside the protected area;

a fourth unit for determining control information of an adaptive signal, wherein the adaptive signal is used for controlling a single intersection within the protection zone.

Optionally, the second module comprises:

the fifth unit is used for acquiring historical traffic data or simulated traffic data of a road network;

a sixth unit, configured to calibrate the historical traffic data or the simulated traffic data;

a seventh unit configured to configure a control period of the boundary control signal;

and the eighth unit is used for configuring the safety signal duration of the boundary control signal and the self-adaptive signal.

Another aspect of the embodiments of the present invention provides an electronic device, including a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a program, the program being executed by a processor to implement the method as described above.

The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.

The embodiment of the invention divides the road network and determines a protection area, a buffer area and boundary control signal information in the road network, as well as adaptive signal control information in the protection area; calibrates the traffic flow in the road network according to the division result and pre-configures the boundary control signal information; builds upper-layer signal control intelligent agents by deep reinforcement learning according to the pre-configured content, wherein each upper-layer signal control intelligent agent represents a boundary intersection signal lamp; builds bottom-layer signal control intelligent agents by deep reinforcement learning according to the pre-configured content, wherein each bottom-layer signal control intelligent agent represents the signal lamp of a signalized intersection in the protection area; and, after traffic information is collected, performs double-layer signal control on the target road network according to the upper-layer and bottom-layer signal control intelligent agents. The invention has a wide range of application and can improve the operating efficiency of the road network.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of road network region division according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of dual-layer signal control according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The embodiment of the invention provides electronic equipment, which comprises a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

An embodiment of the present invention provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.

Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.

The following detailed description of the specific implementation principles of the present invention is made with reference to the accompanying drawings:

Step one: scene generation: dividing and configuring the road network (as shown in FIG. 1)

1. Define the protection area (Protected Network): in the urban road network, define the location and boundary of the protection area, for example a downtown CBD (Central Business District), a core area, a key management area, or the area inside a given ring road. Such an area is chosen as the protection area because its geographic location is important and the inflow and outflow of vehicles there should be arranged more rationally.

2. Define the buffer area (Buffer): the buffer area consists of the surrounding roads through which traffic enters the protection area from its periphery. External traffic entering the protection area is first dispersed through the buffer area, so that the traffic density in any one part of the protection area does not change sharply.

3. Define the boundary control signal positions: the upper-layer control in the double-layer control consists of boundary control signal lamps. Boundary signal control is located at the signal lamps on the boundary between the buffer area and the outside. These signal lamps control the inflow and outflow of traffic between the protection area and the outside, acting as "gates".

4. Define the adaptive signal control within the protection area: the bottom-layer control in the double-layer control consists of the signal lamps inside the protection area. Each bottom-layer control signal lamp is optimized only for its own intersection.
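The four definitions of step one can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the class names `Zone`, `Intersection` and `Partition`, the `on_outer_boundary` flag and the intersection names are all hypothetical and chosen only for illustration; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    PROTECTED = "protected"  # the protection area, e.g. a CBD
    BUFFER = "buffer"        # surrounding roads feeding the protection area
    OUTSIDE = "outside"      # the rest of the city


@dataclass
class Intersection:
    name: str
    zone: Zone
    # Hypothetical flag: True if the signal sits on the buffer/outside boundary.
    on_outer_boundary: bool = False


@dataclass
class Partition:
    intersections: list = field(default_factory=list)

    def boundary_signals(self):
        """Upper-layer 'gate' signals: buffer intersections on the outer boundary."""
        return [i.name for i in self.intersections
                if i.zone is Zone.BUFFER and i.on_outer_boundary]

    def adaptive_signals(self):
        """Bottom-layer signals: every signalized intersection in the protection area."""
        return [i.name for i in self.intersections if i.zone is Zone.PROTECTED]


partition = Partition([
    Intersection("gate_north", Zone.BUFFER, on_outer_boundary=True),
    Intersection("ring_east", Zone.BUFFER),
    Intersection("cbd_1", Zone.PROTECTED),
    Intersection("cbd_2", Zone.PROTECTED),
])
```

Keeping the two signal sets as derived views of one partition means each intersection is assigned to exactly one layer, which matches the split between boundary control and single-intersection control described above.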

Step two: collecting data

1. Calibrating the traffic flow of the road network: the traffic flow of the road network can be calibrated from historical data or simulation data. The road network flow $q(t)$ is composed as $q(t) = [q_{11}, q_{12}, q_{21}]$, where subscript 1 denotes the protection area in the road network and subscript 2 denotes the region outside the protection area. Thus $q_{12}$ is the flow from the protection area to the outside, and so on.

2. Presetting signal timing limits: for the upper-layer boundary signal control, the signal cycle is 100 seconds, i.e. $C = 100$ s; the bottom-layer adaptive signal control is aperiodic and therefore needs no constraint on the signal cycle. In addition, all signal controls assume a yellow time of five seconds to ensure road safety, i.e. $Y = 5$ s.
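The timing limits above can be enforced with a small helper. This is only a sketch: the text gives the cycle length and the yellow time, while the assumption that each cycle contains exactly one yellow interval and that the remainder is red, together with the names `BoundaryTiming` and `clamp_green`, are illustrative choices made here.

```python
from dataclasses import dataclass

CYCLE_S = 100.0   # C: boundary-signal cycle length given in the text
YELLOW_S = 5.0    # Y: yellow time assumed for every signal


@dataclass(frozen=True)
class BoundaryTiming:
    green_s: float  # total green time within one cycle

    def red_s(self) -> float:
        # Assumption: one yellow interval per cycle; the rest of the cycle is red.
        return CYCLE_S - YELLOW_S - self.green_s


def clamp_green(requested_s: float) -> BoundaryTiming:
    """Clip a requested green time so that green plus yellow fits in one cycle."""
    green = max(0.0, min(requested_s, CYCLE_S - YELLOW_S))
    return BoundaryTiming(green_s=green)
```

For instance, `clamp_green(40).red_s()` leaves 55 seconds of red under these assumptions, and a request longer than the cycle is cut back to 95 seconds of green.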

Step three: build up the upper boundary signal control agent (as shown in figure 2)

An upper-layer boundary signal control Agent (Agent) is built by adopting a Deep Deterministic Policy Gradient (DDPG) architecture in Deep Reinforcement Learning (DRL), and each boundary intersection signal lamp is an Agent.

1. Define the agent state (State).

The total number of vehicles in the road network, $n_{PN}$, and the total number of vehicles in the buffer area, $n_{buffer}$, are combined into the state information, i.e. $s_t = [n_{PN}, n_{buffer}]$.

2. Define the agent action (Action).

Each signal cycle (i.e. decision step) is $C = 100$ s, and each action $a_t$ is the total green time of the next cycle.

3. Define the environment reward (Reward).

The sum of the number of vehicles completing their trips inside the protection area and the number of vehicles driving out of the protection area, i.e. the total number of completed trips in the road network, is used as the reward $r_t$ that the environment feeds back to the agent in the current state.

The goal is to maximize the reward.
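The state, action and reward of the upper-layer agent can be sketched as plain functions. The container `CycleCounts`, its field names, and the mapping of a raw policy output in [-1, 1] onto a green time in [0, C] are assumptions made for illustration; the text only states which quantities enter the state and the reward and that the action is a green time.

```python
from dataclasses import dataclass

CYCLE_S = 100.0  # decision step: one boundary-signal cycle


@dataclass
class CycleCounts:
    n_pn: int                # vehicles in the road network at the end of the cycle
    n_buffer: int            # vehicles in the buffer area at the end of the cycle
    trips_ended_inside: int  # trips completed inside the protection area
    trips_exited: int        # vehicles that drove out of the protection area


def upper_state(c: CycleCounts) -> list:
    """State: the two vehicle counts combined into one vector."""
    return [float(c.n_pn), float(c.n_buffer)]


def upper_reward(c: CycleCounts) -> float:
    """Reward: total number of completed trips during the cycle."""
    return float(c.trips_ended_inside + c.trips_exited)


def upper_action_to_green(raw: float) -> float:
    """Map a raw output in [-1, 1] to a green time in [0, C] (assumed scaling)."""
    raw = max(-1.0, min(1.0, raw))
    return (raw + 1.0) / 2.0 * CYCLE_S


counts = CycleCounts(n_pn=1200, n_buffer=300, trips_ended_inside=40, trips_exited=25)
```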

4. Construct the policy network and the value network.

The policy network obtains the optimal action $a_t$ directly from the state $s_t$, i.e. $a_t = \mu(s_t \mid \theta^{\mu})$,

where $\theta^{\mu}$ denotes the parameters of the policy model $\mu$. The value network evaluates the quality of the current action, i.e. it yields $Q^{upper}(s_t, a_t \mid \theta^{Q})$,

where $\theta^{Q}$ denotes the parameters of the value evaluation model $Q^{upper}$. Both are fully connected neural networks whose layers have [256, 128, 1] and [128, 64, 1] neurons respectively, and all activation functions are rectified linear units (ReLU).
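To make the layer sizes concrete, here is a dependency-free sketch of the two fully connected networks. Only the layer widths [256, 128, 1] and [128, 64, 1] and the use of ReLU come from the text. The input sizes (two state features, plus one action for the value network), the He-style initialization, the linear output layer and the helper names `init_mlp` and `forward` are assumptions; a real implementation would more likely use a deep learning framework.

```python
import random


def init_mlp(sizes, seed=0):
    """Create weights and biases for sizes = [input_dim, hidden..., output_dim]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # He-style scaling, a common choice with ReLU
        w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def forward(layers, x):
    """ReLU on hidden layers; the last layer is left linear (an assumption)."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


STATE_DIM, ACTION_DIM = 2, 1  # [n_PN, n_buffer] and the green time

policy_net = init_mlp([STATE_DIM, 256, 128, 1], seed=1)
value_net = init_mlp([STATE_DIM + ACTION_DIM, 128, 64, 1], seed=2)

state = [0.6, 0.2]                            # counts scaled to [0, 1] (assumed)
action = forward(policy_net, state)           # one number: the proposed action
q_value = forward(value_net, state + action)  # the critic scores (state, action)
```

The value network takes the concatenated state and action as input, which is the usual way a critic is wired in an actor-critic setup such as DDPG.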

5. Offline training.

The policy network is trained by stochastic gradient descent (SGD), with the policy gradient as the gradient.

The value network is trained on the temporal-difference error (TD error) computed at each time step, where $\gamma$ is the discount factor.
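For reference, the two training signals named in item 5 are commonly written as follows in the standard DDPG formulation. This is the textbook form, given as an assumption about what the method uses. The mini-batch size $N$, the sample index $i$, the symbols $s$, $a$, $r$ for state, action and reward, and the omission of target networks are choices made here rather than details stated in the text.

```latex
% Deterministic policy gradient used to update the policy parameters
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i=1}^{N}
  \nabla_{a} Q^{upper}(s_i, a \mid \theta^{Q}) \big|_{a = \mu(s_i \mid \theta^{\mu})}
  \, \nabla_{\theta^{\mu}} \mu(s_i \mid \theta^{\mu})

% One-step temporal-difference error used to update the value parameters
\delta_t = r_t
  + \gamma \, Q^{upper}\big(s_{t+1}, \mu(s_{t+1} \mid \theta^{\mu}) \mid \theta^{Q}\big)
  - Q^{upper}(s_t, a_t \mid \theta^{Q})
```

The value network is then fitted by reducing $\delta_t^2$ over sampled transitions, while the policy parameters move along the first expression.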

Step four: building a bottom distributed signal control agent (as shown in figure 2)

A Deep Q-Network (DQN) architecture in deep reinforcement learning is adopted to build the bottom-layer distributed signal control agents; the signal lamp of each signal control intersection in the protection area is one agent.

1. Define the agent state (State).

The queue length $\mathit{queue}_t$ of the approach lanes of the intersection where the agent is located and the road density $\mathit{dens}_t$ are combined into the state information, i.e. $s_t = [\mathit{queue}_t, \mathit{dens}_t]$.

2. Define the agent action (Action).

Each action $a_t$ is the phase sequence number of the next decision step.

3. Define the environment reward (Reward).

The total delay on the approach lanes of the agent's intersection after the phase has been executed is used as the reward $r_t$ fed back to the agent in the current state.

Because this reward is a delay, the goal is to minimize it.

4. Construct the value network.

The value network evaluates the quality of the current action, i.e. it yields $Q^{lower}(s_t, a_t \mid \theta^{Q})$,

where $\theta^{Q}$ denotes the parameters of the value evaluation model $Q^{lower}$.
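A bottom-layer agent of this kind picks a phase from the Q-values of its intersection and learns from a one-step target. The sketch below shows only those two pieces. The discount factor, the exploration rate, the four-phase layout and all Q-value numbers are made up for illustration, and negating the delay so that the usual max-based target applies is an assumed convention, since the text speaks of minimizing the delay directly.

```python
import random

GAMMA = 0.95  # hypothetical discount factor


def select_phase(q_values, epsilon, rng):
    """Epsilon-greedy choice of the next phase sequence number."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(total_delay_s, next_q_values, gamma=GAMMA):
    """One-step Q-learning target.

    Delay is negated so that maximizing the return minimizes delay
    (an assumed sign convention, not taken from the text).
    """
    reward = -total_delay_s
    return reward + gamma * max(next_q_values)


rng = random.Random(0)
q_now = [-30.0, -12.0, -45.0, -20.0]  # made-up Q estimates for [queue_t, dens_t]
phase = select_phase(q_now, epsilon=0.0, rng=rng)  # epsilon 0 means purely greedy
target = td_target(total_delay_s=10.0, next_q_values=[-8.0, -15.0, -9.0, -11.0])
```

With `epsilon=0.0` the agent always takes the phase with the highest Q-value; during training a small positive value would let it occasionally try other phases.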

Step five: practical application

Preprocess the collected data, extract the required feature information, feed it into the two trained models respectively, and set the signals according to the timing or phase output by the models, thereby realizing double-layer signal control of the road network.
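The application step can be sketched as one function that queries both trained models and returns the settings to apply. Everything named here is hypothetical: `two_layer_step`, the dictionary layouts and the two toy stand-in models exist only to show the data flow. The sketch also evaluates both layers in a single call for simplicity, whereas in the scheme described above the boundary signals decide once per 100-second cycle and the intersection signals decide aperiodically.

```python
def two_layer_step(counts, intersections, upper_policy, lower_q):
    """Return the boundary green time and the next phase of each intersection.

    counts:        {"n_pn": ..., "n_buffer": ...} from preprocessing
    intersections: {name: (queue_len, density)} inside the protection area
    upper_policy:  callable, state -> green time in seconds
    lower_q:       callable, state -> list of Q-values, one per phase
    """
    green_s = upper_policy([counts["n_pn"], counts["n_buffer"]])

    phases = {}
    for name, (queue_len, density) in intersections.items():
        q_values = lower_q([queue_len, density])
        phases[name] = max(range(len(q_values)), key=lambda a: q_values[a])
    return {"boundary_green_s": green_s, "phases": phases}


# Stand-ins for the trained models, for illustration only.
def toy_upper_policy(state):
    n_pn, _n_buffer = state
    return 60.0 if n_pn < 1000 else 30.0  # admit less traffic when the area is full


def toy_lower_q(state):
    queue_len, density = state
    return [queue_len, density * 100.0]   # two phases with arbitrary scores


plan = two_layer_step(
    {"n_pn": 1200, "n_buffer": 300},
    {"cbd_1": (12, 0.05), "cbd_2": (3, 0.20)},
    toy_upper_policy,
    toy_lower_q,
)
```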

In summary, the present invention implements double-layer control by dividing the urban road network. It studies policy coordination and asynchronous updating for traffic signal control at two levels, the road network and the single intersection, and achieves multi-scale cooperative optimization of traffic flow while reducing algorithmic complexity. The upper-layer control strategy for the road network is boundary control: signal lamps located on the road network boundary control the vehicles flowing into and out of the whole road network. The upper-layer control reduces the dimensionality of road network optimization, so that the road network is optimized as a whole.

The upper-layer and lower-layer controls of the invention are feedback controls for each other: the state output of the upper layer is the input of the lower-layer control, and the state output of the lower-layer control is the input of the upper-layer control. Therefore, when the road network is congested, the lower-layer single-intersection controllers avoid disrupting coordination within the area through excessive self-optimization, and cooperate with the upper-layer control to maintain the overall operating efficiency of the road network, thereby avoiding phenomena such as road network deadlock.

The double-layer control of the invention adapts to traffic data using a primarily data-driven reinforcement learning method. The traffic state is extracted directly from the data, so no physical model is needed for state estimation. The method is therefore suitable for a variety of traffic scenarios and has strong practical applicability.

The invention achieves multi-scale cooperative optimization of traffic flow and implements double-layer control by dividing the urban road network. Policy coordination and asynchronous updating for traffic signal control are studied at two levels, the road network and the single intersection, so that multi-scale cooperative optimization of traffic flow is achieved while algorithmic complexity is reduced. At the same time, the upper-layer and lower-layer controls are feedback controls for each other: the state output of the upper layer is the input of the lower-layer control, and the state output of the lower-layer control is the input of the upper-layer control. Therefore, when the road network is congested, the lower-layer single-intersection controllers avoid disrupting coordination within the area through excessive self-optimization, and cooperate with the upper-layer control to maintain the overall operating efficiency of the road network, thereby avoiding phenomena such as road network deadlock.

The invention effectively addresses the curse of dimensionality and improves solving efficiency by taking data-driven reinforcement learning as its main method. The data-driven method mitigates the curse of dimensionality in problem solving and makes it possible to solve the traffic signal control optimization problem for a regional road network. In the double-layer control, the upper-layer control strategy for the road network is boundary control: signal lamps located on the road network boundary control the vehicles flowing into and out of the whole road network, which reduces the dimensionality of road network optimization and allows the road network to be optimized as a whole.

The invention makes adaptive decisions for different traffic scenarios by using a data-driven reinforcement learning method. Traffic state information is extracted from real-time monitoring data, so the method adapts to the traffic state without requiring state estimation from a physical model. The method is therefore suitable for a variety of traffic scenarios and has strong practical applicability.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A traffic signal control method, comprising:

performing double-layer signal control on the target road network, after the traffic information is collected, according to the upper-layer signal control agent and the bottom-layer signal control agent;

wherein dividing the road network to determine the protection area, the buffer area, the boundary control signal information in the road network, and the adaptive signal control information in the protection area comprises:

determining the position of a boundary control signal, wherein the boundary control signal is used for controlling the traffic flowing into and out of the protection area;

determining control information of an adaptive signal, wherein the adaptive signal is used for controlling a single intersection in the protection area;

wherein building the upper-layer signal control agent by deep reinforcement learning according to the content of the pre-configuration comprises:

defining the actions of the upper-layer signal control agent, wherein each action specifies the total green-light time of the next period;

training the policy network by stochastic gradient descent, and training the value network by using the temporal-difference error;

wherein building the bottom-layer signal control agents by deep reinforcement learning according to the content of the pre-configuration comprises:

defining the state of the bottom-layer signal control agent, wherein the queue length and the road density at the intersection where the agent is located are combined into the state information;

2. The traffic signal control method according to claim 1, wherein calibrating the traffic flow in the road network and pre-configuring the boundary control signal information according to the division result of the road network comprises:

collecting historical traffic data or simulated traffic data of a road network;

calibrating the historical traffic data or the simulated traffic data;

configuring a control period of the boundary control signal;

3. An electronic device comprising a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method of claim 1 or 2.

4. A computer-readable storage medium, characterized in that the storage medium stores a program which, when executed by a processor, implements the method according to claim 1 or 2.