CN113299078B - Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation - Google Patents
- Publication number: CN113299078B (application CN202110331935.8A)
- Authority: CN (China)
- Prior art keywords: agent, time, traffic, intersection, trunk
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G08G1/07: Traffic control systems for road vehicles; controlling traffic signals
- G08G1/081: Plural intersections under common control
- G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125: Traffic data processing
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
Abstract
The invention discloses a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation. The method comprises the following steps: multi-mode traffic trunk simulation calibration and flow generation; design of a signal control agent for each intersection of the trunk line; construction of a cooperative value-decomposition multi-agent reinforcement learning framework; and training and output of the agents of all intersections of the multi-mode traffic trunk line. The method treats the multi-mode traffic signal control of each intersection as an agent, comprehensively considers the cooperation of all intersections of the traffic trunk line, and optimally trains the signal control agents with the overall person throughput and delay of the trunk line as the objective, thereby providing a control basis for road traffic managers, achieving the overall optimum of the traffic trunk line, and improving the urban road traffic service level.
Description
Technical Field
The invention relates to the field of urban traffic signal control, and in particular to a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation.
Background
In recent years, rapidly growing traffic demand has caused road congestion, air pollution, and reduced transportation efficiency, seriously affecting urban economic development and the daily life of citizens. Among urban traffic management and control measures for relieving these problems, traffic trunk line signal coordination control is one of the most effective: a reasonable trunk line control method can effectively increase vehicle speed and traffic efficiency while reducing fuel consumption and exhaust emissions.
Traditional coordination control of traffic trunk signals mainly adopts a green-wave model: a common cycle length is set for all intersections of the trunk line, and the phase sequences and phase differences (offsets) of the intersections are calculated with the number of vehicle stops, the green-wave bandwidth, vehicle delay, and the like as optimization indexes. However, such approaches greatly limit the efficiency of individual intersections, which is sacrificed for the benefit of trunk-line vehicles. Among existing Chinese patents, Chinese patent 202010793652.0 builds a bidirectional trunk-line optimization model by modeling a target trunk road section in a tidal traffic state with weighted throughput as the optimization target, minimizing vehicle delay on the basis of maximizing system traffic capacity; similarly, based on bus running tracks, Chinese patent 201910092239.9 establishes a model that optimizes the cycle and phase differences under a bus-priority policy, realizing a trunk-line green wave for both social vehicles and buses. In general, existing research is biased toward maximizing the benefit of trunk-line vehicles and public transport, sacrifices the efficiency of branch lines and individual intersections in its models, lacks comprehensive consideration of multi-mode traffic on the trunk line such as public transport, pedestrians, and non-motor vehicles, and lacks microscopic research that coordinates the individual intersections of the trunk line on the basis of multi-mode adaptive control to achieve the overall optimum of the multi-mode trunk line.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation, which performs multi-mode traffic simulation calibration and flow generation on a target trunk line; designs a signal control agent for each intersection of the trunk line; constructs a cooperative value-decomposition multi-agent reinforcement learning framework; and trains and outputs the agents of all intersections of the multi-mode traffic trunk line. On the basis of single-point multi-mode traffic adaptive control, the cooperation of the intersections of the traffic trunk line is considered to achieve the overall optimum of the trunk line.
The technical scheme is as follows: in order to solve the above technical problems, the invention adopts the following technical scheme. A multi-mode traffic trunk signal coordination control method based on multi-agent cooperation comprises the following steps:
(1) Acquire the intersection information and multi-mode traffic flow patterns of the traffic trunk line, perform simulation calibration of the multi-mode traffic trunk line with simulation software according to these data, and reproduce the multi-mode traffic flow arrival rates.
(2) Generate a signal control agent for each intersection in the trunk line, so that the n intersections of the traffic trunk line correspond to n agents. Agent i reads the state s_i^{t_k} of its intersection at time t_k, which contains the multi-mode traffic position, queue length, and speed information, and inputs s_i^{t_k} into its neural network with parameters θ_i^{t_k} at time t_k, which outputs the action phase of intersection agent i at time t_k: a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(s_i^{t_k}, a_i; θ_i^{t_k}), where Q_i(s_i^{t_k}, a_i; θ_i^{t_k}) denotes the value function (Q value) of selecting action phase a_i in state s_i^{t_k} under neural network parameters θ_i^{t_k}, A_i denotes the set of action phases that can be released at intersection i, and a_i denotes one action phase in A_i;
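The action selection in step (2) is a greedy argmax over the Q values of the releasable phases. A minimal sketch in Python; the phase names and Q values below are illustrative, not from the patent:

```python
def select_phase(q_values):
    """Pick the action phase a_i in A_i with the highest Q value.

    q_values maps each releasable phase of the intersection to the Q value
    the agent's neural network produced for it in the current state.
    """
    return max(q_values, key=q_values.get)

# Example A_i for a hypothetical 4-phase intersection; in the method these
# values would come from the agent's Q-network, not be hand-written.
q_values = {"NS-through": 1.8, "NS-left": 0.4, "EW-through": 2.3, "EW-left": 0.9}
phase = select_phase(q_values)  # the phase with the largest Q value
```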
(3) Initialize the neural network parameters and experience replay pools of all agents in the trunk line, and set the number of training rounds N_episode;
(4) Initialize the simulated multi-mode traffic trunk flow arrival rates, and set the initial simulation time t_0 and the total simulation time T;
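Step (4) seeds each simulation round with the calibrated arrival rates. As an illustration, arrivals for one mode can be generated with exponential headways at the calibrated mean rate; the exponential-headway distribution is an assumption of this sketch, since the patent does not prescribe one:

```python
import random

def arrival_times(rate_per_s, horizon_s, seed=42):
    """Sample arrival times over [0, horizon_s] at a calibrated mean rate,
    using exponential headways between consecutive arrivals (an assumption;
    any calibrated arrival process could be substituted)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # next headway
        if t > horizon_s:
            return times
        times.append(t)

# e.g. one arrival stream per mode (car, bus, pedestrian, non-motor vehicle)
cars = arrival_times(rate_per_s=0.2, horizon_s=600)
```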
(5) Obtain the multi-mode traffic state of each agent. Taking agent i as an example, obtain the local multi-mode traffic observation of the corresponding intersection i at time t_k, s_i^{t_k} = (s_{car,i}^{t_k}, s_{bus,i}^{t_k}, s_{ped,i}^{t_k}, s_{bike,i}^{t_k}, p_i^{t_k}), where s_{car,i}^{t_k}, s_{bus,i}^{t_k}, s_{ped,i}^{t_k}, and s_{bike,i}^{t_k} respectively denote the social-vehicle, public-transport, pedestrian, and non-motor-vehicle states of intersection i at time t_k, each containing position, queue length, speed, and similar information, and p_i^{t_k} denotes the phase states of the intersections adjacent to intersection i at time t_k;
(6) Input the local observation of each agent into its neural network. For agent i, inputting s_i^{t_k} into the neural network returns the action phase a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(s_i^{t_k}, a_i; θ_i^{t_k}) at time t_k together with its corresponding Q value Q_i(s_i^{t_k}, a_i^{t_k}; θ_i^{t_k}), where A_i denotes the set of action phases that intersection i can release, θ_i^{t_k} denotes the parameters of agent i's neural network at time t_k, a_i denotes one action phase in A_i, and Q_i(·) denotes the neural network Q function corresponding to agent i;
(7) Execute the action phase a_i^{t_k} returned by each agent at the corresponding intersection signal of the traffic trunk simulation for Δt seconds, with t_{k+1} = t_k + Δt, and return the team reward of the multi-mode traffic trunk multi-agents at time t_k, r^{t_k} = -k_d·Δd^{t_k} + k_f·f^{t_k} - k_l·Δl^{t_k}, where k_d, k_f, and k_l respectively denote the balance coefficients of the per-person delay change, the person throughput, and the queue-length change; Δd^{t_k} = d^{t_{k+1}} - d^{t_k} denotes the change in per-person delay, where d^{t_k} and d^{t_{k+1}} are the trunk-line per-person delays at times t_k and t_{k+1}; f^{t_k} denotes the person throughput, i.e. the total number of people passing through the traffic trunk during Δt; and Δl^{t_k} = l^{t_{k+1}} - l^{t_k} denotes the change in queue length, where l^{t_k} and l^{t_{k+1}} are the numbers of people queued on the traffic trunk at times t_k and t_{k+1} (the signs of the three reward terms are reconstructed so that throughput is rewarded and growth in delay and queues is penalized);
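The team reward of step (7) combines the three weighted terms above. A sketch under an assumed sign convention (throughput rewarded, growth in per-person delay and queues penalized), since the extracted text does not preserve the original signs:

```python
def team_reward(d_now, d_next, throughput, l_now, l_next,
                k_d=1.0, k_f=1.0, k_l=1.0):
    """Team reward r^{t_k} for one control step of length Δt.

    d_now/d_next : per-person delay on the trunk at t_k and t_{k+1}
    throughput   : total number of people passing the trunk during Δt
    l_now/l_next : number of people queued on the trunk at t_k and t_{k+1}
    k_d, k_f, k_l: the patent's balance coefficients
    The sign convention is an assumption of this sketch.
    """
    delta_d = d_next - d_now   # change in per-person delay
    delta_l = l_next - l_now   # change in number of people queued
    return -k_d * delta_d + k_f * throughput - k_l * delta_l
```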
(8) Repeat step (5) to obtain the multi-mode traffic state of each agent at time t_{k+1}, and save the transition (s_all^{t_k}, u_all^{t_k}, r^{t_k}, s_all^{t_{k+1}}) to the experience replay pool, where r^{t_k} denotes the team reward of the multiple agents at time t_k; s_all^{t_k} = [s_1^{t_k}, …, s_n^{t_k}] and s_all^{t_{k+1}} = [s_1^{t_{k+1}}, …, s_n^{t_{k+1}}] are the global state lists at times t_k and t_{k+1}, s_n^{t_k} denoting the state of the n-th agent at time t_k and s_n^{t_{k+1}} its state at time t_{k+1}; and u_all^{t_k} = [u_1^{t_k}, …, u_n^{t_k}] is the list of actions selected by all agents at time t_k, u_n^{t_k} denoting the action executed by the n-th agent at time t_k;
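The experience replay pool of step (8) can be sketched as a bounded FIFO buffer of transition tuples; the class and method names are illustrative, not from the patent:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool holding (s_all, u_all, r, s_all') transitions.

    Old transitions are discarded automatically once capacity is reached.
    """
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, states, actions, reward, next_states):
        self.buf.append((states, actions, reward, next_states))

    def sample(self, n):
        """Uniform random minibatch of n transitions, as used in step (10)."""
        return random.sample(self.buf, n)

    def __len__(self):
        return len(self.buf)
```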
(9) Judge whether the preset simulation time has been reached: if t_{k+1} ≥ T, go to step (10); otherwise return to step (5) and iterate.
(10) Randomly sample N transition pairs from the experience replay pool and update the neural network parameters of each agent by gradient descent on the loss function L(θ_all) = E[(y^{t_k} - Q_all(s_all^{t_k}, u_all^{t_k}; θ_all))²], where θ_all denotes the neural network parameters of all agents, and Q_all(s_all^{t_k}, u_all^{t_k}; θ_all) = Σ_{b=1}^{n} k_b·Q_b(s_b^{t_k}, u_b^{t_k}; θ_b) denotes the value-decomposed global reward function of the multi-agent cooperation, k_b denoting the trade-off coefficient of intersection b, n the number of agents, and θ_b the neural network parameters of agent b; the target reward value is y^{t_k} = r^{t_k} + γ·max_{u_all} Q_all(s_all^{t_{k+1}}, u_all; θ_all), where γ denotes the attenuation (discount) coefficient and u_all denotes the action list of all agents;
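The value decomposition and temporal-difference target of step (10) reduce to a weighted sum of per-agent Q values plus a discounted bootstrap. A sketch with scalar stand-ins for the network outputs; a real implementation would backpropagate through the agents' networks:

```python
def q_total(per_agent_q, weights):
    """Additive value decomposition: Q_all = sum over agents b of k_b * Q_b."""
    return sum(k * q for k, q in zip(weights, per_agent_q))

def td_target(reward, best_next_q_total, gamma=0.85):
    """Target y = r + gamma * max_u Q_all(s', u) from the sampled transition."""
    return reward + gamma * best_next_q_total

def td_loss(q, y):
    """Squared TD error; the loss averages this over the sampled minibatch."""
    return (y - q) ** 2
```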
(11) Judge whether the number of updates has reached the preset number of training rounds N_episode: if not, return to step (4) for loop iteration; if N_episode has been reached, output the agents of each intersection of the multi-mode traffic trunk trained with multi-agent cooperation.
The invention also provides a multi-mode traffic trunk line signal coordination control device based on multi-agent cooperation, which comprises the following components:
the multi-mode traffic trunk sensing module, comprising a traffic trunk data sensing unit and a traffic trunk state sensing unit, wherein the traffic trunk data sensing unit is used for acquiring the channelized design, the number of entrance lanes, the road-section lengths, the bus-station positions, the non-motor-vehicle lanes, and the sidewalk positions of all intersections of the target trunk, and the traffic trunk state sensing unit is used for acquiring the number of bus runs and routes, departure intervals, stop times, the numbers and speeds of passengers, social vehicles, pedestrians, and non-motor vehicles, the queue lengths in front of the intersections, and the like;
the data storage module, comprising a traffic trunk intersection data unit and a traffic trunk traffic flow data unit, respectively used for storing the data acquired by the traffic trunk data sensing unit and by the traffic trunk state sensing unit;
the cooperative multi-mode traffic trunk signal coordination control agent calculation module, comprising an agent calculation and storage unit used for iteratively training and storing the cooperative trunk-intersection agents of the above method, and for outputting and storing the agents of each intersection of the multi-mode traffic trunk cooperatively trained by the multiple agents.
In addition, the invention also provides a computer device, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the steps of the multi-agent cooperation based multi-mode traffic trunk signal coordination control method when being executed by the processor.
In addition, the present invention also provides a computer readable storage medium, which stores a computer program, and the computer program when executed by a processor implements the steps of the multi-agent cooperation based multi-mode transportation trunk signal coordination control method.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
the invention provides a multi-mode traffic trunk signal coordination control method and device based on multi-agent cooperation, wherein a multi-mode traffic trunk and flow generation are simulated and modeled; designing a plurality of intelligent agents for signal control of each intersection of a trunk line; constructing a collaborative value decomposition multi-agent reinforcement learning framework; and training and outputting the intelligent agents of the intersections of the multi-mode traffic trunk line. The invention designs the multi-mode traffic signal control of each intersection as an intelligent body, simultaneously comprehensively considers the cooperation of each intersection of the traffic trunk line, takes the integral pedestrian flow and delay of the trunk line as targets to optimize and train the traffic signal control intelligent body, provides a control basis for a road traffic manager, realizes the integral optimal target of the traffic trunk line and improves the urban road traffic service level.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a flow diagram of a multi-agent collaborative reinforcement learning framework of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-mode traffic trunk simulation of an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description taken in conjunction with the accompanying drawings and specific examples.
As shown in fig. 1, the multi-agent cooperation-based multi-mode traffic trunk signal coordination control method disclosed in the embodiment of the present invention includes the following steps:
(1) Acquire the intersection information and multi-mode traffic flow patterns of the traffic trunk line, perform simulation calibration of the multi-mode traffic trunk line with simulation software according to these data, and reproduce the multi-mode traffic flow arrival rates;
Specifically, the intersection information of the traffic trunk and the multi-mode traffic flow data can be acquired by field sensing devices or collected manually in the field, and the simulation software can be SUMO, VISSIM, or the like;
(2) In this embodiment, a signal control agent is generated for each intersection in the trunk line, so that the n intersections of the traffic trunk line correspond to n agents. Agent i reads the state s_i^{t_k} of its intersection at time t_k, which contains the multi-mode traffic position, queue length, and speed information, and inputs s_i^{t_k} into its neural network with parameters θ_i^{t_k} at time t_k, which outputs the action phase of intersection agent i at time t_k: a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(s_i^{t_k}, a_i; θ_i^{t_k}), where Q_i(s_i^{t_k}, a_i; θ_i^{t_k}) denotes the value function (Q value) of selecting action phase a_i in state s_i^{t_k} under neural network parameters θ_i^{t_k}, A_i denotes the set of action phases that can be released at intersection i, and a_i denotes one action phase in A_i;
(3) In this embodiment, the neural network parameters and experience replay pools of all agents in the trunk line are initialized, and the number of training rounds N_episode is set;
(4) Specifically, the simulated multi-mode traffic trunk flow arrival rates are initialized, and the initial simulation time t_0 and the total simulation time T are set;
(5) In this embodiment, the multi-mode traffic state of each agent is obtained. Taking agent i as an example, the local multi-mode traffic observation of the corresponding intersection i at time t_k is obtained, s_i^{t_k} = (s_{car,i}^{t_k}, s_{bus,i}^{t_k}, s_{ped,i}^{t_k}, s_{bike,i}^{t_k}, p_i^{t_k}), where s_{car,i}^{t_k}, s_{bus,i}^{t_k}, s_{ped,i}^{t_k}, and s_{bike,i}^{t_k} respectively denote the social-vehicle, public-transport, pedestrian, and non-motor-vehicle states of intersection i at time t_k, each containing position, queue length, speed, and similar information, and p_i^{t_k} denotes the phase states of the intersections adjacent to intersection i at time t_k;
(6) In this embodiment, the local observation of each agent is input into its neural network. For agent i, inputting s_i^{t_k} into the neural network returns the action phase a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(s_i^{t_k}, a_i; θ_i^{t_k}) at time t_k together with its corresponding Q value Q_i(s_i^{t_k}, a_i^{t_k}; θ_i^{t_k}), where A_i denotes the set of action phases that intersection i can release, θ_i^{t_k} denotes the parameters of agent i's neural network at time t_k, a_i denotes one action phase in A_i, and Q_i(·) denotes the neural network Q function corresponding to agent i;
(7) In this embodiment, the action phase a_i^{t_k} returned by each agent is executed at the corresponding intersection signal of the traffic trunk simulation for Δt seconds, with t_{k+1} = t_k + Δt, and the team reward of the multi-mode traffic trunk multi-agents at time t_k is returned, r^{t_k} = -k_d·Δd^{t_k} + k_f·f^{t_k} - k_l·Δl^{t_k}, where k_d, k_f, and k_l respectively denote the balance coefficients of the per-person delay change, the person throughput, and the queue-length change; Δd^{t_k} = d^{t_{k+1}} - d^{t_k} denotes the change in per-person delay, where d^{t_k} and d^{t_{k+1}} are the trunk-line per-person delays at times t_k and t_{k+1}; f^{t_k} denotes the person throughput, i.e. the total number of people passing through the traffic trunk during Δt; and Δl^{t_k} = l^{t_{k+1}} - l^{t_k} denotes the change in queue length, where l^{t_k} and l^{t_{k+1}} are the numbers of people queued on the traffic trunk at times t_k and t_{k+1};
(8) In this embodiment, step (5) is repeated to obtain the multi-mode traffic state of each agent at time t_{k+1}, and the transition (s_all^{t_k}, u_all^{t_k}, r^{t_k}, s_all^{t_{k+1}}) is saved to the experience replay pool, where r^{t_k} denotes the team reward of the multiple agents at time t_k; s_all^{t_k} = [s_1^{t_k}, …, s_n^{t_k}] and s_all^{t_{k+1}} = [s_1^{t_{k+1}}, …, s_n^{t_{k+1}}] are the global state lists at times t_k and t_{k+1}, s_n^{t_k} denoting the state of the n-th agent at time t_k and s_n^{t_{k+1}} its state at time t_{k+1}; and u_all^{t_k} = [u_1^{t_k}, …, u_n^{t_k}] is the list of actions selected by all agents at time t_k, u_n^{t_k} denoting the action executed by the n-th agent at time t_k;
(9) Specifically, judge whether the preset simulation time has been reached: if t_{k+1} ≥ T, go to step (10); otherwise return to step (5) and iterate.
(10) In this embodiment, N transition pairs are randomly sampled from the experience replay pool, and the neural network parameters of each agent are updated by gradient descent on the loss function L(θ_all) = E[(y^{t_k} - Q_all(s_all^{t_k}, u_all^{t_k}; θ_all))²], where θ_all denotes the neural network parameters of all agents, and Q_all(s_all^{t_k}, u_all^{t_k}; θ_all) = Σ_{b=1}^{n} k_b·Q_b(s_b^{t_k}, u_b^{t_k}; θ_b) denotes the value-decomposed global reward function of the multi-agent cooperation, k_b denoting the trade-off coefficient of intersection b, n the number of agents, and θ_b the neural network parameters of agent b; the target reward value is y^{t_k} = r^{t_k} + γ·max_{u_all} Q_all(s_all^{t_{k+1}}, u_all; θ_all), where γ denotes the attenuation coefficient and u_all denotes the action list of all agents;
(11) In this embodiment, judge whether the number of updates has reached the preset number of training rounds N_episode: if not, return to step (4) for loop iteration; if N_episode has been reached, output the agents of each intersection of the multi-mode traffic trunk trained with multi-agent cooperation.
The invention is further elucidated below on the basis of an example of a traffic trunk situation.
Traffic example: a certain traffic trunk has 4 intersections, numbered intersection 1 to intersection 4 from west to east, with spacings of 160 m, 140 m, and 180 m between consecutive intersections. Intersections 1 and 4 are trunk-trunk intersections in which every approach is a bidirectional 8-lane road; intersections 2 and 3 are trunk-branch intersections in which the trunk-direction approaches are bidirectional 8-lane roads and the branch approaches are bidirectional 2-lane roads; and all motor-vehicle lanes are accompanied by pedestrian and non-motor-vehicle lanes.
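The example corridor can be encoded as plain data for the simulation setup; the field names below are illustrative, while the numbers come from the example:

```python
# West-to-east description of the 4-intersection example corridor.
corridor = [
    {"id": 1, "type": "trunk-trunk",  "trunk_lanes": 8},
    {"id": 2, "type": "trunk-branch", "trunk_lanes": 8, "branch_lanes": 2},
    {"id": 3, "type": "trunk-branch", "trunk_lanes": 8, "branch_lanes": 2},
    {"id": 4, "type": "trunk-trunk",  "trunk_lanes": 8},
]
# Gaps between consecutive intersections, west to east, in metres.
spacings_m = [160, 140, 180]
```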
The invention provides a multi-mode traffic trunk line signal coordination control method based on multi-agent cooperation, which comprises the following steps:
(1) As shown in fig. 3, the intersection information and multi-mode traffic flow patterns of the traffic trunk are acquired, the multi-mode traffic trunk is calibrated in the simulation software SUMO according to these data, and the multi-mode traffic flow arrival rates are reproduced.
(2) A signal control agent is generated for each intersection in the trunk line, the 4 intersections of the trunk line corresponding to 4 agents. Taking agent 2 as an example, agent 2 reads the state s_2^{t_k} of its intersection at time t_k, which contains the multi-mode traffic position, queue length, and speed information, and inputs s_2^{t_k} into its neural network with parameters θ_2^{t_k} at time t_k, which outputs the action phase of intersection agent 2 at time t_k: a_2^{t_k} = argmax_{a_2 ∈ A_2} Q_2(s_2^{t_k}, a_2; θ_2^{t_k}), where Q_2(s_2^{t_k}, a_2; θ_2^{t_k}) denotes the value function of selecting action phase a_2 in state s_2^{t_k} under neural network parameters θ_2^{t_k}, A_2 denotes the set of action phases that can be released at intersection 2, and a_2 denotes one action phase in A_2;
(3) The neural network parameters and experience replay pools of all agents in the trunk line are initialized, and the number of training rounds is set to N_episode = 1000;
(4) The simulated multi-mode traffic trunk flow arrival rates are initialized, and the initial simulation time is set to t_0 = 0 with total simulation time T = 10800;
(5) The multi-mode traffic state of each agent is obtained. Taking agent 2 as an example, the local multi-mode traffic observation of the corresponding intersection 2 at time t_0 is obtained, s_2^{t_0} = (s_{car,2}^{t_0}, s_{bus,2}^{t_0}, s_{ped,2}^{t_0}, s_{bike,2}^{t_0}, p_2^{t_0}), where the first four components respectively denote the social-vehicle, public-transport, pedestrian, and non-motor-vehicle states at time t_0, each containing position, queue length, speed, and similar information, and p_2^{t_0} denotes the phase states of intersection 1 and intersection 3, adjacent to intersection 2, at time t_0;
(6) The local observation of each agent is input into its neural network. Taking agent 2 as an example, inputting s_2^{t_0} into the neural network returns the action phase a_2^{t_0} = argmax_{a_2 ∈ A_2} Q_2(s_2^{t_0}, a_2; θ_2^{t_0}) at time t_0 together with its corresponding Q value Q_2(s_2^{t_0}, a_2^{t_0}; θ_2^{t_0}), where A_2 denotes the set of action phases that intersection 2 can release, θ_2^{t_0} denotes the parameters of agent 2's neural network at time t_0, a_2 denotes one action phase in A_2, and Q_2(·) denotes the neural network Q function corresponding to agent 2;
(7) The action phase returned by each agent is executed at the corresponding intersection signal of the traffic trunk simulation for Δt = 5 seconds, with t_1 = t_0 + Δt = 5, and the team reward of the multi-mode traffic trunk multi-agents at time t_0 is returned, r^{t_0} = -k_d·Δd^{t_0} + k_f·f^{t_0} - k_l·Δl^{t_0}, where k_d, k_f, and k_l respectively denote the balance coefficients of the per-person delay change, the person throughput, and the queue-length change; Δd^{t_0} = d^{t_1} - d^{t_0} denotes the change in per-person delay, where d^{t_0} and d^{t_1} are the trunk-line per-person delays at times t_0 and t_1; f^{t_0} denotes the person throughput, i.e. the total number of people passing through the traffic trunk during Δt; and Δl^{t_0} = l^{t_1} - l^{t_0} denotes the change in queue length, where l^{t_0} and l^{t_1} are the numbers of people queued on the traffic trunk at times t_0 and t_1;
(8) Step (5) is repeated to obtain the multi-mode traffic state of each agent at time t_1, and the transition (s_all^{t_0}, u_all^{t_0}, r^{t_0}, s_all^{t_1}) is saved to the experience replay pool, where r^{t_0} denotes the team reward of the multiple agents at time t_0, and s_all^{t_0} and s_all^{t_1} are the global state lists at times t_0 and t_1; taking agent 1 as an example, s_1^{t_0} denotes the state acquired by agent 1 at time t_0; u_all^{t_0} is the list of actions selected by all agents at time t_0, u_1^{t_0} denoting the action executed by agent 1 at time t_0;
(9) Judge whether the preset simulation time is reached: since t_1 = 5 < T = 10800, return to step (5) and iterate until t_{k+1} ≥ T is satisfied, then go to step (10).
(10) N = 64 transition pairs are randomly sampled from the experience replay pool, and the neural network parameters of each agent are updated by gradient descent on the loss function L(θ_all) = E[(y^{t_k} - Q_all(s_all^{t_k}, u_all^{t_k}; θ_all))²], where θ_all denotes the neural network parameters of all agents, and Q_all(s_all^{t_k}, u_all^{t_k}; θ_all) = Σ_{b=1}^{4} k_b·Q_b(s_b^{t_k}, u_b^{t_k}; θ_b) denotes the value-decomposed global reward function of the 4 cooperating agents, k_b denoting the importance balance coefficient of intersection b (all taken as 1 in this example) and θ_b the neural network parameters of agent b; the target reward value is y^{t_k} = r^{t_k} + γ·max_{u_all} Q_all(s_all^{t_{k+1}}, u_all; θ_all), where γ denotes the attenuation coefficient (0.85 in this example) and u_all denotes the action list of all agents. (11) Each execution of step (10) counts as 1 training round; judge whether the number of updates has reached the preset number of training rounds N_episode = 1000: if not, return to step (4) for loop iteration; if N_episode has been reached, output the agents of the 4 intersections of the multi-mode traffic trunk trained with multi-agent cooperation.
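The outer loop of the example (T = 10800 s, Δt = 5 s, batch N = 64, γ = 0.85, 1000 rounds) can be sketched with the simulator stubbed out. Only the control flow of steps (5)-(9) is shown, and all class and function names are illustrative; a real implementation would drive SUMO and the agents' Q-networks:

```python
# Hyper-parameters taken from the worked example.
T, DT, N_EPISODE, BATCH, GAMMA = 10800, 5, 1000, 64, 0.85

class StubEnv:
    """Stand-in for the SUMO corridor simulation (4 intersections)."""
    def reset(self):
        return [0.0] * 4                 # one local observation per agent

    def step(self, actions, dt):
        return [0.0] * 4, 0.0            # next observations, team reward

def run_episode(env, policies, pool):
    """One training round: steps (5)-(9), executed until t_{k+1} >= T."""
    states, t = env.reset(), 0
    while t < T:
        actions = [p(s) for p, s in zip(policies, states)]  # steps (5)-(6)
        next_states, reward = env.step(actions, DT)         # step (7)
        pool.append((states, actions, reward, next_states)) # step (8)
        states, t = next_states, t + DT                     # step (9)
    return pool

# One episode fills the pool with T / Δt = 2160 transitions.
pool = run_episode(StubEnv(), [lambda s: 0] * 4, [])
```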
As shown in fig. 4, the multi-mode traffic trunk signal coordination control device based on multi-agent cooperation disclosed in the embodiment of the present invention comprises: a multi-mode traffic trunk sensing module, a data storage module, and a cooperative multi-mode traffic trunk signal coordination control agent calculation module. The multi-mode traffic trunk sensing module is used for acquiring the channelized design, the number of entrance lanes, the road-section lengths, the bus-station positions, the non-motor-vehicle lanes, and the sidewalk positions of all intersections of the target trunk, and for acquiring the number of bus runs and routes on the trunk, departure intervals, stop times, the numbers and speeds of passengers, social vehicles, pedestrians, and non-motor vehicles, the queue lengths in front of the intersections, and the like. The data storage module is used for storing the data acquired by the multi-mode traffic trunk sensing module. The cooperative multi-mode traffic trunk signal coordination control agent calculation module is used for calculating and storing the cooperative trunk-intersection agents iteratively trained as in claim 1, and for outputting and storing the agents of each intersection of the multi-mode traffic trunk cooperatively trained by the multiple agents.
Wherein the multi-mode traffic trunk perception module: the system comprises a traffic trunk data sensing unit and a traffic trunk state sensing unit; the data storage module includes: a traffic trunk intersection data unit and a traffic trunk traffic flow data unit; the cooperative multi-mode traffic trunk signal coordination control intelligent agent calculation module comprises: and the intelligent agent computing and storing unit.
The embodiment of the multi-mode traffic trunk signal coordination control device based on multi-agent cooperation and the embodiment of the multi-mode traffic trunk signal coordination control method based on multi-agent cooperation disclosed by the embodiment belong to the same concept, and the specific implementation process is described in the embodiment of the method, and is not described herein again.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of the invention.
Claims (4)
1. A multi-mode traffic trunk signal coordination control method based on multi-agent cooperation is characterized by comprising the following steps:
(1) acquiring the intersection information and the multi-mode traffic flow pattern of the traffic trunk line, performing simulation calibration of the multi-mode traffic trunk in simulation software according to the data, and reproducing the arrival rate of the multi-mode traffic flow;
(2) generating a signal control agent for each intersection in the trunk line, where the n intersections of the traffic trunk correspond to n agents; agent i reads, at time t_k, the intersection state s_i^{t_k}, which comprises the multi-mode traffic position, queuing length and speed information, inputs the state s_i^{t_k} into agent i's neural network with parameters θ_i^{t_k} at time t_k, and obtains the action phase of intersection agent i at time t_k, a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(s_i^{t_k}, a_i; θ_i^{t_k}), where Q_i(s_i^{t_k}, a_i; θ_i^{t_k}) denotes the value function (Q value) of selecting action phase a_i in state s_i^{t_k} under neural network parameters θ_i^{t_k}, A_i denotes the set of action phases that can be released at intersection i, and a_i denotes one action phase in A_i;
(3) initializing the neural network parameters and experience replay pools of all agents in the trunk line, and setting the number of training rounds N_episode;
(4) initializing the simulated multi-mode traffic trunk flow arrival rate, and setting the initial simulation time t_0 and the total simulation time T;
(5) obtaining the multi-mode traffic state of each agent in the traffic trunk simulation; for agent i, obtaining the local multi-mode traffic observation of the corresponding i-th intersection at time t_k, o_i^{t_k} = (s_{i,car}^{t_k}, s_{i,bus}^{t_k}, s_{i,ped}^{t_k}, s_{i,bike}^{t_k}, p_i^{t_k}), where s_{i,car}^{t_k}, s_{i,bus}^{t_k}, s_{i,ped}^{t_k} and s_{i,bike}^{t_k} respectively denote the social vehicle state, public transit state, pedestrian state and non-motor vehicle state of the i-th intersection at time t_k, each comprising position, queuing length and speed information, and p_i^{t_k} denotes the phase state at time t_k of the intersections adjacent to the i-th intersection;
(6) inputting the local observation of each agent into its neural network; for agent i, inputting o_i^{t_k} into the neural network returns the action phase at time t_k, a_i^{t_k} = argmax_{a_i ∈ A_i} Q_i(o_i^{t_k}, a_i; θ_i^{t_k}), together with the corresponding Q value Q_i(o_i^{t_k}, a_i^{t_k}; θ_i^{t_k}), where A_i denotes the set of action phases that intersection i can release, θ_i^{t_k} denotes the parameters of agent i's neural network at time t_k, a_i denotes one action phase in A_i, and Q_i(·) denotes the neural network Q function corresponding to agent i;
(7) executing the action phase a_i^{t_k} returned by each agent for Δt seconds at the corresponding intersection signal light in the traffic trunk simulation; the time becomes t_{k+1} = t_k + Δt, and the simulation environment returns the team reward value of the multi-mode traffic trunk multi-agent system at time t_k, r^{t_k} = k_d·Δd^{t_k} + k_f·f^{t_k} + k_l·Δl^{t_k}, where k_d, k_f and k_l respectively denote the balance coefficients of the per-person delay variation, the people throughput and the queuing length variation; Δd^{t_k} = d^{t_k} − d^{t_{k+1}} denotes the per-person delay variation, where d^{t_k} and d^{t_{k+1}} respectively denote the total per-person delay of the trunk line at times t_k and t_{k+1}; f^{t_k} denotes the people throughput, i.e. the total number of people passing through the traffic trunk during Δt; and Δl^{t_k} = l^{t_k} − l^{t_{k+1}} denotes the queuing length variation, where l^{t_k} and l^{t_{k+1}} respectively denote the number of people queuing in the traffic trunk at times t_k and t_{k+1};
(8) repeating step (5) to obtain the multi-mode traffic state of each agent at time t_{k+1}, and saving the experience (S^{t_k}, U^{t_k}, r^{t_k}, S^{t_{k+1}}) to the experience replay pool, where r^{t_k} denotes the team reward value of the multiple agents at time t_k; S^{t_k} and S^{t_{k+1}} respectively denote the global state lists at times t_k and t_{k+1}, S^{t_k} = [o_1^{t_k}, …, o_n^{t_k}], where o_n^{t_k} denotes the state of the n-th agent at time t_k, and S^{t_{k+1}} = [o_1^{t_{k+1}}, …, o_n^{t_{k+1}}], where o_n^{t_{k+1}} denotes the state of the n-th agent at time t_{k+1}; U^{t_k} = [a_1^{t_k}, …, a_n^{t_k}] denotes the list of actions selected by all agents at time t_k, where a_n^{t_k} denotes the action executed by the n-th agent at time t_k;
(9) judging whether the preset simulation time is reached: if t_{k+1} ≥ T, entering step (10); otherwise, returning to step (5) for iteration;
(10) randomly sampling N pairs of data from the experience replay pool and updating the neural network parameters of each agent by gradient descent according to the loss function L(θ_all) = (y^{t_k} − Q_tot(S^{t_k}, U^{t_k}; θ_all))², where θ_all denotes the neural network parameters of all agents, and Q_tot(S^{t_k}, U^{t_k}; θ_all) = Σ_{b=1}^{n} k_b·Q_b(o_b^{t_k}, a_b^{t_k}; θ_b) denotes the global reward function of multi-agent collaboration, where k_b denotes the trade-off coefficient of intersection b, n denotes the number of agents, and θ_b denotes the neural network parameters of agent b; the target reward value is y^{t_k} = r^{t_k} + γ·max_{u_all} Q_tot(S^{t_{k+1}}, u_all; θ_all), where γ denotes the attenuation coefficient and u_all denotes a set of actions of all agents;
(11) judging whether the number of updates reaches the preset number of training rounds N_episode: if N_episode is not reached, returning to step (4) for loop iteration; if N_episode is reached, outputting the agents of each intersection of the multi-mode traffic trunk trained based on multi-agent cooperation.
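Steps (5) through (10) of claim 1 amount to a cooperative Q-learning loop: greedy per-agent phase selection, a shared team reward, and a gradient update of a weighted global Q value. The sketch below is an illustrative simplification under stated assumptions, not the patented implementation: tabular Q tables stand in for the per-agent neural networks, and the names TrunkAgent, team_reward and train_step are invented for this example.

```python
class TrunkAgent:
    """Per-intersection signal control agent. A tabular Q function is a
    stand-in for the per-agent neural network of the method."""

    def __init__(self, phases, alpha=0.1):
        self.phases = list(phases)  # A_i: action phases releasable at intersection i
        self.alpha = alpha          # learning rate for the gradient-style update
        self.q = {}                 # Q_i(o_i, a_i) lookup table

    def q_val(self, obs, phase):
        return self.q.get((obs, phase), 0.0)

    def act(self, obs):
        # step (6): greedy phase selection a_i = argmax_{a in A_i} Q_i(o_i, a)
        return max(self.phases, key=lambda a: self.q_val(obs, a))


def team_reward(d0, d1, throughput, l0, l1, kd=1.0, kf=0.5, kl=0.2):
    # step (7): r = kd*(d_tk - d_tk+1) + kf*f + kl*(l_tk - l_tk+1)
    return kd * (d0 - d1) + kf * throughput + kl * (l0 - l1)


def train_step(agents, weights, batch, gamma=0.9):
    # step (10): minimise (y - Q_tot)^2 with Q_tot = sum_b k_b * Q_b(o_b, a_b);
    # for a tabular Q the gradient step is Q_b += alpha * k_b * (y - Q_tot).
    # Because Q_tot is a weighted sum, the joint max over u_all decomposes
    # into an independent max per agent.
    for obs_list, act_list, reward, next_obs_list in batch:
        q_tot = sum(k * ag.q_val(o, a)
                    for k, ag, o, a in zip(weights, agents, obs_list, act_list))
        y = reward + gamma * sum(
            k * max(ag.q_val(o1, a) for a in ag.phases)
            for k, ag, o1 in zip(weights, agents, next_obs_list))
        td = y - q_tot
        for k, ag, o, a in zip(weights, agents, obs_list, act_list):
            ag.q[(o, a)] = ag.q_val(o, a) + ag.alpha * k * td
```

A minimal round: `agents = [TrunkAgent(phases=(0, 1)) for _ in range(2)]`, one replayed transition `(("s0", "s0"), (0, 1), 1.0, ("s1", "s1"))`, then `train_step(agents, [0.5, 0.5], [transition])` nudges each agent's Q value for its executed phase toward the shared target, after which `act` prefers the rewarded phase.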
2. A multi-mode traffic trunk line signal coordination control device based on multi-agent cooperation is characterized by comprising:
the multi-mode traffic trunk sensing module, comprising a traffic trunk data sensing unit and a traffic trunk state sensing unit, wherein the traffic trunk data sensing unit is used for acquiring the channelized design, the number of entrance lanes, the road section lengths, the bus station positions, the non-motor vehicle lanes and the sidewalk positions of all intersections of a target trunk, and the traffic trunk state sensing unit is used for acquiring the number and routes of buses, departure intervals, dwell times, the numbers and speeds of passengers of social vehicles, pedestrians and non-motor vehicles, the queuing lengths in front of the intersections, and the current passing phase of each intersection;
the data storage module, comprising a traffic trunk intersection data unit and a traffic trunk traffic flow data unit, respectively used for storing the data acquired by the traffic trunk data sensing unit and the traffic trunk state sensing unit;
the cooperative multi-mode traffic trunk signal coordination control intelligent agent calculation module, comprising an intelligent agent computing and storage unit, wherein the intelligent agent computing and storage unit is used for iteratively training, calculating and storing the cooperative trunk intersection agents according to claim 1, and for outputting and storing the cooperatively trained agents of each intersection of the multi-mode traffic trunk.
3. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program when executed by the processor implements the steps of the multi-agent collaboration based multi-mode transportation trunk signal coordination control method of claim 1.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, realizes the steps of the multi-agent cooperation based multi-mode transportation trunk signal coordination control method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110331935.8A CN113299078B (en) | 2021-03-29 | 2021-03-29 | Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113299078A CN113299078A (en) | 2021-08-24 |
CN113299078B true CN113299078B (en) | 2022-04-08 |
Family
ID=77319295
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110331935.8A Active CN113299078B (en) | 2021-03-29 | 2021-03-29 | Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113299078B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743388B (en) * | 2022-03-22 | 2023-06-20 | 中山大学·深圳 | Multi-intersection signal self-adaptive control method based on reinforcement learning |
CN114973698B (en) * | 2022-05-10 | 2024-04-16 | 阿波罗智联(北京)科技有限公司 | Control information generation method and machine learning model training method and device |
CN114627650B (en) * | 2022-05-11 | 2022-08-23 | 深圳市城市交通规划设计研究中心股份有限公司 | Urban public transport priority simulation deduction system, method, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112289044A (en) * | 2020-11-02 | 2021-01-29 | 南京信息工程大学 | Highway road cooperative control system and method based on deep reinforcement learning |
CN112365724A (en) * | 2020-04-13 | 2021-02-12 | 北方工业大学 | Continuous intersection signal cooperative control method based on deep reinforcement learning |
CN112406867A (en) * | 2020-11-19 | 2021-02-26 | 清华大学 | Emergency vehicle hybrid lane change decision method based on reinforcement learning and avoidance strategy |
CN112489464A (en) * | 2020-11-19 | 2021-03-12 | 天津大学 | Crossing traffic signal lamp regulation and control method with position sensing function |
WO2021051930A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Signal adjustment method and apparatus based on action prediction model, and computer device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||