CN113299078B - Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation - Google Patents

Info

Publication number
CN113299078B
Authority
CN
China
Prior art keywords
agent
time
traffic
intersection
trunk
Prior art date
Legal status
Active
Application number
CN202110331935.8A
Other languages
Chinese (zh)
Other versions
CN113299078A (en)
Inventor
王昊
王雷震
董长印
杨朝友
Current Assignee
Yangzhou Fama Intelligent Equipment Co ltd
Southeast University
Original Assignee
Yangzhou Fama Intelligent Equipment Co ltd
Southeast University
Priority date
Filing date
Publication date
Application filed by Yangzhou Fama Intelligent Equipment Co ltd and Southeast University
Priority to CN202110331935.8A
Publication of CN113299078A
Application granted
Publication of CN113299078B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/081 Plural intersections under common control

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation. The method comprises the following steps:

- simulation calibration and flow generation for a multi-mode traffic trunk;
- designing a signal control agent for each intersection of the trunk;
- constructing a cooperative value-decomposition multi-agent reinforcement learning framework; and
- training and outputting the agents for the intersections of the multi-mode traffic trunk.

The method treats the multi-mode traffic signal control of each intersection as an agent and takes into account the cooperation among all intersections of the trunk. It trains the signal control agents with the overall person throughput and per-person delay of the trunk as objectives. The method thereby provides a control basis for road traffic managers, achieves an overall optimum for the traffic trunk, and improves the service level of urban road traffic.

Description

Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation
Technical Field
The invention relates to the field of urban traffic signal control. In particular, it relates to a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation.
Background
In recent years, rapidly growing traffic demand has caused road congestion, air pollution and reduced transportation efficiency. These problems seriously affect the economic development of cities and the daily life of citizens. To relieve them, signal coordination control of traffic trunk lines has become one of the most effective approaches in urban traffic management and control. A well-designed trunk control method can effectively raise vehicle speeds and traffic efficiency and reduce fuel consumption and exhaust emissions.
Traditional trunk signal coordination control mainly adopts the green wave model. A common cycle length is set for all intersections of the trunk. The phase sequence and offset of each intersection are then calculated using optimization indexes such as the number of vehicle stops, the green-wave bandwidth and vehicle delay. However, such approaches greatly limit the efficiency of individual intersections, sacrificing it for the benefit of through vehicles on the trunk.

Among existing Chinese patents, Chinese patent 202010793652.0 models a target trunk section under tidal traffic. It builds a bidirectional trunk optimization model with weighted throughput as the objective, minimizing vehicle delay while maximizing system capacity. Similarly, Chinese patent 201910092239.9 uses bus trajectories to build a model that optimizes the cycle and offsets under a bus-priority policy, achieving a trunk green wave for both general traffic and buses.

In general, existing research favors maximizing the benefit of trunk vehicles and public transport, and its models sacrifice the efficiency of branch roads and individual intersections. It lacks comprehensive consideration of the multi-mode traffic on the trunk, such as buses, pedestrians and non-motorized vehicles. It also lacks microscopic research that coordinates the individual intersections of the trunk on the basis of multi-mode adaptive signal control to achieve an overall optimum for the multi-mode trunk.
Disclosure of Invention
Objective of the invention: to overcome the shortcomings of the prior art, the invention provides a multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation. The method:

- performs multi-mode traffic simulation calibration and flow generation for a target trunk;
- designs a signal control agent for each intersection of the trunk;
- constructs a cooperative value-decomposition multi-agent reinforcement learning framework; and
- trains and outputs the agents for all intersections of the multi-mode traffic trunk.

Building on single-point multi-mode adaptive control, it accounts for the cooperation among the intersections of the trunk and achieves an overall optimum for the traffic trunk.
Technical solution: to solve the above technical problems, the invention adopts the following technical solution. A multi-mode traffic trunk signal coordination control method based on multi-agent cooperation comprises the following steps:
(1) Acquire the intersection information of the traffic trunk and the multi-mode traffic flow patterns. Using these data, calibrate a simulation of the multi-mode traffic trunk in simulation software and reproduce the arrival rates of the multi-mode traffic flows.
(2) Generate a signal control agent for each intersection on the trunk, so that the n intersections of the trunk correspond to n agents. At time $t_k$, agent i reads the state $s_i^{t_k}$ of its intersection, which contains the positions, queue lengths and speeds of the multi-mode traffic. It inputs this state into its neural network, whose parameters at time $t_k$ are $\theta_i^{t_k}$. The network outputs the action phase of intersection agent i at time $t_k$:
$a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$
Here $Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$ is the value function (Q value) of selecting action phase $a_i$ in state $s_i^{t_k}$ under network parameters $\theta_i^{t_k}$. $A_i$ is the set of action phases that can be given right of way at intersection i, and $a_i$ is one action phase in $A_i$.
(3) Initialize the neural network parameters and experience replay pools of all agents on the trunk, and set the number of training rounds $N_{\mathrm{episode}}$.
(4) Initialize the arrival rates of the simulated multi-mode traffic trunk flows. Set the initial simulation time $t_0$ and the total simulation time T.
(5) Acquire the multi-mode traffic state of each agent. Taking agent i as an example, acquire the multi-mode local observation state of the corresponding intersection i at time $t_k$:
$s_i^{t_k} = \{s_{i,\mathrm{car}}^{t_k}, s_{i,\mathrm{bus}}^{t_k}, s_{i,\mathrm{ped}}^{t_k}, s_{i,\mathrm{nmv}}^{t_k}, \phi_{\mathcal{N}(i)}^{t_k}\}$
Here $s_{i,\mathrm{car}}^{t_k}$, $s_{i,\mathrm{bus}}^{t_k}$, $s_{i,\mathrm{ped}}^{t_k}$ and $s_{i,\mathrm{nmv}}^{t_k}$ denote the social vehicle, bus, pedestrian and non-motorized vehicle states of intersection i at time $t_k$. Each contains information such as position, queue length and speed. $\phi_{\mathcal{N}(i)}^{t_k}$ denotes the phase states at time $t_k$ of the intersections adjacent to intersection i.
(6) Input each agent's local observation state into its neural network. For agent i, inputting $s_i^{t_k}$ into the network returns the action phase at time $t_k$:
$a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$
The network also returns the corresponding Q value $Q_i(s_i^{t_k}, a_i^{t_k}; \theta_i^{t_k})$. Here $A_i$ is the set of action phases that intersection i can give right of way to, and $a_i$ is one action phase in $A_i$. $\theta_i^{t_k}$ denotes the parameters of agent i's neural network at time $t_k$, and $Q_i(\cdot)$ is the neural network Q function of agent i.
(7) Execute the action phase $a_i^{t_k}$ returned by each agent at the corresponding intersection signal in the trunk simulation for Δt seconds, so that $t_{k+1} = t_k + \Delta t$. Then return the team reward $r^{t_k}$ of the multi-mode traffic trunk multi-agent system at time $t_k$. The reward combines three terms, weighted respectively by the trade-off coefficients $k_d$ (change in per-person delay), $k_f$ (person throughput) and $k_l$ (change in queue length):
- $\Delta d^{t_k}$ is the change in per-person delay. It is computed from $d^{t_k}$ and $d^{t_{k+1}}$, the per-person delay on the trunk at times $t_k$ and $t_{k+1}$.
- $f^{t_k}$ is the person throughput, i.e. the total number of people passing through the traffic trunk during Δt.
- $\Delta l^{t_k}$ is the change in queue length. It is computed from $l^{t_k}$ and $l^{t_{k+1}}$, the number of people queuing on the trunk at times $t_k$ and $t_{k+1}$.
(8) Repeat step (5) to obtain the multi-mode traffic state $s_i^{t_{k+1}}$ of each agent at time $t_{k+1}$. Save the experience $(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}, r^{t_k}, s_{\mathrm{all}}^{t_{k+1}})$ to the experience replay pool. In this tuple:
- $r^{t_k}$ is the team reward of the multiple agents at time $t_k$.
- $s_{\mathrm{all}}^{t_k} = \{s_1^{t_k}, \dots, s_n^{t_k}\}$ and $s_{\mathrm{all}}^{t_{k+1}} = \{s_1^{t_{k+1}}, \dots, s_n^{t_{k+1}}\}$ are the global state lists at times $t_k$ and $t_{k+1}$. $s_n^{t_k}$ and $s_n^{t_{k+1}}$ are the states of the nth agent at those times.
- $u_{\mathrm{all}}^{t_k} = \{a_1^{t_k}, \dots, a_n^{t_k}\}$ is the list of actions selected by all agents at time $t_k$, with $a_n^{t_k}$ the action executed by the nth agent at time $t_k$.
(9) Check whether the preset simulation time has been reached. If $t_{k+1} \ge T$, go to step (10); otherwise return to step (5) and iterate.
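(10) Randomly sample N transitions from the experience replay pool. Update the neural network parameters of every agent by gradient descent on the loss function, summed over the N sampled tuples $(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}, r^{t_k}, s_{\mathrm{all}}^{t_{k+1}})$:
$\mathcal{L}(\theta_{\mathrm{all}}) = \frac{1}{N}\sum \big(y^{t_k} - Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}; \theta_{\mathrm{all}})\big)^2$
Here $\theta_{\mathrm{all}}$ denotes the neural network parameters of all agents. $Q_{\mathrm{tot}}$ is the global value function of multi-agent cooperation:
$Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}; \theta_{\mathrm{all}}) = \sum_{b=1}^{n} k_b\, Q_b(s_b^{t_k}, a_b^{t_k}; \theta_b)$
In this sum, $k_b$ is the trade-off coefficient of intersection b, n is the number of agents, and $\theta_b$ denotes the neural network parameters of agent b. The target value is:
$y^{t_k} = r^{t_k} + \gamma \max_{u_{\mathrm{all}}} Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_{k+1}}, u_{\mathrm{all}}; \theta_{\mathrm{all}})$
Here γ is the discount (attenuation) coefficient and $u_{\mathrm{all}}$ is a list of actions of all agents.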
(11) Check whether the number of updates has reached the preset number of training rounds $N_{\mathrm{episode}}$. If not, return to step (4) and iterate. If so, output the agents of each intersection of the multi-mode traffic trunk trained through multi-agent cooperation.
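Steps (3) to (11) amount to an episodic training loop around a shared replay pool. The sketch below shows only that control flow, under stated assumptions:

- `ToyEnv` is a made-up stand-in for the calibrated trunk simulation.
- `RandomAgent` replaces the Q-network agents.
- The `reset`/`step`/`act`/`update` interfaces and the pool capacity are hypothetical names chosen for illustration, not part of the patent.

```python
import random
from collections import deque


class ReplayPool:
    """Shared experience replay pool of (s_all, u_all, r, s_all_next) tuples.

    The FIFO capacity limit is an assumption; the text gives no pool size.
    """

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s_all, u_all, reward, s_all_next):
        self.buffer.append((s_all, u_all, reward, s_all_next))

    def sample(self, n):
        # Step (10): uniform random sample of n stored transitions
        return random.sample(list(self.buffer), n)

    def __len__(self):
        return len(self.buffer)


def train(env, agents, pool, update, n_episode, t0, T, dt, batch_size):
    """Control flow of steps (3)-(11); env, agents and update are hypothetical interfaces.

    env.reset(t0) -> list of local states; env.step(u_all, dt) -> (next states, team reward);
    agent.act(state) -> phase index; update(batch) performs one gradient step.
    """
    for _ in range(n_episode):                     # step (11): N_episode rounds
        s_all = env.reset(t0)                      # step (4): reset flows and clock
        t = t0
        while True:
            u_all = [ag.act(s) for ag, s in zip(agents, s_all)]  # steps (5)-(6)
            s_next, r = env.step(u_all, dt)        # step (7): run phases for dt seconds
            t += dt
            pool.push(s_all, u_all, r, s_next)     # step (8): store the transition
            s_all = s_next
            if t >= T:                             # step (9): end of simulated period
                break
        if len(pool) >= batch_size:                # guard added for this sketch only
            update(pool.sample(batch_size))        # step (10): one update per round


class ToyEnv:
    """Trivial stand-in for the trunk simulation; the dynamics are made up."""

    def __init__(self, n):
        self.n = n

    def reset(self, t0):
        return [[0.0] for _ in range(self.n)]

    def step(self, u_all, dt):
        return [[float(a)] for a in u_all], -float(sum(u_all))


class RandomAgent:
    """Placeholder for a trained agent, which would act greedily on its Q values."""

    def __init__(self, num_phases):
        self.num_phases = num_phases

    def act(self, state):
        return random.randrange(self.num_phases)


updates = []
pool = ReplayPool()
train(ToyEnv(4), [RandomAgent(4) for _ in range(4)], pool, updates.append,
      n_episode=3, t0=0, T=60, dt=5, batch_size=8)
print(len(pool), len(updates))  # 36 3
```

With T = 60 and Δt = 5, each round stores 12 transitions and then performs one update, matching the reading that one execution of step (10) is one training round.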
The invention also provides a multi-mode traffic trunk line signal coordination control device based on multi-agent cooperation, which comprises the following components:
a multi-mode traffic trunk sensing module, comprising a traffic trunk data sensing unit and a traffic trunk state sensing unit:
- the data sensing unit acquires, for all intersections of the target trunk, the channelization design, number of approach lanes, road section lengths, bus stop locations, non-motorized vehicle lanes and sidewalk locations;
- the state sensing unit acquires the number of bus runs and routes, departure intervals, dwell times and passenger numbers, the numbers and speeds of social vehicles, pedestrians and non-motorized vehicles, queue lengths at the intersection approaches, and the like;
a data storage module, comprising a traffic trunk intersection data unit and a traffic trunk traffic flow data unit. These units store, respectively, the data acquired by the traffic trunk data sensing unit and by the traffic trunk state sensing unit;
a cooperative multi-mode traffic trunk signal coordination control agent calculation module, comprising an agent calculation unit and an agent storage unit. The calculation unit iteratively trains the cooperative trunk intersection agents according to the method above. The storage unit outputs and stores the agents of each intersection of the multi-mode traffic trunk trained through multi-agent cooperation.
In addition, the invention provides a computer device comprising a processor, a memory and a computer program stored on the memory and executable on the processor. When executed by the processor, the computer program implements the steps of the multi-agent cooperation-based multi-mode traffic trunk signal coordination control method.
In addition, the invention provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the multi-agent cooperation-based multi-mode traffic trunk signal coordination control method.
Beneficial effects: compared with the prior art, the technical solution of the invention has the following beneficial technical effects:
The invention provides a multi-mode traffic trunk signal coordination control method and device based on multi-agent cooperation. The method:
- simulates and models the multi-mode traffic trunk and generates its flows;
- designs a signal control agent for each intersection of the trunk;
- constructs a cooperative value-decomposition multi-agent reinforcement learning framework; and
- trains and outputs the agents for the intersections of the multi-mode traffic trunk.

The invention models the multi-mode traffic signal control of each intersection as an agent while taking into account the cooperation among the intersections of the trunk. It trains the signal control agents with the overall person throughput and per-person delay of the trunk as objectives. This provides a control basis for road traffic managers, achieves an overall optimum for the traffic trunk, and improves the service level of urban road traffic.
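The cooperative value-decomposition framework can be illustrated with a few lines of NumPy. This is a sketch under assumptions:

- The global value is taken as the weighted sum $Q_{\mathrm{tot}} = \sum_b k_b Q_b$ of per-agent Q values, with non-negative weights $k_b$.
- The squared error is averaged, not summed, over the batch.
- The function names and numbers are hypothetical.
- No gradient step is shown.

```python
import numpy as np


def q_tot(per_agent_q, k):
    """Weighted value decomposition: Q_tot = sum_b k_b * Q_b(s_b, a_b; theta_b)."""
    return float(np.dot(k, per_agent_q))


def td_target(reward, next_q_tables, k, gamma):
    """y = r + gamma * max over u_all of Q_tot(s_all', u_all).

    Each Q_b depends only on agent b's own action, so with k_b >= 0 the joint
    maximum splits into one maximum per agent: no search over phase combinations.
    """
    best = [float(np.max(q)) for q in next_q_tables]
    return reward + gamma * q_tot(best, k)


def batch_loss(targets, predictions):
    """Squared error between targets y and Q_tot, averaged over the batch (assumed)."""
    t = np.asarray(targets, dtype=float)
    p = np.asarray(predictions, dtype=float)
    return float(np.mean((t - p) ** 2))


k = [1.0, 1.0]
next_q = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]   # Q_b(s_b', .) per phase
y = td_target(reward=1.0, next_q_tables=next_q, k=k, gamma=0.5)
pred = q_tot([2.5, 0.5], k)                               # Q_b of the actions taken
print(y, pred, batch_loss([y], [pred]))  # 3.5 3.0 0.25
```

The additive form is what makes decentralized execution consistent with centralized training. Each intersection can pick its own greedy phase, and the resulting joint action also maximizes $Q_{\mathrm{tot}}$.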
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a flow diagram of a multi-agent collaborative reinforcement learning framework of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-mode traffic trunk simulation of an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description taken in conjunction with the accompanying drawings and specific examples.
As shown in fig. 1, the multi-agent cooperation-based multi-mode traffic trunk signal coordination control method disclosed in the embodiment of the present invention includes the following steps:
(1) Acquire the intersection information of the traffic trunk and the multi-mode traffic flow patterns. Using these data, calibrate a simulation of the multi-mode traffic trunk in simulation software and reproduce the arrival rates of the multi-mode traffic flows.
Specifically, the trunk intersection information and the multi-mode traffic flow data can be acquired by on-site sensing devices or collected through field surveys. The simulation software can be SUMO, VISSIM, or the like.
(2) In this embodiment, a signal control agent is generated for each intersection on the trunk, so that the n intersections of the trunk correspond to n agents. At time $t_k$, agent i reads the state $s_i^{t_k}$ of its intersection, which contains the positions, queue lengths and speeds of the multi-mode traffic. It inputs this state into its neural network, whose parameters at time $t_k$ are $\theta_i^{t_k}$. The network outputs the action phase of intersection agent i at time $t_k$:
$a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$
Here $Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$ is the value function (Q value) of selecting action phase $a_i$ in state $s_i^{t_k}$ under network parameters $\theta_i^{t_k}$. $A_i$ is the set of action phases that can be given right of way at intersection i, and $a_i$ is one action phase in $A_i$.
(3) In this embodiment, the neural network parameters and experience replay pools of all agents on the trunk are initialized, and the number of training rounds $N_{\mathrm{episode}}$ is set.
(4) Specifically, the arrival rates of the simulated multi-mode traffic trunk flows are initialized, and the initial simulation time $t_0$ and the total simulation time T are set.
(5) In this embodiment, the multi-mode traffic state of each agent is acquired. Taking agent i as an example, the multi-mode local observation state of the corresponding intersection i at time $t_k$ is acquired:
$s_i^{t_k} = \{s_{i,\mathrm{car}}^{t_k}, s_{i,\mathrm{bus}}^{t_k}, s_{i,\mathrm{ped}}^{t_k}, s_{i,\mathrm{nmv}}^{t_k}, \phi_{\mathcal{N}(i)}^{t_k}\}$
Here $s_{i,\mathrm{car}}^{t_k}$, $s_{i,\mathrm{bus}}^{t_k}$, $s_{i,\mathrm{ped}}^{t_k}$ and $s_{i,\mathrm{nmv}}^{t_k}$ denote the social vehicle, bus, pedestrian and non-motorized vehicle states of intersection i at time $t_k$. Each contains information such as position, queue length and speed. $\phi_{\mathcal{N}(i)}^{t_k}$ denotes the phase states at time $t_k$ of the intersections adjacent to intersection i.
(6) In this embodiment, the local observation state of each agent is input into its neural network. For agent i, inputting $s_i^{t_k}$ into the network returns the action phase at time $t_k$:
$a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$
The network also returns the corresponding Q value $Q_i(s_i^{t_k}, a_i^{t_k}; \theta_i^{t_k})$. Here $A_i$ is the set of action phases that intersection i can give right of way to, and $a_i$ is one action phase in $A_i$. $\theta_i^{t_k}$ denotes the parameters of agent i's neural network at time $t_k$, and $Q_i(\cdot)$ is the neural network Q function of agent i.
(7) In this embodiment, the action phase $a_i^{t_k}$ returned by each agent is executed at the corresponding intersection signal in the trunk simulation for Δt seconds, so that $t_{k+1} = t_k + \Delta t$. The team reward $r^{t_k}$ of the multi-mode traffic trunk multi-agent system at time $t_k$ is then returned. The reward combines three terms, weighted respectively by the trade-off coefficients $k_d$ (change in per-person delay), $k_f$ (person throughput) and $k_l$ (change in queue length):
- $\Delta d^{t_k}$ is the change in per-person delay. It is computed from $d^{t_k}$ and $d^{t_{k+1}}$, the per-person delay on the trunk at times $t_k$ and $t_{k+1}$.
- $f^{t_k}$ is the person throughput, i.e. the total number of people passing through the traffic trunk during Δt.
- $\Delta l^{t_k}$ is the change in queue length. It is computed from $l^{t_k}$ and $l^{t_{k+1}}$, the number of people queuing on the trunk at times $t_k$ and $t_{k+1}$.
(8) In this embodiment, step (5) is repeated to obtain the multi-mode traffic state $s_i^{t_{k+1}}$ of each agent at time $t_{k+1}$. The experience $(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}, r^{t_k}, s_{\mathrm{all}}^{t_{k+1}})$ is saved to the experience replay pool. In this tuple:
- $r^{t_k}$ is the team reward of the multiple agents at time $t_k$.
- $s_{\mathrm{all}}^{t_k} = \{s_1^{t_k}, \dots, s_n^{t_k}\}$ and $s_{\mathrm{all}}^{t_{k+1}} = \{s_1^{t_{k+1}}, \dots, s_n^{t_{k+1}}\}$ are the global state lists at times $t_k$ and $t_{k+1}$. $s_n^{t_k}$ and $s_n^{t_{k+1}}$ are the states of the nth agent at those times.
- $u_{\mathrm{all}}^{t_k} = \{a_1^{t_k}, \dots, a_n^{t_k}\}$ is the list of actions selected by all agents at time $t_k$, with $a_n^{t_k}$ the action executed by the nth agent at time $t_k$.
(9) Specifically, it is checked whether the preset simulation time has been reached. If $t_{k+1} \ge T$, the method proceeds to step (10); otherwise it returns to step (5) and iterates.
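(10) In this embodiment, N transitions are randomly sampled from the experience replay pool. The neural network parameters of every agent are updated by gradient descent on the loss function, summed over the N sampled tuples $(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}, r^{t_k}, s_{\mathrm{all}}^{t_{k+1}})$:
$\mathcal{L}(\theta_{\mathrm{all}}) = \frac{1}{N}\sum \big(y^{t_k} - Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}; \theta_{\mathrm{all}})\big)^2$
Here $\theta_{\mathrm{all}}$ denotes the neural network parameters of all agents. $Q_{\mathrm{tot}}$ is the global value function of multi-agent cooperation:
$Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_k}, u_{\mathrm{all}}^{t_k}; \theta_{\mathrm{all}}) = \sum_{b=1}^{n} k_b\, Q_b(s_b^{t_k}, a_b^{t_k}; \theta_b)$
In this sum, $k_b$ is the trade-off coefficient of intersection b, n is the number of agents, and $\theta_b$ denotes the neural network parameters of agent b. The target value is:
$y^{t_k} = r^{t_k} + \gamma \max_{u_{\mathrm{all}}} Q_{\mathrm{tot}}(s_{\mathrm{all}}^{t_{k+1}}, u_{\mathrm{all}}; \theta_{\mathrm{all}})$
Here γ is the discount (attenuation) coefficient and $u_{\mathrm{all}}$ is a list of actions of all agents.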
(11) In this embodiment, it is checked whether the number of updates has reached the preset number of training rounds $N_{\mathrm{episode}}$. If not, the method returns to step (4) and iterates. If so, the agents of each intersection of the multi-mode traffic trunk trained through multi-agent cooperation are output.
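Steps (5) to (7) at a single decision point can be sketched for intersection 2 of a four-intersection trunk. Everything here is illustrative, since the text does not fix these details:

- the one-hot encoding of neighbouring phases;
- the linear west-to-east numbering used to find neighbours;
- the two-layer network with random weights standing in for a trained $Q_i$;
- the feature values;
- the "previous minus next" sign convention for the delay and queue changes in the team reward.

```python
import numpy as np


def adjacent_intersections(i, n):
    """Neighbours of intersection i on a linear trunk numbered 1..n (assumed layout)."""
    return [j for j in (i - 1, i + 1) if 1 <= j <= n]


def local_observation(mode_features, neighbour_phases, num_phases):
    """Step (5): concatenate per-mode features with one-hot phases of adjacent intersections.

    mode_features: four feature lists (social vehicles, buses, pedestrians, non-motorized).
    """
    parts = [np.asarray(f, dtype=float) for f in mode_features]
    for p in neighbour_phases:
        one_hot = np.zeros(num_phases)
        one_hot[p] = 1.0
        parts.append(one_hot)
    return np.concatenate(parts)


class PhaseQNetwork:
    """Step (6): a small MLP standing in for Q_i(s, a; theta_i); sizes are assumptions."""

    def __init__(self, state_dim, num_phases, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_phases))

    def q_values(self, s):
        return np.maximum(0.0, s @ self.w1) @ self.w2   # ReLU hidden layer

    def select_phase(self, s):
        q = self.q_values(s)
        a = int(np.argmax(q))          # a_i = argmax over the phase set A_i
        return a, float(q[a])          # chosen phase and its Q value


def team_reward(d_t, d_next, f, l_t, l_next, k_d, k_f, k_l):
    """Step (7), assuming 'previous minus next' changes, so falling delay and
    falling queues raise the reward (this sign convention is an assumption)."""
    return k_d * (d_t - d_next) + k_f * f + k_l * (l_t - l_next)


phases = {1: 0, 2: 1, 3: 2, 4: 0}                     # hypothetical current phases
nbrs = adjacent_intersections(2, 4)
s2 = local_observation([[12, 4.5], [1, 3.2], [8, 1.1], [5, 3.8]],
                       [phases[j] for j in nbrs], num_phases=4)
phase, q = PhaseQNetwork(state_dim=s2.size, num_phases=4).select_phase(s2)
r = team_reward(30.0, 28.0, 15, 40, 43, k_d=1.0, k_f=0.1, k_l=0.5)
print(nbrs, s2.shape, phase, r)
```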
The invention is further elucidated below on the basis of an example of a traffic trunk situation.
Traffic example: a traffic trunk has 4 intersections, numbered intersection 1, intersection 2, intersection 3 and intersection 4 from west to east. The spacings between consecutive intersections are 160 m, 140 m and 180 m.
- Intersections 1 and 4 are junctions of the trunk with other trunk roads, and each of their approach roads is a two-way 8-lane road.
- Intersections 2 and 3 are junctions of the trunk with branch roads. Their approaches in the trunk direction are two-way 8-lane roads, and their branch approaches are two-way 2-lane roads.
- All motor vehicle roads are provided with pedestrian facilities and non-motorized vehicle lanes.
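The spacings in this example fix where each intersection sits along the trunk. A quick check follows, assuming a free-flow speed of 50 km/h that the example does not state:

```python
from itertools import accumulate

spacings_m = [160, 140, 180]                  # 1->2, 2->3, 3->4, west to east
positions_m = [0, *accumulate(spacings_m)]    # position of each intersection from intersection 1
trunk_length_m = positions_m[-1]

speed_kmh = 50                                # assumed free-flow speed, not from the example
travel_time_s = trunk_length_m / (speed_kmh / 3.6)
print(positions_m, trunk_length_m, round(travel_time_s, 2))  # [0, 160, 300, 480] 480 34.56
```

At that assumed speed, crossing the whole 480 m trunk takes roughly seven of the 5-second decision intervals used later in this example.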
Applying the multi-agent cooperation-based multi-mode traffic trunk line signal coordination control method of the invention to this trunk comprises the following steps:
(1) As shown in FIG. 3, acquire the intersection information of the traffic trunk and the multi-mode traffic flow patterns. Using these data, calibrate a simulation of the multi-mode traffic trunk in the simulation software SUMO and reproduce the arrival rates of the multi-mode traffic flows.
(2) Generate a signal control agent for each intersection on the trunk, so that the 4 intersections of the trunk correspond to 4 agents. Taking agent 2 as an example: at time $t_k$, agent 2 reads the state $s_2^{t_k}$ of its intersection, which contains the positions, queue lengths and speeds of the multi-mode traffic. It inputs this state into its neural network, whose parameters at time $t_k$ are $\theta_2^{t_k}$. The network outputs the action phase of intersection agent 2 at time $t_k$:
$a_2^{t_k} = \arg\max_{a_2 \in A_2} Q_2(s_2^{t_k}, a_2; \theta_2^{t_k})$
Here $Q_2(s_2^{t_k}, a_2; \theta_2^{t_k})$ is the value function of selecting action phase $a_2$ in state $s_2^{t_k}$ under network parameters $\theta_2^{t_k}$. $A_2$ is the set of action phases that can be given right of way at intersection 2, and $a_2$ is one action phase in $A_2$.
(3) Initialize the neural network parameters and experience replay pools of all agents on the trunk, and set the number of training rounds $N_{\mathrm{episode}} = 1000$.
(4) Initialize the arrival rates of the simulated multi-mode traffic trunk flows. Set the initial simulation time $t_0 = 0$ and the total simulation time $T = 10800$ s.
(5) Acquire the multi-mode traffic state of each agent. Taking agent 2 as an example, acquire the multi-mode local observation state of the corresponding intersection 2 at time $t_0$:
$s_2^{t_0} = \{s_{2,\mathrm{car}}^{t_0}, s_{2,\mathrm{bus}}^{t_0}, s_{2,\mathrm{ped}}^{t_0}, s_{2,\mathrm{nmv}}^{t_0}, \phi_{\mathcal{N}(2)}^{t_0}\}$
Here $s_{2,\mathrm{car}}^{t_0}$, $s_{2,\mathrm{bus}}^{t_0}$, $s_{2,\mathrm{ped}}^{t_0}$ and $s_{2,\mathrm{nmv}}^{t_0}$ denote the social vehicle, bus, pedestrian and non-motorized vehicle states at time $t_0$. Each contains information such as position, queue length and speed. $\phi_{\mathcal{N}(2)}^{t_0}$ denotes the phase states at time $t_0$ of intersections 1 and 3, which are adjacent to intersection 2.
(6) Input each agent's local observation state into its neural network. Taking agent 2 as an example, inputting $s_2^{t_0}$ into the network returns the action phase at time $t_0$:
$a_2^{t_0} = \arg\max_{a_2 \in A_2} Q_2(s_2^{t_0}, a_2; \theta_2^{t_0})$
The network also returns the corresponding Q value $Q_2(s_2^{t_0}, a_2^{t_0}; \theta_2^{t_0})$. Here $A_2$ is the set of action phases that intersection 2 can give right of way to, and $a_2$ is one action phase in $A_2$. $\theta_2^{t_0}$ denotes the parameters of agent 2's neural network at time $t_0$, and $Q_2(\cdot)$ is the neural network Q function of agent 2.
(7) Execute the action phase $a_i^{t_0}$ returned by each agent at the corresponding intersection signal in the trunk simulation for Δt = 5 s, so that $t_1 = t_0 + \Delta t = 5$ s. Then return the team reward $r^{t_0}$ of the multi-mode traffic trunk multi-agent system at time $t_0$. The reward combines three terms, weighted respectively by the trade-off coefficients $k_d$ (change in per-person delay), $k_f$ (person throughput) and $k_l$ (change in queue length):
- $\Delta d^{t_0}$ is the change in per-person delay. It is computed from $d^{t_0}$ and $d^{t_1}$, the per-person delay on the trunk at times $t_0$ and $t_1$.
- $f^{t_0}$ is the person throughput, i.e. the total number of people passing through the traffic trunk during Δt.
- $\Delta l^{t_0}$ is the change in queue length. It is computed from $l^{t_0}$ and $l^{t_1}$, the number of people queuing on the trunk at times $t_0$ and $t_1$.
(8) Repeat step (5) to obtain the multi-mode traffic state $s_i^{t_1}$ of each agent at time $t_1$, and save the global state list at time $t_0$, the action list at time $t_0$, the team reward at time $t_0$ and the global state list at time $t_1$ to the experience replay pool. Taking time $t_0$ as an example, the global state list is $[s_1^{t_0}, s_2^{t_0}, s_3^{t_0}, s_4^{t_0}]$, where $s_1^{t_0}$ is the state acquired by agent 1 at time $t_0$, and the action list is $[a_1^{t_0}, a_2^{t_0}, a_3^{t_0}, a_4^{t_0}]$, the actions selected by all agents at time $t_0$, where $a_1^{t_0}$ is the action executed by agent 1 at time $t_0$;
(9) Check whether the preset simulation time has been reached. Here $t_1 = 5$ s and $T = 10800$ s, so return to step (5) and iterate until $t_{k+1} \ge T$, then proceed to step (10).
(10) Randomly sample $N = 64$ transitions from the experience replay pool and update the neural network parameters of each agent by gradient descent on the loss function $\mathcal{L}(\theta_{all})$, where $\theta_{all}$ denotes the neural network parameters of all agents. The global reward function representing the collaboration of the 4 agents is $Q_{tot} = \sum_{b=1}^{4} k_b\, Q_b(\cdot\,; \theta_b)$, where $k_b$ is the importance balance coefficient of intersection b (all equal to 1 in this example) and $\theta_b$ denotes the neural network parameters of agent b. The target reward value is computed with the attenuation (discount) coefficient $\gamma$, 0.85 in this example, and $u_{all}$, the list of actions of all agents.
(11) Each execution of step (10) counts as one round of training. Check whether the number of updates has reached the preset number of training rounds, $N_{episode} = 1000$. If not, return to step (4) for loop iteration; if so, output the agents of the 4 intersections of the multi-mode traffic trunk obtained by multi-agent cooperative training.
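Steps (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ModeState` fields, the fixed-length padding of positions, the one-hot coding of neighbor phases and the linear stand-in for the Q network are all hypothetical choices. Only the overall structure follows the text: four mode states plus the phases of adjacent intersections go in, and the greedy action phase with its Q value comes out. The text does not state an exploration rule, so the selection here is purely greedy.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass
class ModeState:
    """Hypothetical per-mode summary for one intersection."""
    positions: List[float]  # distances of road users to the stop line (m)
    queue_length: float     # number of queued road users of this mode
    mean_speed: float       # average speed (m/s)

def local_observation(modes: Sequence[ModeState], neighbor_phases: Sequence[int],
                      num_phases: int, max_positions: int = 4) -> List[float]:
    """Flatten social-vehicle, transit, pedestrian and non-motor states plus neighbor phases."""
    obs: List[float] = []
    for mode in modes:
        # Pad or truncate positions so every observation has the same length.
        obs.extend((list(mode.positions) + [0.0] * max_positions)[:max_positions])
        obs.extend([mode.queue_length, mode.mean_speed])
    for phase in neighbor_phases:
        one_hot = [0.0] * num_phases  # one-hot code of an adjacent intersection's phase
        one_hot[phase] = 1.0
        obs.extend(one_hot)
    return obs

def greedy_phase(obs: Sequence[float], weights: Dict[int, List[float]]) -> Tuple[int, float]:
    """Return (argmax phase, its Q value) using a linear stand-in Q(s, a) = w_a . s."""
    q = {a: sum(w * x for w, x in zip(w_a, obs)) for a, w_a in weights.items()}
    best = max(q, key=q.get)  # argmax over the releasable phase set A_i
    return best, q[best]

# Agent 2: four mode states plus the phases of neighbors 1 and 3 (4 phases each).
m = ModeState(positions=[12.0, 30.5], queue_length=2.0, mean_speed=3.1)
obs = local_observation([m, m, m, m], neighbor_phases=[0, 2], num_phases=4)
weights = {a: [0.01 * (a + 1)] * len(obs) for a in range(4)}  # hypothetical parameters
phase, q_value = greedy_phase(obs, weights)
```

Here each mode contributes 4 padded positions plus queue length and speed (24 values), and the two neighbor phases add 8 one-hot values, giving a 32-dimensional observation.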
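The exact team-reward equation of step (7) is not reproduced in this text, so the sketch below assumes a linear form: $k_d$ times the change in delay, plus $k_f$ times the person throughput, plus $k_l$ times the change in queue, with both changes taken as the value at $t_0$ minus the value at $t_1$ so that reductions earn a positive reward. The function name, argument names and coefficient values are hypothetical.

```python
def team_reward(delay_t0: float, delay_t1: float, throughput: float,
                queue_t0: float, queue_t1: float,
                k_d: float = 1.0, k_f: float = 1.0, k_l: float = 1.0) -> float:
    """Hypothetical linear form of the step (7) team reward.

    Assumed sign convention: a fall in delay or queue between t0 and t1 is rewarded.
    """
    delta_delay = delay_t0 - delay_t1  # change in per-capita delay (assumed t0 - t1)
    delta_queue = queue_t0 - queue_t1  # change in queued persons (assumed t0 - t1)
    return k_d * delta_delay + k_f * throughput + k_l * delta_queue

# Delay falls by 20, 40 people pass during the 5 s step, queue shrinks by 3.
r = team_reward(120.0, 100.0, throughput=40, queue_t0=18, queue_t1=15,
                k_d=0.5, k_f=0.25, k_l=1.0)
```

With these hypothetical coefficients the reward is $0.5 \cdot 20 + 0.25 \cdot 40 + 1.0 \cdot 3 = 23$. If the original equation uses the opposite sign convention, only the two difference lines change.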
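Steps (8) and (9) amount to filling a replay pool with one transition per $\Delta t$ until the simulation horizon is reached. The sketch below assumes a bounded, uniformly sampled pool (the capacity and seed are hypothetical; the text specifies neither) and uses a stub, `dummy_step`, in place of the traffic simulator. With $t_0 = 0$, $\Delta t = 5$ s and $T = 10800$ s, one episode yields $10800 / 5 = 2160$ transitions, from which step (10) draws $N = 64$.

```python
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Sequence

class Transition(NamedTuple):
    states: List[Sequence[float]]       # global state list at t_k
    actions: List[int]                  # action list of all agents at t_k
    team_reward: float                  # team reward at t_k
    next_states: List[Sequence[float]]  # global state list at t_{k+1}

class ReplayPool:
    """Bounded experience replay pool with uniform random sampling (hypothetical capacity)."""
    def __init__(self, capacity: int = 100_000, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries evicted first
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, n: int) -> List[Transition]:
        return self.rng.sample(list(self.buffer), n)  # sampling without replacement

    def __len__(self) -> int:
        return len(self.buffer)

def run_episode(step_fn: Callable[[int], Transition], pool: ReplayPool,
                t0: int = 0, dt: int = 5, total_time: int = 10_800) -> int:
    """Steps (5)-(9): act every dt seconds, store each transition, stop once t_{k+1} >= T."""
    t_k, steps = t0, 0
    while True:
        pool.add(step_fn(t_k))
        steps += 1
        t_k += dt  # t_{k+1} = t_k + dt
        if t_k >= total_time:
            return steps

def dummy_step(t_k: int) -> Transition:
    # Hypothetical simulator stub: 4 agents with 2-dimensional states, assumes dt = 5.
    s = [[float(t_k), 0.0] for _ in range(4)]
    s_next = [[float(t_k + 5), 0.0] for _ in range(4)]
    return Transition(s, [0, 1, 0, 1], 0.0, s_next)

pool = ReplayPool()
steps = run_episode(dummy_step, pool)
batch = pool.sample(64)
```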
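Step (10) can be illustrated with linear stand-ins for the agents' neural networks. This sketch rests on several assumptions, because the exact loss and target equations are not reproduced in this text. It assumes that the global function, here called $Q_{tot}$, is the $k_b$-weighted sum of per-agent Q values. It assumes that the target is $y = r + \gamma \max_{u_{all}} Q_{tot}(s', u_{all})$, computed with the current parameters and no separate target network. It also assumes that the loss is the mean squared difference between $y$ and $Q_{tot}$ over the sampled batch. Because every $k_b \ge 0$ (all equal 1 in the example), the maximum over joint actions splits into a sum of per-agent maxima, which keeps the target cheap to compute.

```python
from typing import List, Sequence

Weights = List[List[List[float]]]  # weights[b][a] = weight vector of agent b for phase a

def q_value(w_b: List[List[float]], s: Sequence[float], a: int) -> float:
    # Linear stand-in for agent b's neural-network Q function.
    return sum(w * x for w, x in zip(w_b[a], s))

def q_tot(weights: Weights, k: Sequence[float], states, actions) -> float:
    # Assumed global value: k_b-weighted sum of per-agent Q values.
    return sum(k_b * q_value(w_b, s_b, a_b)
               for w_b, k_b, s_b, a_b in zip(weights, k, states, actions))

def td_target(weights: Weights, k: Sequence[float], reward: float,
              next_states, gamma: float) -> float:
    # y = r + gamma * max_u Q_tot(s', u); with k_b >= 0 the joint max is a sum of per-agent maxima.
    best = sum(k_b * max(q_value(w_b, s_b, a) for a in range(len(w_b)))
               for w_b, k_b, s_b in zip(weights, k, next_states))
    return reward + gamma * best

def sgd_step(weights: Weights, k: Sequence[float], batch,
             gamma: float = 0.85, lr: float = 0.01) -> float:
    """One gradient-descent step on the mean squared TD error; returns the pre-update loss."""
    n = len(batch)
    grads = [[[0.0] * len(vec) for vec in w_b] for w_b in weights]
    loss = 0.0
    for states, actions, reward, next_states in batch:
        y = td_target(weights, k, reward, next_states, gamma)  # treated as a constant
        err = y - q_tot(weights, k, states, actions)
        loss += err * err / n
        for b, (s_b, a_b) in enumerate(zip(states, actions)):
            for j, x in enumerate(s_b):
                # d(err^2 / n) / d weights[b][a_b][j] = -2 * err * k_b * x / n
                grads[b][a_b][j] += -2.0 * err * k[b] * x / n
    for b, w_b in enumerate(weights):
        for a, vec in enumerate(w_b):
            for j in range(len(vec)):
                vec[j] -= lr * grads[b][a][j]
    return loss

# Two agents, two phases each, 2-dimensional states, k_b = 1 as in the example.
weights = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
batch = [([[1.0, 0.0], [0.0, 1.0]], [0, 1], 1.0, [[1.0, 0.0], [0.0, 1.0]])]
first = sgd_step(weights, [1.0, 1.0], batch)
second = sgd_step(weights, [1.0, 1.0], batch)
```

In practice each agent's network would be trained with an autodiff library, but the gradient of the squared TD error with respect to the selected phase's weights is the same expression written out by hand here.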
As shown in Fig. 4, the multi-mode traffic trunk signal coordination control device based on multi-agent cooperation disclosed in this embodiment of the present invention includes a multi-mode traffic trunk sensing module, a data storage module and a cooperative multi-mode traffic trunk signal coordination control agent computing module. The multi-mode traffic trunk sensing module acquires the channelization design, number of approaches, road-segment lengths, and positions of bus stops, non-motorized lanes and sidewalks at all intersections of the target trunk, as well as the number and routes of buses on the trunk, departure headways, dwell times, the passenger counts and speeds of social vehicles, pedestrians and non-motorized vehicles, and the queue lengths upstream of the intersections. The data storage module stores the data acquired by the multi-mode traffic trunk sensing module and the traffic trunk state sensing unit. The cooperative multi-mode traffic trunk signal coordination control agent computing module computes and stores the cooperative trunk intersection agents through the iterative training of claim 1, and outputs and stores the agent of each intersection of the multi-mode traffic trunk obtained by multi-agent cooperative training.
Specifically, the multi-mode traffic trunk sensing module comprises a traffic trunk data sensing unit and a traffic trunk state sensing unit; the data storage module comprises a traffic trunk intersection data unit and a traffic trunk traffic flow data unit; and the cooperative multi-mode traffic trunk signal coordination control agent computing module comprises an agent computing and storage unit.
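The module structure described above can be sketched as plain Python dataclasses. The class names, fields and the injected `read_layout`, `read_state` and `train` callables are hypothetical; the sketch only shows how data flows from the sensing module, through the data storage module, to the agent computing and storage unit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class SensingModule:
    read_layout: Callable[[], Dict[str, Any]]  # traffic trunk data sensing unit (static layout)
    read_state: Callable[[], Dict[str, Any]]   # traffic trunk state sensing unit (dynamic flows)

@dataclass
class DataStorageModule:
    intersection_data: Dict[str, Any] = field(default_factory=dict)        # intersection data unit
    traffic_flow_data: List[Dict[str, Any]] = field(default_factory=list)  # traffic flow data unit

@dataclass
class AgentComputingModule:
    train: Callable[[DataStorageModule], Dict[int, Any]]  # agent computing and storage unit
    agents: Dict[int, Any] = field(default_factory=dict)  # trained agent per intersection

@dataclass
class CoordinationControlDevice:
    sensing: SensingModule
    storage: DataStorageModule
    computing: AgentComputingModule

    def collect(self) -> None:
        # Sensing module -> data storage module.
        self.storage.intersection_data.update(self.sensing.read_layout())
        self.storage.traffic_flow_data.append(self.sensing.read_state())

    def train_agents(self) -> Dict[int, Any]:
        # Data storage module -> agent computing and storage unit.
        self.computing.agents = self.computing.train(self.storage)
        return self.computing.agents

# Hypothetical wiring for a 4-intersection trunk with a placeholder trainer.
device = CoordinationControlDevice(
    sensing=SensingModule(read_layout=lambda: {"intersections": 4},
                          read_state=lambda: {"queue_m": [12.0, 8.5, 0.0, 20.0]}),
    storage=DataStorageModule(),
    computing=AgentComputingModule(
        train=lambda store: {i: f"agent-{i}"
                             for i in range(1, store.intersection_data["intersections"] + 1)}),
)
device.collect()
agents = device.train_agents()
```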
The embodiment of the multi-mode traffic trunk signal coordination control device based on multi-agent cooperation and the embodiment of the corresponding control method disclosed here share the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (4)

1. A multi-mode traffic trunk signal coordination control method based on multi-agent cooperation is characterized by comprising the following steps:
(1) acquiring the intersection information and multi-mode traffic flow patterns of the traffic trunk, calibrating a simulation of the multi-mode traffic trunk with simulation software according to these data, and reproducing the arrival rates of the multi-mode traffic flows;
(2) generating a signal control agent for each intersection of the trunk, so that the n intersections of the traffic trunk correspond to n agents; agent i reads the state $s_i^{t_k}$ of its intersection at time $t_k$, which includes multi-mode traffic position, queue length and speed information, and inputs $s_i^{t_k}$ into its neural network, whose parameters at time $t_k$ are $\theta_i^{t_k}$; the network outputs the action phase of intersection agent i at time $t_k$, $a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$, where $Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$ is the value function (Q value) of selecting action phase $a_i$ in state $s_i^{t_k}$ under network parameters $\theta_i^{t_k}$, $A_i$ is the set of action phases that can be released at intersection i, and $a_i$ is an action phase in $A_i$;
(3) initializing the neural network parameters and experience replay pools of all agents in the trunk, and setting the number of training rounds $N_{episode}$;
(4) initializing the flow arrival rates of the simulated multi-mode traffic trunk, and setting the initial simulation time $t_0$ and the total simulation time T;
(5) obtaining the multi-mode traffic state of each agent in the traffic trunk simulation: for agent i, obtaining the multi-mode traffic local observation state $s_i^{t_k}$ of the ith intersection at time $t_k$, whose components respectively represent the social-vehicle state, public-transit state, pedestrian state and non-motor-vehicle state of the ith intersection at time $t_k$, each including position, queue length and speed information, together with the phase states of the intersections adjacent to the ith intersection at time $t_k$;
(6) inputting the local observation state of each agent into its neural network: for agent i, after $s_i^{t_k}$ is input into the neural network, the network returns the action phase at time $t_k$, $a_i^{t_k} = \arg\max_{a_i \in A_i} Q_i(s_i^{t_k}, a_i; \theta_i^{t_k})$, together with the corresponding Q value $Q_i(s_i^{t_k}, a_i^{t_k}; \theta_i^{t_k})$, where $A_i$ is the set of action phases that intersection i can release, $a_i$ is an action phase in $A_i$, $Q_i(\cdot)$ is the neural-network Q function of agent i, and $\theta_i^{t_k}$ denotes the parameters of agent i's neural network at time $t_k$;
(7) executing the action phase returned by each agent for $\Delta t$ seconds at the signal of the corresponding intersection in the traffic trunk simulation, after which the time becomes $t_{k+1} = t_k + \Delta t$ and the simulation environment returns the team reward of the multi-mode traffic trunk multi-agent system at time $t_k$; the team reward combines the change in per-capita delay, the person throughput and the change in queue length, weighted respectively by $k_d$, $k_f$ and $k_l$, the per-capita delay change balance coefficient, the person throughput balance coefficient and the queue length change balance coefficient; wherein the change in per-capita delay is computed from the total person delay on the trunk at times $t_k$ and $t_{k+1}$, the person throughput is the total number of people passing through the traffic trunk during $\Delta t$, and the change in queue length is computed from the number of people queuing on the traffic trunk at times $t_k$ and $t_{k+1}$;
(8) repeating step (5) to obtain the multi-mode traffic state $s_i^{t_{k+1}}$ of each agent at time $t_{k+1}$, and saving the global state list at time $t_k$, the action list at time $t_k$, the team reward at time $t_k$ and the global state list at time $t_{k+1}$ to the experience replay pool, wherein the global state list at time $t_k$ is $[s_1^{t_k}, \dots, s_n^{t_k}]$, with $s_n^{t_k}$ the state observed by the nth agent at time $t_k$; the global state list at time $t_{k+1}$ is $[s_1^{t_{k+1}}, \dots, s_n^{t_{k+1}}]$, with $s_n^{t_{k+1}}$ the state observed by the nth agent at time $t_{k+1}$; and the action list at time $t_k$ is $[a_1^{t_k}, \dots, a_n^{t_k}]$, the actions selected by all agents at time $t_k$, with $a_n^{t_k}$ the action executed by the nth agent at time $t_k$;
(9) judging whether the preset simulation time has been reached: if $t_{k+1} \ge T$, proceeding to step (10); otherwise, returning to step (5) for iteration;
(10) randomly sampling N transitions from the experience replay pool and updating the neural network parameters of each agent by gradient descent on the loss function $\mathcal{L}(\theta_{all})$, where $\theta_{all}$ denotes the neural network parameters of all agents; the global reward function representing multi-agent collaboration is $Q_{tot} = \sum_{b=1}^{n} k_b\, Q_b(\cdot\,; \theta_b)$, where $k_b$ is the trade-off coefficient of intersection b, n is the number of agents and $\theta_b$ denotes the neural network parameters of agent b; and the target reward value is computed with the attenuation (discount) coefficient $\gamma$ and $u_{all}$, the set of actions of all agents;
(11) judging whether the number of updates has reached the preset number of training rounds $N_{episode}$: if not, returning to step (4) for loop iteration; if so, outputting the agent of each intersection of the multi-mode traffic trunk obtained by multi-agent cooperative training.
2. A multi-mode traffic trunk line signal coordination control device based on multi-agent cooperation is characterized by comprising:
a multi-mode traffic trunk sensing module comprising a traffic trunk data sensing unit and a traffic trunk state sensing unit, wherein the traffic trunk data sensing unit acquires the channelization design, number of approach lanes, road-segment lengths, and positions of bus stops, non-motorized lanes and sidewalks at all intersections of the target trunk, and the traffic trunk state sensing unit acquires the number of bus runs and their routes, departure headways, dwell times, the passenger counts and speeds of social vehicles, pedestrians and non-motorized vehicles, the queue length upstream of each intersection, and the phase currently being served at each intersection;
a data storage module comprising a traffic trunk intersection data unit and a traffic trunk traffic flow data unit, which respectively store the data acquired by the multi-mode traffic trunk sensing module and by the traffic trunk state sensing unit;
a cooperative multi-mode traffic trunk signal coordination control agent computing module comprising an agent computing and storage unit, which computes and stores the cooperative trunk intersection agents through the iterative training of claim 1, and outputs and stores the agent of each intersection of the multi-mode traffic trunk obtained by multi-agent cooperative training.
3. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the multi-agent cooperation based multi-mode traffic trunk signal coordination control method of claim 1.
4. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the steps of the multi-agent cooperation based multi-mode traffic trunk signal coordination control method of claim 1.
CN202110331935.8A 2021-03-29 2021-03-29 Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation Active CN113299078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110331935.8A CN113299078B (en) 2021-03-29 2021-03-29 Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation

Publications (2)

Publication Number Publication Date
CN113299078A 2021-08-24
CN113299078B 2022-04-08

Family

ID=77319295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110331935.8A Active CN113299078B (en) 2021-03-29 2021-03-29 Multi-mode traffic trunk line signal coordination control method and device based on multi-agent cooperation

Country Status (1)

Country Link
CN (1) CN113299078B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743388B (en) * 2022-03-22 2023-06-20 中山大学·深圳 Multi-intersection signal self-adaptive control method based on reinforcement learning
CN114973698B (en) * 2022-05-10 2024-04-16 阿波罗智联(北京)科技有限公司 Control information generation method and machine learning model training method and device
CN114627650B (en) * 2022-05-11 2022-08-23 深圳市城市交通规划设计研究中心股份有限公司 Urban public transport priority simulation deduction system, method, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112289044A (en) * 2020-11-02 2021-01-29 南京信息工程大学 Highway road cooperative control system and method based on deep reinforcement learning
CN112365724A (en) * 2020-04-13 2021-02-12 北方工业大学 Continuous intersection signal cooperative control method based on deep reinforcement learning
CN112406867A (en) * 2020-11-19 2021-02-26 清华大学 Emergency vehicle hybrid lane change decision method based on reinforcement learning and avoidance strategy
CN112489464A (en) * 2020-11-19 2021-03-12 天津大学 Crossing traffic signal lamp regulation and control method with position sensing function
WO2021051930A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Signal adjustment method and apparatus based on action prediction model, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant