CN110930734A - Intelligent idle traffic indicator lamp control method based on reinforcement learning - Google Patents

Intelligent idle traffic indicator lamp control method based on reinforcement learning

Info

Publication number
CN110930734A
CN110930734A
Authority
CN
China
Prior art keywords
traffic indicator
defining
reinforcement learning
vehicles
indicator lamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911207789.7A
Other languages
Chinese (zh)
Inventor
金志刚
韩玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201911207789.7A priority Critical patent/CN110930734A/en
Publication of CN110930734A publication Critical patent/CN110930734A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/07 - Controlling traffic signals

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a reinforcement learning-based idle-time traffic indicator light control method, which comprises the following steps. A SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining a bounding box around each one. A traffic indicator light control agent is then trained with a DQN-based reinforcement learning method: a) defining an action space: the traffic indicator light selects a random action with probability ε and selects an action greedily with probability 1−ε; b) defining a state space: the road surface state observed at any moment is the number of vehicles in the different segments of each direction, and the observed state value is a six-dimensional vector; c) defining a reward function: the penalty weights of the three road segments are w1, w2 and w3 respectively, and the reward value is the weighted sum of the penalties over all segments; d) learning the policy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator light control agent.

Description

Intelligent idle traffic indicator lamp control method based on reinforcement learning
Technical Field
The invention belongs to the technical field of intelligent traffic indicator lights, and particularly relates to a reinforcement learning-based idle-time traffic indicator light control method.
Background
With the accelerating urbanization of China, cities are gradually expanding in scale. In the field of traffic management, governments and related departments are devoting effort to strengthening urban public transport construction, improving road layout, and opening up urban micro-circulation. At present, traffic lights at urban intersections in China mostly adopt a fixed-time switching control mode, that is, the switching interval is fixed and unchanging. However, on idle-time road sections where signal lights are in frequent use, this control method cannot satisfy drivers' expectations. For example, when driving at night the traffic flow on the side road is light, yet vehicles on the main road often wait at a red light while not a single vehicle passes on the side road. If the main-road light behaves this way often, vehicles are forced into unnecessary waiting or repeated braking and restarting, which not only indirectly shortens vehicle life and increases fuel consumption, but also irritates drivers and greatly reduces driving satisfaction.
To address these problems, the traffic indicator light at some intersections is set to a "flashing yellow" state, which on one hand reminds passing vehicles to slow down and on the other hand maintains the throughput of the intersection. However, in actual driving, accidents caused by "flashing yellow" are frequent. Especially on idle-time road sections, drivers tend to lose vigilance because of sparse traffic or fatigue from night driving, and take their chances at flashing-yellow intersections. Therefore, designing an intelligent idle-time traffic indicator light control method is of significant inventive value for improving driving comfort.
In recent years, with the development of artificial intelligence, intelligent traffic light control algorithms have emerged in great variety, forming a control paradigm centered on case-by-case analysis. The release time is controlled mainly according to peak versus off-peak periods, the traffic flow in each direction of the intersection, the ratio of flows between directions, and so on. On this basis, the invention introduces a reinforcement learning method from artificial intelligence and trains a neural-network-based agent to control the traffic indicator light. By observing road conditions and obtaining feedback values, the agent automatically learns to optimize the switching of the traffic indicator light and gives the optimal control decision, yielding the reinforcement learning-based idle-time traffic indicator light control method.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: on the basis of the traditional case-by-case approach, the method introduces an automatically learning agent and an autonomous decision-learning process. The reinforcement learning-based idle-time traffic indicator light control method mainly consists of an image recognition technique and a reinforcement learning technique based on DQN (Deep Q-Network). Object detection refers to techniques for identifying the locations of objects in an image, and can be used to count the objects in an image or even in real-time video. The invention requires a real-time object detection model that can sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining bounding boxes around each one. The SlimYOLOv3 model is applied to intersection video surveillance for real-time object detection, providing data support for the intelligent traffic indicator light control method. On this basis, the method uses a DQN model to train an estimated action-value network and a target action-value network respectively and updates the network parameters to obtain the traffic indicator light control agent. To achieve this purpose, the invention adopts the following technical scheme:
A reinforcement learning-based idle-time traffic indicator light control method comprises the following steps:
The first step is as follows: the SlimYOLOv3 model senses the environment, parses the scene, identifies objects of all vehicle types in the scene, locates them by defining a bounding box around each object, and counts the number of vehicles on each road at the intersection:
a) the intersection is divided into an east-west direction and a south-north direction, denoted E-W and S-N respectively; the road in each direction is divided, according to the distance from the intersection and with the intersection as the center, into three different segments x1, x2 and x3;
b) taking the vehicle head as the reference, the number of vehicles in each segment is detected with the SlimYOLOv3 model; the number of vehicles in segment i of the passing direction is recorded as nBi, and the number of vehicles in segment i of the waiting direction as nRi (a counting sketch is given after this step).
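A minimal per-segment counting sketch in Python. It assumes bounding boxes have already been obtained from a SlimYOLOv3-style detector in road-aligned coordinates; the coordinate convention, segment boundaries and function names below are illustrative assumptions, not values fixed by the patent.

```python
from typing import List, Tuple

# A detected vehicle box: (x_min, y_min, x_max, y_max) in road-aligned metres,
# with y measured as distance from the stop line (assumed convention).
Box = Tuple[float, float, float, float]

# Assumed outer limits of segments x1, x2, x3, in metres from the intersection.
SEGMENT_BOUNDS = [0.0, 30.0, 60.0, 100.0]


def count_per_segment(boxes: List[Box]) -> List[int]:
    """Count vehicles in each of the three segments, keyed on the vehicle head
    (the box edge nearest the intersection), as the method specifies."""
    counts = [0, 0, 0]
    for _x_min, y_min, _x_max, _y_max in boxes:
        head = y_min  # vehicle head = edge nearest the stop line
        for i in range(3):
            if SEGMENT_BOUNDS[i] <= head < SEGMENT_BOUNDS[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Boxes from a hypothetical detector call for one approach direction.
    demo_boxes = [(2.0, 4.0, 4.5, 9.0), (2.0, 12.0, 4.5, 17.0), (2.0, 65.0, 4.5, 70.0)]
    print(count_per_segment(demo_boxes))  # -> [2, 0, 1]
```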
The second step is as follows: a traffic indicator light control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator light has two display states, (E-W green, S-N red) and (E-W red, S-N green), denoted B_E and B_S respectively, and the initial traffic indicator light state is B_E; the traffic indicator light has two behaviors, switching and not switching, denoted Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects a random action with probability ε, and with probability 1−ε selects an action with a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different segments of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three segments x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the weighted sum of the penalties over all segments, denoted rt (the exact formula is given as an image in the original document);
d) initializing the estimated action-value network, the target action-value network, the traffic indicator light state and the road surface state, and learning the policy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator light control agent (a sketch of these definitions is given after this list).
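A minimal sketch of the action space, state vector, ε-greedy selection and one plausible reading of the reward, under stated assumptions: the ε value and the penalty weights W are illustrative, and the reward form (negative weighted count of waiting vehicles) is an assumption, since the patent gives the exact formula only as an image.

```python
import random
import numpy as np

ACTIONS = ["Y", "N"]   # Y = switch the light, N = keep the current phase
EPSILON = 0.1          # exploration probability (assumed value)
# Assumed penalty weights for segments x1, x2, x3 (illustrative only).
W = np.array([1.0, 0.6, 0.3])


def make_state(n_b, n_r):
    """Six-dimensional state [nB1, nB2, nB3, nR1, nR2, nR3]."""
    return np.array(list(n_b) + list(n_r), dtype=np.float32)


def reward(state):
    """Negative weighted count of waiting vehicles (assumed form of the reward)."""
    n_r = state[3:6]
    return -float(np.dot(W, n_r))


def select_action(q_values):
    """Epsilon-greedy: random action with probability EPSILON, otherwise argmax Q."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_values))
```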
Due to the adoption of the technical scheme, the invention has the following advantages:
(1) The SlimYOLOv3 model can detect targets in real time. Suppose a trained object detection model takes 2 seconds to detect the objects in an image: if such a model were deployed in a traffic light system, the inference results would be delayed and the light could not be adjusted in time. The SlimYOLOv3 model improves on the conventional YOLOv3 model; the pruned model has fewer parameters and lower computational requirements, making real-time target detection more practical.
(2) Reinforcement learning describes and solves the problem of an agent learning a policy that maximizes return or achieves a specific goal while interacting with its environment, and the control of the traffic light at each intersection is in essence a reinforcement learning problem. Compared with the traditional Q-learning method, DQN on one hand adopts an experience replay strategy, drawing experiences at random and breaking the correlation between consecutive experiences (a replay-buffer sketch is given below); on the other hand it uses two neural networks with the same structure but different parameters, which further decorrelates the learning targets from the current estimates and makes the network updates more stable and efficient. The invention therefore provides a more effective and more intelligent DQN-based traffic indicator light control method.
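A minimal experience-pool sketch illustrating the replay strategy described above; the capacity and the transition fields are illustrative assumptions.

```python
import collections
import random

Transition = collections.namedtuple("Transition", "state action reward next_state")


class ReplayBuffer:
    """Fixed-size experience pool; sampling uniformly at random breaks the
    temporal correlation between consecutive transitions, which is the point
    of experience replay in DQN."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = collections.deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```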
Drawings
Fig. 1 shows the working principle of the SlimYOLOv3 model.
Fig. 2 is a DQN model framework.
Fig. 3 is the flow of the reinforcement learning-based idle-time traffic indicator light control method.
Fig. 4 is a schematic diagram of an intersection.
Detailed Description
The invention provides a reinforcement learning-based idle-time traffic indicator light control method: a SlimYOLOv3 model collects real-time road traffic flow conditions, a traffic light control agent is trained with a DQN-based reinforcement learning algorithm, and an intelligent traffic indicator light control method is thereby provided for idle-time road sections; the flow of the method is shown in FIG. 3.
The specific implementation method comprises the following steps:
a) the intersection is divided into an east-west direction and a south-north direction, denoted E-W and S-N respectively. The traffic indicator light has two display states, (E-W green, S-N red) and (E-W red, S-N green), denoted B_E and B_S respectively.
b) real-time road traffic flow conditions are acquired with the SlimYOLOv3 model. Specifically, the road in each direction is divided, with the intersection as the center, into three segments x1, x2 and x3, as shown in FIG. 4. Taking the vehicle head as the reference, the number of vehicles in each segment is detected and recorded as n1, n2 and n3 respectively. The state value st observed at time t is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3], where nBi is the number of vehicles in segment i of the passing direction and nRi the number of vehicles in segment i of the waiting direction.
c) the experience pool D, the estimated action-value network Qθ and the target action-value network Q̂ (with parameters θ⁻) are initialized;
d) the initial traffic light state is B_E, and the road surface state is initialized to s0 = [nB1, nB2, nB3, nR1, nR2, nR3];
e) the traffic light has two behaviors, switching and not switching, denoted Y and N respectively, so the action space is A = {Y, N}. The traffic indicator light selects a random action at with probability ε, and with probability 1−ε selects the greedy action at = argmax_a Q(st, a; θ);
f) the penalty weights of the three segments x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the weighted sum of the penalties over all segments, denoted rt (the exact formula is given as an image in the original document). The traffic light performs action at and observes the reward value rt and the road surface state st+1 at the next moment;
g) the experience (st, at, rt, st+1) is recorded into the experience pool D;
h) a mini-batch of samples (sj, aj, rj, sj+1) is drawn at random from the experience pool D;
i) the target value yj = rj + γ max_a' Q̂(sj+1, a'; θ⁻) is computed;
j) the loss function J(θ) = E[(yj − Q(sj, aj; θ))²] is minimized with stochastic gradient descent, updating the estimated action-value network parameters θ;
k) steps e)-j) are repeated, and every c steps the target network is reset by setting Q̂ ← Qθ (i.e., θ⁻ ← θ);
l) steps d)-k) are repeated until a policy π that maximizes the reward value is learned, yielding a high-performance traffic indicator light control agent (a code sketch of this training loop follows).
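A minimal sketch of steps c)-l) in PyTorch, assuming the helpers defined in the sketches above (reward handling, select_action, ReplayBuffer) and a hypothetical environment object `env` with `observe()` and `apply(action)` methods standing in for the live SlimYOLOv3 measurements at the intersection; the network size, learning rate, episode length and sync interval are illustrative assumptions, not values from the patent.

```python
import copy
import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Action-value network Q(s, a; theta): 6-dimensional state in, 2 action values out."""

    def __init__(self, state_dim: int = 6, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def train(env, episodes=200, steps_per_episode=500, gamma=0.95,
          batch_size=32, sync_every=100, lr=1e-3):
    q_net = QNet()                      # estimated action-value network Q_theta (step c)
    target_net = copy.deepcopy(q_net)   # target action-value network Q_hat (step c)
    optimizer = torch.optim.SGD(q_net.parameters(), lr=lr)
    pool = ReplayBuffer()               # experience pool D (step c)
    step = 0

    for _ in range(episodes):
        state = env.observe()           # s_0 = [nB1, nB2, nB3, nR1, nR2, nR3] (step d)
        for _ in range(steps_per_episode):
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32)).numpy()
            action = select_action(q_values)          # epsilon-greedy choice (step e)
            next_state, r = env.apply(action)         # act, observe r_t and s_{t+1} (step f)
            pool.push(state, action, r, next_state)   # record the experience (step g)
            state = next_state

            if len(pool) >= batch_size:               # mini-batch update (steps h-j)
                batch = pool.sample(batch_size)
                s = torch.as_tensor(np.stack([b.state for b in batch]), dtype=torch.float32)
                a = torch.as_tensor([b.action for b in batch])
                rew = torch.as_tensor([b.reward for b in batch], dtype=torch.float32)
                s2 = torch.as_tensor(np.stack([b.next_state for b in batch]), dtype=torch.float32)
                with torch.no_grad():                 # y_j = r_j + gamma * max_a Q_hat(s_{j+1}, a)
                    y = rew + gamma * target_net(s2).max(dim=1).values
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)   # J(theta) = E[(y_j - Q(s_j, a_j))^2]
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            step += 1
            if step % sync_every == 0:                # reset target network every c steps (step k)
                target_net.load_state_dict(q_net.state_dict())

    return q_net
```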

Claims (1)

1. A reinforcement learning-based idle-time traffic indicator light control method, comprising the following steps:
the first step is as follows: the SlimYOLOv3 model senses the environment, parses the scene, identifies objects of all vehicle types in the scene, locates them by defining a bounding box around each object, and counts the number of vehicles on each road at the intersection:
a) the intersection is divided into an east-west direction and a south-north direction, denoted E-W and S-N respectively; the road in each direction is divided, according to the distance from the intersection and with the intersection as the center, into three different segments x1, x2 and x3;
b) taking the vehicle head as the reference, the number of vehicles in each segment is detected with the SlimYOLOv3 model; the number of vehicles in segment i of the passing direction is recorded as nBi, and the number of vehicles in segment i of the waiting direction as nRi;
the second step is as follows: a traffic indicator light control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator light has two display states, (E-W green, S-N red) and (E-W red, S-N green), denoted B_E and B_S respectively, and the initial traffic indicator light state is B_E; the traffic indicator light has two behaviors, switching and not switching, denoted Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects a random action with probability ε, and with probability 1−ε selects an action with a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different segments of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three segments x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the weighted sum of the penalties over all segments, denoted rt (the exact formula is given as an image in the original document);
d) initializing the estimated action-value network, the target action-value network, the traffic indicator light state and the road surface state, and learning the policy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator light control agent.
CN201911207789.7A 2019-11-30 2019-11-30 Intelligent idle traffic indicator lamp control method based on reinforcement learning Pending CN110930734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911207789.7A CN110930734A (en) 2019-11-30 2019-11-30 Intelligent idle traffic indicator lamp control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911207789.7A CN110930734A (en) 2019-11-30 2019-11-30 Intelligent idle traffic indicator lamp control method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110930734A true CN110930734A (en) 2020-03-27

Family

ID=69848040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911207789.7A Pending CN110930734A (en) 2019-11-30 2019-11-30 Intelligent idle traffic indicator lamp control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110930734A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150102945A1 (en) * 2011-12-16 2015-04-16 Pragmatek Transport Innovations, Inc. Multi-agent reinforcement learning for integrated and networked adaptive traffic signal control
CN106910351A (en) * 2017-04-19 2017-06-30 大连理工大学 A kind of traffic signals self-adaptation control method based on deeply study
CN109215355A (en) * 2018-08-09 2019-01-15 北京航空航天大学 A kind of single-point intersection signal timing optimization method based on deeply study
CN109509214A (en) * 2018-10-15 2019-03-22 杭州电子科技大学 A kind of ship target tracking based on deep learning
CN109544913A (en) * 2018-11-07 2019-03-29 南京邮电大学 A kind of traffic lights dynamic timing algorithm based on depth Q e-learning
CN109472984A (en) * 2018-12-27 2019-03-15 苏州科技大学 Signalized control method, system and storage medium based on deeply study
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning
CN110060475A (en) * 2019-04-17 2019-07-26 清华大学 A kind of multi-intersection signal lamp cooperative control method based on deeply study
CN110164151A (en) * 2019-06-21 2019-08-23 西安电子科技大学 Traffic lamp control method based on distributed deep-cycle Q network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614343A (en) * 2020-12-11 2021-04-06 多伦科技股份有限公司 Traffic signal control method and system based on random strategy gradient and electronic equipment
CN112233435A (en) * 2020-12-18 2021-01-15 深圳市城市交通规划设计研究中心股份有限公司 Traffic control method, system, terminal device and storage medium
CN112233435B (en) * 2020-12-18 2021-04-02 深圳市城市交通规划设计研究中心股份有限公司 Traffic control method, system, terminal device and storage medium
CN112863206A (en) * 2021-01-07 2021-05-28 北京大学 Traffic signal lamp control method and system based on reinforcement learning
CN114613169A (en) * 2022-04-20 2022-06-10 南京信息工程大学 Traffic signal lamp control method based on double experience pools DQN
CN114613169B (en) * 2022-04-20 2023-02-28 南京信息工程大学 Traffic signal lamp control method based on double experience pools DQN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200327