CN110930734A - Intelligent idle traffic indicator lamp control method based on reinforcement learning - Google Patents
- Publication number
- CN110930734A (application number CN201911207789.7A)
- Authority
- CN
- China
- Prior art keywords
- traffic indicator
- defining
- reinforcement learning
- vehicles
- indicator lamp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to an idle traffic indicator lamp control method based on reinforcement learning, which comprises the following steps: a SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining bounding boxes around each of them. A traffic indicator control agent is then trained with a DQN-based reinforcement learning method: a) defining an action space: the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm; b) defining a state space: the road surface state observed at any moment is the number of vehicles in the different sections of each direction, and the observed state value is a six-dimensional vector; c) defining a reward function: the penalty weights of the three road sections are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections; d) learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator control agent.
Description
Technical Field
The invention belongs to the technical field of intelligent traffic indicator lamps, and particularly relates to an idle traffic indicator lamp control method based on reinforcement learning.
Background
With the acceleration of urbanization in China, cities are steadily growing in scale. In the field of traffic management, governments and related departments are dedicated to strengthening urban public transport construction, improving road layout and opening up urban microcirculation. At present, traffic signal lights at urban crossroads in China mostly adopt a fixed-time switching control mode, i.e. the switching interval is fixed and invariable. However, on idle road sections this control method cannot satisfy the driving experience well. For example, when driving at night the traffic flow on the auxiliary road is low, yet vehicles on the main road often wait at a red light while not a single vehicle passes on the auxiliary road. When the main road is busier, vehicles are forced into unnecessary waiting or repeated braking and starting, which not only indirectly shortens vehicle life and increases fuel consumption, but also irritates drivers and greatly reduces driving satisfaction.
To address this problem, the traffic indicator at some intersections is set to a "flashing yellow" state, which on the one hand reminds passing vehicles to slow down, and on the other hand maintains traffic throughput. In actual driving, however, accidents caused by flashing yellow lights are frequent. Especially on idle road sections, drivers often lower their vigilance because of the sparse traffic or night-time fatigue, and take their chances at flashing-yellow intersections. Therefore, designing an intelligent idle-time traffic indicator lamp control method is of significant inventive value for improving the driving experience.
In recent years, with the development of artificial intelligence, intelligent traffic indicator light control algorithms have emerged in large numbers, forming a control paradigm centered on case-by-case analysis. The release time is mainly controlled according to peak versus off-peak periods, the traffic flow in each direction of the intersection, the ratio of flows between directions, and so on. On this basis, the invention introduces a reinforcement learning method and trains a neural-network-based agent for controlling the traffic indicator light. By observing road conditions and obtaining feedback values, the agent automatically learns to optimize the switching of the traffic indicator lamp and gives the optimal control decision, yielding the reinforcement-learning-based idle traffic indicator lamp control method.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: on the basis of the traditional case-by-case control, the method introduces an automatically learning agent and an autonomous decision-making process. The reinforcement-learning-based idle traffic indicator lamp control method mainly comprises an image recognition technique and a reinforcement learning technique based on DQN (Deep Q-Network). Object detection refers to techniques for identifying the locations of objects in an image, and may be used to count the objects in an image or even in real-time video. The invention requires a real-time object detection model that can sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining bounding boxes around each of them. The SlimYOLOv3 model is applied to intersection video monitoring for real-time object detection, providing data support for the intelligent traffic indicator control method. On this basis, the method adopts the DQN model to train an estimated value neural network and a target neural network respectively, and updates the network parameters to obtain the traffic indicator light control agent. To achieve this purpose, the invention adopts the following technical scheme:
a control method of idle traffic indicator lamps based on reinforcement learning comprises the following steps:
the first step is as follows: the SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, locate these objects by defining a bounding box around each of them, and count the number of road vehicles at the intersection:
a) the crossroad is divided into an east-west direction and a south-north direction, marked E-W and S-N respectively; according to the distance from the intersection, the road in each direction is divided, with the intersection as the center, into three different sections x1, x2 and x3;
b) taking the vehicle head as reference, the number of vehicles in each section is detected with the SlimYOLOv3 model; the number of vehicles in section i of the passing direction is recorded as nBi, and the number of vehicles in section i of the waiting direction as nRi.
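The counting in steps a) and b) can be sketched as follows. This is an illustrative Python fragment, not from the patent: the function name and the way per-section counts are passed in are assumptions, and in practice the counts would come from the SlimYOLOv3 detections.

```python
# Hypothetical sketch: assemble the six-dimensional road-surface state
# s_t = [nB1, nB2, nB3, nR1, nR2, nR3] from per-section vehicle counts.
# The section boundaries x1, x2, x3 and the counting itself would come
# from the detector; here the counts are passed in directly.

def build_state(passing_counts, waiting_counts):
    """passing_counts / waiting_counts: counts [n1, n2, n3] for the
    three sections x1, x2, x3 in the passing and waiting directions."""
    assert len(passing_counts) == 3 and len(waiting_counts) == 3
    return list(passing_counts) + list(waiting_counts)

s_t = build_state([2, 1, 0], [4, 3, 1])
# s_t == [2, 1, 0, 4, 3, 1]
```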
The second step is as follows: a traffic indicator control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively, and the initial state is B_E; the indicator light has two behaviors, change and no change, marked Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different sections of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt;
d) initializing the estimated action value network, the target action value network, the traffic indicator lamp state and the road surface state, and learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator lamp control agent.
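The ε-greedy rule in step a) can be sketched as follows. This is a hedged illustration: the action values would come from the trained network, and the names used here are assumptions, not from the patent.

```python
import random

# Illustrative epsilon-greedy selection over the action space A = {Y, N}:
# with probability epsilon pick a random action, otherwise pick the
# action with the highest estimated value. q_values is assumed to map
# each action to its estimated value Q(s, a).

ACTIONS = ["Y", "N"]  # Y = change the indicator lamp, N = do not change

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.choice(ACTIONS)          # explore
    return max(ACTIONS, key=lambda a: q_values[a])  # exploit
```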
Due to the adoption of the technical scheme, the invention has the following advantages:
(1) The SlimYOLOv3 model can detect targets in real time. Suppose a trained object detection model takes 2 seconds to detect the objects in an image; if such a model were deployed in a traffic light system, its inference results would arrive too late for the lights to be adjusted in time. The SlimYOLOv3 model improves on the traditional YOLOv3 model: pruning leaves fewer trainable parameters and lower computational requirements, making real-time target detection more practical.
(2) Reinforcement learning describes and solves the problem of an agent learning, during its interaction with the environment, a strategy that maximizes return or achieves a specific goal; the control of the traffic lights at each intersection is in essence a reinforcement learning problem. Compared with the traditional Q-learning method, DQN on the one hand adopts an experience replay strategy, randomly sampling experiences and thereby breaking the correlation between them; on the other hand, it adopts two neural networks with the same structure but different parameters, which further decorrelates the updates and makes training more efficient. The invention therefore provides a more effective and more intelligent DQN-based traffic indicator lamp control method.
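The experience replay strategy mentioned above can be sketched as follows. This is an illustrative fragment; the class name, pool capacity and structure are assumptions, not taken from the patent.

```python
import random
from collections import deque

# Illustrative experience replay: transitions (s_t, a_t, r_t, s_{t+1})
# are stored in a bounded pool and sampled uniformly at random, breaking
# the temporal correlation between consecutive experiences.

class ReplayPool:
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest experiences drop out

    def record(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.pool, batch_size)
```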
Drawings
Fig. 1 shows the working principle of the SlimYOLOv3 model.
Fig. 2 is a DQN model framework.
Fig. 3 is the flow chart of the reinforcement-learning-based idle traffic indicator light control method.
Fig. 4 is a schematic diagram of an intersection.
Detailed Description
The invention provides a reinforcement-learning-based idle traffic indicator control method: a SlimYOLOv3 model collects real-time road traffic flow conditions, a traffic control agent is trained with a DQN-based reinforcement learning algorithm, and an intelligent traffic indicator control method is thus provided for idle road sections; the flow of the method is shown in FIG. 3.
The specific implementation method comprises the following steps:
a) the crossroad is divided into east-west and south-north directions, marked E-W and S-N respectively. The traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively.
b) real-time road traffic flow conditions are acquired with the SlimYOLOv3 model. Specifically, the road in each direction is divided, with the intersection as the center, into three sections x1, x2 and x3, as shown in fig. 4. Taking the vehicle head as reference, the number of vehicles in each section is detected and recorded as n1, n2 and n3 respectively. The state value st observed at time t is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3], where nBi denotes the number of vehicles in section i of the passing direction and nRi the number of vehicles in section i of the waiting direction.
c) an experience pool D, an estimated action value network Qθ and a target action value network are initialized;
d) the initial traffic light state is B_E, and the road surface state is initialized to s0 = [nB1, nB2, nB3, nR1, nR2, nR3];
e) the traffic light has two behaviors, change and no change, marked Y and N respectively, and the action space is A = {Y, N}. The traffic indicator light selects an action at at random with probability ε, and with probability 1−ε selects at = argmax_a Q(st, a; θ) by the greedy algorithm;
f) the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt. The traffic light performs action at and observes the reward value rt and the road surface state st+1 at the next moment;
g) the experience (st, at, rt, st+1) is recorded into the experience pool D;
h) mini-batch samples (sj, aj, rj, sj+1) are drawn at random from the experience pool D;
i) the target value yj = rj + γ max_a Q′(sj+1, a) is computed with the target action value network Q′, where γ is the discount factor;
j) the loss function J(θ) = E[(yj − Q(sj, aj; θ))²] is minimized with a stochastic gradient descent algorithm, updating the estimated action value network parameters θ;
k) the estimated network parameters θ are periodically copied into the target action value network;
l) steps d) to k) are repeated until the strategy π that maximizes the reward value is learned, obtaining a high-performance traffic indicator light control agent.
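Steps c) through l) can be sketched as a minimal training update. This is a hedged illustration that substitutes a tiny linear approximator for the estimated and target neural networks; the learning rate, discount factor and action encoding are illustrative values, not taken from the patent.

```python
# Hedged sketch of the DQN update: form the target
#   y_j = r_j + gamma * max_a Q_target(s_{j+1}, a),
# take a gradient step on (y_j - Q(s_j, a_j; theta))^2, and
# periodically copy theta into the target network.

GAMMA = 0.9               # illustrative discount factor
ACTIONS = [0, 1]          # 0 = Y (change), 1 = N (do not change)
STATE_DIM = 6             # s_t = [nB1, nB2, nB3, nR1, nR2, nR3]

theta = [[0.0] * STATE_DIM for _ in ACTIONS]   # estimated network weights
theta_target = [row[:] for row in theta]       # target network weights

def q(weights, s, a):
    # Linear action value: Q(s, a) = weights[a] . s
    return sum(w * x for w, x in zip(weights[a], s))

def train_step(batch, lr=0.01):
    for s, a, r, s_next in batch:
        y = r + GAMMA * max(q(theta_target, s_next, b) for b in ACTIONS)
        td_err = y - q(theta, s, a)
        for i in range(STATE_DIM):             # gradient step on theta
            theta[a][i] += lr * td_err * s[i]

def sync_target():
    # Step k): copy the estimated parameters into the target network.
    for a in ACTIONS:
        theta_target[a] = theta[a][:]
```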
Claims (1)
1. A control method of idle traffic indicator lamps based on reinforcement learning comprises the following steps:
the first step is as follows: the SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, locate these objects by defining a bounding box around each of them, and count the number of road vehicles at the intersection:
a) the crossroad is divided into an east-west direction and a south-north direction, marked E-W and S-N respectively; according to the distance from the intersection, the road in each direction is divided, with the intersection as the center, into three different sections x1, x2 and x3;
b) taking the vehicle head as reference, the number of vehicles in each section is detected with the SlimYOLOv3 model; the number of vehicles in section i of the passing direction is recorded as nBi, and the number of vehicles in section i of the waiting direction as nRi.
The second step is as follows: a traffic indicator control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively, and the initial state is B_E; the indicator light has two behaviors, change and no change, marked Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different sections of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt;
d) initializing the estimated action value network, the target action value network, the traffic indicator lamp state and the road surface state, and learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator lamp control agent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911207789.7A CN110930734A (en) | 2019-11-30 | 2019-11-30 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110930734A true CN110930734A (en) | 2020-03-27 |
Family
ID=69848040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911207789.7A Pending CN110930734A (en) | 2019-11-30 | 2019-11-30 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110930734A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150102945A1 (en) * | 2011-12-16 | 2015-04-16 | Pragmatek Transport Innovations, Inc. | Multi-agent reinforcement learning for integrated and networked adaptive traffic signal control |
CN106910351A (en) * | 2017-04-19 | 2017-06-30 | 大连理工大学 | A kind of traffic signals self-adaptation control method based on deeply study |
CN109215355A (en) * | 2018-08-09 | 2019-01-15 | 北京航空航天大学 | A kind of single-point intersection signal timing optimization method based on deeply study |
CN109472984A (en) * | 2018-12-27 | 2019-03-15 | 苏州科技大学 | Signalized control method, system and storage medium based on deeply study |
CN109509214A (en) * | 2018-10-15 | 2019-03-22 | 杭州电子科技大学 | A kind of ship target tracking based on deep learning |
CN109544913A (en) * | 2018-11-07 | 2019-03-29 | 南京邮电大学 | A kind of traffic lights dynamic timing algorithm based on depth Q e-learning |
CN109559530A (en) * | 2019-01-07 | 2019-04-02 | 大连理工大学 | A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning |
CN110060475A (en) * | 2019-04-17 | 2019-07-26 | 清华大学 | A kind of multi-intersection signal lamp cooperative control method based on deeply study |
CN110164151A (en) * | 2019-06-21 | 2019-08-23 | 西安电子科技大学 | Traffic lamp control method based on distributed deep-cycle Q network |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112614343A (en) * | 2020-12-11 | 2021-04-06 | 多伦科技股份有限公司 | Traffic signal control method and system based on random strategy gradient and electronic equipment |
CN112233435A (en) * | 2020-12-18 | 2021-01-15 | 深圳市城市交通规划设计研究中心股份有限公司 | Traffic control method, system, terminal device and storage medium |
CN112233435B (en) * | 2020-12-18 | 2021-04-02 | 深圳市城市交通规划设计研究中心股份有限公司 | Traffic control method, system, terminal device and storage medium |
CN112863206A (en) * | 2021-01-07 | 2021-05-28 | 北京大学 | Traffic signal lamp control method and system based on reinforcement learning |
CN114613169A (en) * | 2022-04-20 | 2022-06-10 | 南京信息工程大学 | Traffic signal lamp control method based on double experience pools DQN |
CN114613169B (en) * | 2022-04-20 | 2023-02-28 | 南京信息工程大学 | Traffic signal lamp control method based on double experience pools DQN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200327 |