CN110930734A - Intelligent idle traffic indicator lamp control method based on reinforcement learning - Google Patents
- Publication number
- CN110930734A (application number CN201911207789.7A)
- Authority
- CN
- China
- Prior art keywords
- traffic indicator
- defining
- reinforcement learning
- vehicles
- indicator lamp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to an idle traffic indicator lamp control method based on reinforcement learning, which comprises the following steps: a SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining bounding boxes around each of them. A traffic indicator control agent is then trained with a DQN-based reinforcement learning method: a) defining an action space: the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm; b) defining a state space: the road surface state observed at any moment is the number of vehicles in the different sections of each direction, and the observed state value is a six-dimensional vector; c) defining a reward function: the penalty weights of the three road sections are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections; d) learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator control agent.
Description
Technical Field
The invention belongs to the technical field of intelligent traffic indicator lamps, and particularly relates to an idle traffic indicator lamp control method based on reinforcement learning.
Background
With the acceleration of urbanization in China, cities are steadily growing in scale. In the field of traffic management, governments and related departments are dedicated to strengthening urban public transport construction, improving road layout and opening up urban microcirculation. At present, traffic signal lights at urban crossroads in China mostly adopt a fixed-time switching control mode, i.e. the switching interval is fixed and invariable. However, on idle road sections this control method cannot satisfy the driving experience well. For example, when driving at night the traffic flow on the auxiliary road is low, yet vehicles on the main road often wait at a red light while not a single vehicle passes on the auxiliary road. When the main road is busier, vehicles are forced into unnecessary waiting or repeated braking and starting, which not only indirectly shortens vehicle life and increases fuel consumption, but also irritates drivers and greatly reduces driving satisfaction.
To address this problem, the traffic indicator at some intersections is set to a "flashing yellow" state, which on the one hand reminds passing vehicles to slow down, and on the other hand maintains traffic throughput. In actual driving, however, accidents caused by flashing yellow lights are frequent. Especially on idle road sections, drivers often lower their vigilance because of the sparse traffic or night-time fatigue, and take their chances at flashing-yellow intersections. Therefore, designing an intelligent idle-time traffic indicator lamp control method is of significant inventive value for improving the driving experience.
In recent years, with the development of artificial intelligence, intelligent traffic indicator light control algorithms have emerged in large numbers, forming a control paradigm centered on case-by-case analysis. The release time is mainly controlled according to peak versus off-peak periods, the traffic flow in each direction of the intersection, the ratio of flows between directions, and so on. On this basis, the invention introduces a reinforcement learning method and trains a neural-network-based agent for controlling the traffic indicator light. By observing road conditions and obtaining feedback values, the agent automatically learns to optimize the switching of the traffic indicator lamp and gives the optimal control decision, yielding the reinforcement-learning-based idle traffic indicator lamp control method.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: on the basis of the traditional case-by-case control, the method introduces an automatically learning agent and an autonomous decision-making process. The reinforcement-learning-based idle traffic indicator lamp control method mainly comprises an image recognition technique and a reinforcement learning technique based on DQN (Deep Q-Network). Object detection refers to techniques for identifying the locations of objects in an image, and may be used to count the objects in an image or even in real-time video. The invention requires a real-time object detection model that can sense the environment, parse the scene, identify objects of all vehicle types in the scene, and locate these objects by defining bounding boxes around each of them. The SlimYOLOv3 model is applied to intersection video monitoring for real-time object detection, providing data support for the intelligent traffic indicator control method. On this basis, the method adopts the DQN model to train an estimated value neural network and a target neural network respectively, and updates the network parameters to obtain the traffic indicator light control agent. To achieve this purpose, the invention adopts the following technical scheme:
a control method of idle traffic indicator lamps based on reinforcement learning comprises the following steps:
the first step is as follows: the SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, locate these objects by defining a bounding box around each of them, and count the number of road vehicles at the intersection:
a) the crossroad is divided into an east-west direction and a south-north direction, marked E-W and S-N respectively; according to the distance from the intersection, the road in each direction is divided, with the intersection as the center, into three different sections x1, x2 and x3;
b) taking the vehicle head as reference, the number of vehicles in each section is detected with the SlimYOLOv3 model; the number of vehicles in section i of the passing direction is recorded as nBi, and the number of vehicles in section i of the waiting direction as nRi.
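The counting in steps a) and b) can be sketched as follows. This is an illustrative Python fragment, not from the patent: the function name and the way per-section counts are passed in are assumptions, and in practice the counts would come from the SlimYOLOv3 detections.

```python
# Hypothetical sketch: assemble the six-dimensional road-surface state
# s_t = [nB1, nB2, nB3, nR1, nR2, nR3] from per-section vehicle counts.
# The section boundaries x1, x2, x3 and the counting itself would come
# from the detector; here the counts are passed in directly.

def build_state(passing_counts, waiting_counts):
    """passing_counts / waiting_counts: counts [n1, n2, n3] for the
    three sections x1, x2, x3 in the passing and waiting directions."""
    assert len(passing_counts) == 3 and len(waiting_counts) == 3
    return list(passing_counts) + list(waiting_counts)

s_t = build_state([2, 1, 0], [4, 3, 1])
# s_t == [2, 1, 0, 4, 3, 1]
```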
The second step is as follows: a traffic indicator control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively, and the initial state is B_E; the indicator light has two behaviors, change and no change, marked Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different sections of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt;
d) initializing the estimated action value network, the target action value network, the traffic indicator lamp state and the road surface state, and learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator lamp control agent.
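The ε-greedy rule in step a) can be sketched as follows. This is a hedged illustration: the action values would come from the trained network, and the names used here are assumptions, not from the patent.

```python
import random

# Illustrative epsilon-greedy selection over the action space A = {Y, N}:
# with probability epsilon pick a random action, otherwise pick the
# action with the highest estimated value. q_values is assumed to map
# each action to its estimated value Q(s, a).

ACTIONS = ["Y", "N"]  # Y = change the indicator lamp, N = do not change

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.choice(ACTIONS)          # explore
    return max(ACTIONS, key=lambda a: q_values[a])  # exploit
```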
Due to the adoption of the technical scheme, the invention has the following advantages:
(1) The SlimYOLOv3 model can detect targets in real time. Suppose a trained object detection model takes 2 seconds to detect the objects in an image; if such a model were deployed in a traffic light system, its inference results would arrive too late for the lights to be adjusted in time. The SlimYOLOv3 model improves on the traditional YOLOv3 model: pruning leaves fewer trainable parameters and lower computational requirements, making real-time target detection more practical.
(2) Reinforcement learning describes and solves the problem of an agent learning, during its interaction with the environment, a strategy that maximizes return or achieves a specific goal; the control of the traffic lights at each intersection is in essence a reinforcement learning problem. Compared with the traditional Q-learning method, DQN on the one hand adopts an experience replay strategy, randomly sampling experiences and thereby breaking the correlation between them; on the other hand, it adopts two neural networks with the same structure but different parameters, which further decorrelates the updates and makes training more efficient. The invention therefore provides a more effective and more intelligent DQN-based traffic indicator lamp control method.
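The experience replay strategy mentioned above can be sketched as follows. This is an illustrative fragment; the class name, pool capacity and structure are assumptions, not taken from the patent.

```python
import random
from collections import deque

# Illustrative experience replay: transitions (s_t, a_t, r_t, s_{t+1})
# are stored in a bounded pool and sampled uniformly at random, breaking
# the temporal correlation between consecutive experiences.

class ReplayPool:
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest experiences drop out

    def record(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.pool, batch_size)
```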
Drawings
Fig. 1 shows the working principle of the SlimYOLOv3 model.
Fig. 2 is a DQN model framework.
Fig. 3 is the flow chart of the reinforcement-learning-based idle traffic indicator light control method.
Fig. 4 is a schematic diagram of an intersection.
Detailed Description
The invention provides a reinforcement-learning-based idle traffic indicator control method: a SlimYOLOv3 model collects real-time road traffic flow conditions, a traffic control agent is trained with a DQN-based reinforcement learning algorithm, and an intelligent traffic indicator control method is thus provided for idle road sections; the flow of the method is shown in FIG. 3.
The specific implementation method comprises the following steps:
a) the crossroad is divided into east-west and south-north directions, marked E-W and S-N respectively. The traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively.
b) real-time road traffic flow conditions are acquired with the SlimYOLOv3 model. Specifically, the road in each direction is divided, with the intersection as the center, into three sections x1, x2 and x3, as shown in fig. 4. Taking the vehicle head as reference, the number of vehicles in each section is detected and recorded as n1, n2 and n3 respectively. The state value st observed at time t is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3], where nBi denotes the number of vehicles in section i of the passing direction and nRi the number of vehicles in section i of the waiting direction.
c) an experience pool D, an estimated action value network Qθ and a target action value network are initialized;
d) the initial traffic light state is B_E, and the road surface state is initialized to s0 = [nB1, nB2, nB3, nR1, nR2, nR3];
e) the traffic light has two behaviors, change and no change, marked Y and N respectively, and the action space is A = {Y, N}. The traffic indicator light selects an action at at random with probability ε, and with probability 1−ε selects at = argmax_a Q(st, a; θ) by the greedy algorithm;
f) the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt. The traffic light performs action at and observes the reward value rt and the road surface state st+1 at the next moment;
g) the experience (st, at, rt, st+1) is recorded into the experience pool D;
h) mini-batch samples (sj, aj, rj, sj+1) are drawn at random from the experience pool D;
i) the target value yj = rj + γ max_a Q′(sj+1, a) is computed with the target action value network Q′, where γ is the discount factor;
j) the loss function J(θ) = E[(yj − Q(sj, aj; θ))²] is minimized with a stochastic gradient descent algorithm, updating the estimated action value network parameters θ;
k) the estimated network parameters θ are periodically copied into the target action value network;
l) steps d) to k) are repeated until the strategy π that maximizes the reward value is learned, obtaining a high-performance traffic indicator light control agent.
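Steps c) through l) can be sketched as a minimal training update. This is a hedged illustration that substitutes a tiny linear approximator for the estimated and target neural networks; the learning rate, discount factor and action encoding are illustrative values, not taken from the patent.

```python
# Hedged sketch of the DQN update: form the target
#   y_j = r_j + gamma * max_a Q_target(s_{j+1}, a),
# take a gradient step on (y_j - Q(s_j, a_j; theta))^2, and
# periodically copy theta into the target network.

GAMMA = 0.9               # illustrative discount factor
ACTIONS = [0, 1]          # 0 = Y (change), 1 = N (do not change)
STATE_DIM = 6             # s_t = [nB1, nB2, nB3, nR1, nR2, nR3]

theta = [[0.0] * STATE_DIM for _ in ACTIONS]   # estimated network weights
theta_target = [row[:] for row in theta]       # target network weights

def q(weights, s, a):
    # Linear action value: Q(s, a) = weights[a] . s
    return sum(w * x for w, x in zip(weights[a], s))

def train_step(batch, lr=0.01):
    for s, a, r, s_next in batch:
        y = r + GAMMA * max(q(theta_target, s_next, b) for b in ACTIONS)
        td_err = y - q(theta, s, a)
        for i in range(STATE_DIM):             # gradient step on theta
            theta[a][i] += lr * td_err * s[i]

def sync_target():
    # Step k): copy the estimated parameters into the target network.
    for a in ACTIONS:
        theta_target[a] = theta[a][:]
```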
Claims (1)
1. A control method of idle traffic indicator lamps based on reinforcement learning comprises the following steps:
the first step is as follows: the SlimYOLOv3 model is used to sense the environment, parse the scene, identify objects of all vehicle types in the scene, locate these objects by defining a bounding box around each of them, and count the number of road vehicles at the intersection:
a) the crossroad is divided into an east-west direction and a south-north direction, marked E-W and S-N respectively; according to the distance from the intersection, the road in each direction is divided, with the intersection as the center, into three different sections x1, x2 and x3;
b) taking the vehicle head as reference, the number of vehicles in each section is detected with the SlimYOLOv3 model; the number of vehicles in section i of the passing direction is recorded as nBi, and the number of vehicles in section i of the waiting direction as nRi.
The second step is as follows: a traffic indicator control agent is trained with a DQN-based reinforcement learning method:
a) defining an action space: the traffic indicator lamp has two display states, (E-W green, S-N red) and (E-W red, S-N green), marked B_E and B_S respectively, and the initial state is B_E; the indicator light has two behaviors, change and no change, marked Y and N respectively, so the action space is A = {Y, N}; the traffic indicator light selects an action at random with probability ε, and with probability 1−ε selects the action given by a greedy algorithm;
b) defining a state space: the road surface state observed at any time t is the number of vehicles in the different sections of each direction; the observed state value st is a six-dimensional vector, st = [nB1, nB2, nB3, nR1, nR2, nR3];
c) defining a reward function: the penalty weights of the three road sections x1, x2 and x3 are w1, w2 and w3 respectively, and the reward value is the sum of the penalty weights of all road sections, recorded as rt;
d) initializing the estimated action value network, the target action value network, the traffic indicator lamp state and the road surface state, and learning the strategy that maximizes the reward value with the DQN-based reinforcement learning method, obtaining a high-performance traffic indicator lamp control agent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911207789.7A CN110930734A (en) | 2019-11-30 | 2019-11-30 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110930734A true CN110930734A (en) | 2020-03-27 |
Family
ID=69848040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911207789.7A Pending CN110930734A (en) | 2019-11-30 | 2019-11-30 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110930734A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150102945A1 (en) * | 2011-12-16 | 2015-04-16 | Pragmatek Transport Innovations, Inc. | Multi-agent reinforcement learning for integrated and networked adaptive traffic signal control |
CN106910351A (en) * | 2017-04-19 | 2017-06-30 | 大连理工大学 | A kind of traffic signals self-adaptation control method based on deeply study |
CN109215355A (en) * | 2018-08-09 | 2019-01-15 | 北京航空航天大学 | A kind of single-point intersection signal timing optimization method based on deeply study |
CN109472984A (en) * | 2018-12-27 | 2019-03-15 | 苏州科技大学 | Signalized control method, system and storage medium based on deeply study |
CN109509214A (en) * | 2018-10-15 | 2019-03-22 | 杭州电子科技大学 | A kind of ship target tracking based on deep learning |
CN109544913A (en) * | 2018-11-07 | 2019-03-29 | 南京邮电大学 | A kind of traffic lights dynamic timing algorithm based on depth Q e-learning |
CN109559530A (en) * | 2019-01-07 | 2019-04-02 | 大连理工大学 | A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning |
CN110060475A (en) * | 2019-04-17 | 2019-07-26 | 清华大学 | A kind of multi-intersection signal lamp cooperative control method based on deeply study |
CN110164151A (en) * | 2019-06-21 | 2019-08-23 | 西安电子科技大学 | Traffic lamp control method based on distributed deep-cycle Q network |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112614343A (en) * | 2020-12-11 | 2021-04-06 | 多伦科技股份有限公司 | Traffic signal control method and system based on random strategy gradient and electronic equipment |
CN112233435A (en) * | 2020-12-18 | 2021-01-15 | 深圳市城市交通规划设计研究中心股份有限公司 | Traffic control method, system, terminal device and storage medium |
CN112233435B (en) * | 2020-12-18 | 2021-04-02 | 深圳市城市交通规划设计研究中心股份有限公司 | Traffic control method, system, terminal device and storage medium |
CN112863206A (en) * | 2021-01-07 | 2021-05-28 | 北京大学 | Traffic signal lamp control method and system based on reinforcement learning |
CN114613169A (en) * | 2022-04-20 | 2022-06-10 | 南京信息工程大学 | Traffic signal lamp control method based on double experience pools DQN |
CN114613169B (en) * | 2022-04-20 | 2023-02-28 | 南京信息工程大学 | Traffic signal lamp control method based on double experience pools DQN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200327 |