CN113033928A - Design method, device and system of bus shift scheduling model based on deep reinforcement learning - Google Patents

Design method, device and system of bus shift scheduling model based on deep reinforcement learning

Info

Publication number
CN113033928A
Authority
CN
China
Prior art keywords
scheduling
shift
matrix
reinforcement learning
bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911253753.2A
Other languages
Chinese (zh)
Other versions
CN113033928B (en)
Inventor
王乾宇
周金明
赵丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co ltd filed Critical Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority to CN201911253753.2A
Publication of CN113033928A
Application granted
Publication of CN113033928B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a design method of a bus scheduling model based on deep reinforcement learning, comprising: step 1, converting the scheduling process into a Markov decision process; step 2, solving the Markov decision process; and step 3, scheduling according to the solution. The departure timetable is scheduled by a deep reinforcement learning method: a mathematical scheduling model is established and the relevant information is parameterized, so that scheduling for different cities requires only adjusting parameters. Bus operation efficiency is improved and bus operation cost is reduced.

Description

Design method, device and system of bus shift scheduling model based on deep reinforcement learning
Technical Field
The invention relates to the fields of intelligent transportation and deep learning research, in particular to a method, device and system for designing a bus scheduling model based on deep reinforcement learning, and belongs to the field of intelligent bus shift scheduling and dispatching.
Background
With the continuous improvement of China's motorization level, urban infrastructure construction is developing rapidly, urban areas keep expanding, and urban public transportation networks are becoming more and more comprehensive. However, as the scale of public transportation grows, bus scheduling becomes increasingly difficult, and an intelligent scheduling method plays a crucial role in allocating public transportation resources efficiently and reasonably, helping to use those resources more effectively and to provide higher-quality service. In the process of implementing the invention, the inventors found at least the following problems in the prior art: traditional bus scheduling in China relies mainly on manual scheduling and the experience of dispatchers, so efficiency is low and the rationality of the schedule cannot be guaranteed; existing scheduling methods are also inefficient, and the next schedule is often prepared only after the previous one has run, so they cannot flexibly cope with constantly changing passenger flow.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the present disclosure provide a design method, device and system for a bus scheduling model based on deep reinforcement learning, which greatly improve scheduling efficiency. The technical scheme is as follows:
in a first aspect, a design method of a bus shift scheduling model based on deep reinforcement learning is provided, and the method includes:
step 1, generating three matrices from the departure timetable: a rule matrix X, a scheduling matrix Y and a selectable position matrix Z; and establishing a Markov decision process;
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise.
The rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise.
All elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step.
The selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise.
The selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the executed policy;
the Markov decision process is established as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A the action space, and π_θ the policy, with θ the parameters of the policy; π_θ(a|s) denotes the probability distribution over actions a in state s under policy π_θ; R denotes the reward function, and G the reward accumulated over time;
the Markov decision process is defined according to the scheduling task:
policy π_θ: specifically, the policy neural network;
state s: (X, Y, Z) ∈ S;
action a: (i, j) ∈ A, executed as follows: fill 1 into Y_{i,j} and set the i-th row and j-th column of Z to 0;
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result;
step 2, training the shift scheduling policy neural network:
obtaining the initial state s_0, which consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
computing the probability distribution π_θ(a|s_t) over actions in state s_t:
the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor formed by the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t and obtaining state s_{t+1};
computing the reward r_t = R(s_t, a_t);
after executing a_t and obtaining s_{t+1}: if action a_t corresponds to a position with Z_{i,j} = 0, exiting; if Z becomes all 0 after executing a_t, exiting; otherwise returning to the step of computing the probability distribution π_θ(a|s_{t+1}) for state s_{t+1};
the scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
the parameters of the policy neural network are then updated according to the reinforcement-learning objective function and policy gradient, yielding the bus scheduling model.
Preferably, shift j may be executed by the same vehicle after shift i is executed when: the departure time of shift j falls within 10-40 min after the arrival time of shift i.
Preferably, the scoring function score(Y) is a weighted form whose exact formula appears as an equation image in the original filing, where α and β are hyper-parameters controlling the weighting ratio.
Preferably, written in the standard policy-gradient form, the objective function is:
J(θ) = E_{τ∼π_θ}[G(τ)], where G(τ) = r_0 + r_1 + … + r_T;
the policy gradient is:
∇_θ J(θ) = E_{τ∼π_θ}[ Σ_{t=0}^{T} ∇_θ log π_θ(a_t|s_t) · G(τ) ];
and the parameters of the policy neural network are updated as:
θ ← θ + η · ∇_θ J(θ), where η is the learning rate.
Further, the method also comprises step 3: scheduling with the model trained in step 2, wherein the action selected at each step is a_t = argmax_a π_θ(a|s_t); the scheduling matrix Y finally obtained gives the scheduling result.
Preferably, the method for generating the departure timetable comprises:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger-flow characteristics, and calculating the departure interval Δt_i of each period, i ∈ {1, 2, …, h} (the interval formulas appear as equation images in the original filing);
and obtaining the departure timetable from the departure intervals.
In a second aspect, a design device of a bus shift scheduling model based on deep reinforcement learning is provided, specifically comprising a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
Preferably, the device further comprises a scheduling module, wherein the scheduling module is used for executing the step 3 of the design method of the bus scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of a bus shift scheduling model based on deep reinforcement learning.
Compared with the prior art, the technical scheme has the following beneficial effects:
the departure timetable is scheduled by a deep reinforcement learning method, a mathematical scheduling model is established, and the relevant information is parameterized, so that scheduling for different cities requires only adjusting parameters; this improves bus operation efficiency, reduces bus operation cost, and allows the bus timetable to be adjusted continuously according to the passenger flow of the preceding period.
drawings
Fig. 1 is a structural diagram of the policy neural network provided in an embodiment of the present disclosure.
Fig. 2 is a scheduling result of a bus scheduling model based on deep reinforcement learning according to an embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
In this embodiment, the departure timetable includes, but is not limited to, the bus departure timetable of a bus company; it also includes enterprise shuttle-bus departure timetables, subway departure timetables, and the like, which adopt an operation mode similar to that of buses.
In a first aspect, the embodiments of the present disclosure provide a design method of a bus shift scheduling model based on deep reinforcement learning. Fig. 1 is a structural diagram of the policy neural network provided by the embodiments of the present disclosure; with reference to the figure, the method mainly comprises the following steps:
step 1, generating three matrixes X, Y and Z according to a departure schedule; establishing a Markov decision process;
the scheduling problem is converted into operations on the three matrices X, Y and Z, where the row and column headers of the matrices correspond to the departure times of the departure timetable arranged in chronological order; the three matrices are defined as follows:
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise.
The rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
Preferably, shift j may be executed by the same vehicle after shift i is executed when the departure time of shift j falls within a certain range (for example, 10-40 min) after the arrival time of shift i. For example, if a vehicle departs on the 1st shift at 07:00 and arrives at the destination at 08:00, and the 3 shifts departing at 08:08, 08:16 and 08:24 (the 6th, 7th and 8th shifts, respectively) have departure times between 08:00 and 08:30, then X_{1,6}, X_{1,7} and X_{1,8} are all 1; that is, after a vehicle has executed the i-th shift, there are several possible choices for the next shift it executes (the j-th shift).
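As an illustration of how the rule matrix can be generated from the timetable, the following Python sketch builds X from per-shift departure and arrival times; the function name, the minutes-since-midnight representation and the default 10-40 min window are illustrative assumptions consistent with the example above, not details confirmed by the filing.

    import numpy as np

    def build_rule_matrix(departures, arrivals, min_gap=10, max_gap=40):
        # X[i, j] = 1 if shift j can be executed by the same vehicle after
        # shift i, i.e. shift j departs within [min_gap, max_gap] minutes
        # of shift i's arrival; times are minutes since midnight.
        n = len(departures)
        X = np.zeros((n, n), dtype=np.int8)
        for i in range(n):
            for j in range(n):
                gap = departures[j] - arrivals[i]
                if min_gap <= gap <= max_gap:
                    X[i, j] = 1
        return X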
The scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise.
All elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step (Y_{i,j} = 1 records that having the same vehicle execute shift j after shift i is the selection actually made).
The selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise.
The selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the policy at each step;
the scheduling problem is thus converted as follows: under the constraint of the rule matrix X, and with the selectable position matrix Z as the constraint at each step, a position is generated at each step to modify the scheduling matrix Y, and the shift assignment is finally generated from Y; by the definitions of the three matrices, when Z is all 0, Y constitutes one complete shift assignment. A sketch of this matrix environment is given below;
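A minimal sketch of this matrix environment, assuming only the mechanics stated above: an action (i, j) writes 1 into Y_{i,j}, zeroes row i and column j of Z, and the episode ends on an unselectable position or when Z is exhausted. Class and method names are illustrative, and the per-step reward is omitted because the filing gives the reward only as an equation image.

    import numpy as np

    class SchedulingEnv:
        # State is the N x N x 3 stack of (X, Y, Z), matching the network input.
        def __init__(self, X):
            self.X = X.astype(np.int8)
            self.reset()

        def reset(self):
            self.Y = np.zeros_like(self.X)
            self.Z = self.X.copy()                 # Z is initialized to X
            return self._state()

        def _state(self):
            return np.stack([self.X, self.Y, self.Z], axis=-1)

        def step(self, i, j):
            if self.Z[i, j] == 0:                  # unselectable position: exit
                return self._state(), True
            self.Y[i, j] = 1                       # shift j follows shift i
            self.Z[i, :] = 0                       # i takes no other successor
            self.Z[:, j] = 0                       # j takes no other predecessor
            done = not self.Z.any()                # Z all 0: schedule complete
            return self._state(), done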
Preferably, the method for generating the departure timetable comprises:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday. For example: aggregate each day's passenger flow into half-hour intervals (6:00-6:30, 6:30-7:00, and so on), then average the aggregated flows over 8 consecutive Mondays to obtain the average passenger flow for each half hour, giving m average flows for one day. Regarding the choice of the interval Q: if the interval is too small, the randomness of the flow increases and prediction accuracy decreases; if the interval is too large, for example predicting every 2 h so that there are only 12 predicted values per day, it is difficult and unreasonable to apply those 12 values across the time periods described below.
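The aggregation just described can be sketched as follows (Python with pandas); the column names 'timestamp' and 'boardings' are illustrative assumptions. It averages each Q-minute time-of-day slot over the days of the same type supplied in the input.

    import pandas as pd

    def average_flows(records, q_minutes=30):
        # records: one row per observation with a datetime 'timestamp' column
        # and a 'boardings' count, covering n days of the same date type.
        df = records.copy()
        df["date"] = df["timestamp"].dt.date
        df["slot"] = df["timestamp"].dt.floor(f"{q_minutes}min").dt.time
        per_day = df.groupby(["date", "slot"])["boardings"].sum()
        return per_day.groupby("slot").mean()   # the m average flows of one day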
The m average passenger flows are divided into h time periods according to passenger-flow characteristics, and the departure interval Δt_i of each period is calculated, i ∈ {1, 2, …, h}; for example, the rated capacity of a single bus is 60 passengers and the expected load rate is 0.6 (the interval formulas appear as equation images in the original filing);
the departure timetable is obtained from the departure intervals.
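Since the interval formulas themselves appear only as equation images, the following sketch shows one plausible reading consistent with the worked example (rated capacity 60, expected load rate 0.6): the interval for a period spreads buses so that each carries roughly capacity × load rate passengers out of the period's average flow per Q minutes. This is an assumption for illustration, not the patent's exact formula.

    def departure_intervals(period_flows, capacity=60, load_rate=0.6, q_minutes=30):
        # period_flows: average passengers per Q-minute slot, one per period;
        # returns a departure interval (minutes) for each of the h periods.
        intervals = []
        for flow in period_flows:
            buses_per_slot = flow / (capacity * load_rate)  # buses per Q min
            intervals.append(q_minutes / max(buses_per_slot, 1e-9))
        return intervals

    def make_timetable(period_bounds, intervals):
        # period_bounds: [(start_min, end_min), ...]; emits departure times.
        times = []
        for (start, end), step in zip(period_bounds, intervals):
            t = start
            while t < end:
                times.append(t)
                t += step
        return times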
The Markov decision establishment process comprises the following steps: the Markov decision process consists of (S, A, R, π, G), where S represents the state space, A represents the action space, πθRepresenting the strategy, and theta is a parameter of the strategy; by piθ(a | s) denotes in strategy πθAnd the probability distribution of the action a under the state s, wherein R represents a return reward function, and G represents the return reward accumulated along with time;
defining a Markov decision process according to the task of the shift:
strategy piθThe method specifically comprises the following steps: the strategic neural network, the structure of which is shown in figure 2,
and a state s: (X, Y, Z) is E.S
Action a: (i, j) ∈ A, and the execution process of the action a is as follows: at Yi,jFill in 1 and set the ith row and jth column of Z to 0
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result.
Preferably, the scoring function score (Y) is
Figure BDA0002309738970000074
Wherein alpha and beta are hyper-parameters for controlling the ratio;
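Because the concrete form of score(Y) appears only as an equation image, the following stand-in is purely illustrative: it uses the stated α and β hyper-parameters to trade off how many shifts are chained against how many vehicles the schedule consumes. Any resemblance to the actual formula is an assumption.

    import numpy as np

    def score(Y, alpha=1.0, beta=0.5):
        # Hypothetical scoring: reward chained shift pairs (entries of Y),
        # penalize the number of vehicle chains (all-0 columns = first shifts).
        chained = Y.sum()
        first_shifts = int((Y.sum(axis=0) == 0).sum())
        return alpha * chained - beta * first_shifts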
Step 2, training the shift scheduling policy neural network:
1. Obtain the initial state s_0, which consists of the initial values of the three matrices: the rule matrix, the scheduling matrix and the selectable position matrix;
2. Compute the probability distribution π_θ(a|s_t) over actions in state s_t: the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor of the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
3. Randomly select action a_t according to the probability distribution;
4. Execute action a_t and obtain state s_{t+1};
5. Compute the reward r_t = R(s_t, a_t);
6. If action a_t corresponds to a position with Z_{i,j} = 0, exit; if Z becomes all 0 after executing a_t, exit; otherwise go to 2;
The scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
7. Update the parameters of the policy neural network according to the reinforcement-learning objective function and policy gradient to obtain the bus scheduling model. A sketch of a network with this input/output interface follows.
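The exact architecture is given only in Fig. 1; the sketch below (PyTorch) is a stand-in with the interface stated above: input the N×N×3 tensor of (X, Y, Z), output a distribution over the N² positions. The small convolutional trunk is an assumption.

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=1),   # one logit per (i, j) cell
            )

        def forward(self, state):                  # state: (batch, N, N, 3)
            x = state.permute(0, 3, 1, 2)          # to (batch, 3, N, N)
            logits = self.trunk(x).flatten(1)      # (batch, N*N) position scores
            return torch.distributions.Categorical(logits=logits)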
Preferably, the objective function is:
Figure BDA0002309738970000081
the strategy gradient is
Figure BDA0002309738970000082
The parameter updating mode is
Figure BDA0002309738970000083
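A minimal sketch of one update consistent with the policy-gradient form reconstructed above (plain REINFORCE over a single trajectory); the tensor shapes and the absence of a baseline are illustrative choices, not details confirmed by the filing.

    import torch

    def reinforce_update(policy, optimizer, states, actions, rewards):
        # states: (T, N, N, 3) float tensor; actions: (T,) flat indices i*N + j;
        # rewards: per-step rewards r_t of one trajectory tau.
        G = sum(rewards)                              # total return G(tau)
        log_probs = policy(states).log_prob(actions)  # log pi_theta(a_t | s_t)
        loss = -(log_probs.sum() * G)                 # gradient ascent on J(theta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()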
Preferably, the method further comprises scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t).
The scheduling matrix Y finally obtained gives the scheduling result, shown in Fig. 2; consistent with the definition of Y, a column of all 0s indicates that the shift corresponding to that column is the first shift executed by its vehicle, and a row of all 0s indicates that the shift corresponding to that row is the last shift executed by that vehicle.
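Inference and the row/column reading of Y can be sketched as follows, reusing the illustrative SchedulingEnv and PolicyNet above: actions are taken greedily as a_t = argmax_a π_θ(a|s_t), and vehicle chains are decoded by starting from each all-0 column (a first shift) and following successor links until an all-0 row (a last shift). Names are illustrative.

    import numpy as np
    import torch

    def greedy_schedule(policy, env):
        state, done = env.reset(), False
        while not done:
            with torch.no_grad():
                dist = policy(torch.as_tensor(state, dtype=torch.float32)[None])
            a = int(dist.probs.argmax())          # a_t = argmax pi_theta(a | s_t)
            n = env.X.shape[0]
            state, done = env.step(a // n, a % n)
        return env.Y

    def decode_chains(Y):
        # Each all-0 column starts a vehicle's chain; follow Y to an all-0 row.
        chains = []
        for j in range(Y.shape[0]):
            if Y[:, j].sum() == 0:                # no predecessor: first shift
                chain, cur = [j], j
                while Y[cur].sum() > 0:           # has a successor link
                    cur = int(np.argmax(Y[cur]))
                    chain.append(cur)
                chains.append(chain)
        return chains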
In a second aspect, a design device of a bus scheduling model based on deep reinforcement learning is provided. Based on the same technical concept, the device can execute the flow of the design method of the bus scheduling model based on deep reinforcement learning; the device specifically comprises a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
It should be noted that, when the design apparatus for a bus shift scheduling model based on deep reinforcement learning provided in the foregoing embodiment executes a design method for a bus shift scheduling model based on deep reinforcement learning, the division of the functional modules is merely illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the embodiment of the design device of the bus scheduling model based on the deep reinforcement learning and the embodiment of the design method of the bus scheduling model based on the deep reinforcement learning belong to the same concept, and the specific implementation process is detailed in the method embodiment and is not described herein again.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of the bus shift scheduling model based on deep reinforcement learning.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above: implementations that make insubstantial modifications in accordance with the principles and technical solutions of the invention, or that directly apply the conception and technical solutions of the invention to other occasions without improvement, all fall within the protection scope of the invention.

Claims (9)

1. A design method of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising the following steps:
step 1, generating three matrices from the departure timetable: a rule matrix X, a scheduling matrix Y and a selectable position matrix Z; and establishing a Markov decision process;
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise;
the rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise;
all elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step;
the selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise;
the selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the executed policy;
the Markov decision process is as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A the action space, and π_θ the policy, with θ the parameters of the policy; π_θ(a|s) denotes the probability distribution over actions a in state s under policy π_θ; R denotes the reward function, and G the reward accumulated over time;
the Markov decision process is defined according to the scheduling task:
policy π_θ: specifically, the policy neural network;
state s: (X, Y, Z) ∈ S;
action a: (i, j) ∈ A, executed as follows: fill 1 into Y_{i,j} and set the i-th row and j-th column of Z to 0;
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result;
step 2, training the shift scheduling policy neural network:
obtaining the initial state s_0, which consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
computing the probability distribution π_θ(a|s_t) over actions in state s_t:
the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor of the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t and obtaining state s_{t+1};
computing the reward r_t = R(s_t, a_t);
after executing a_t and obtaining s_{t+1}: if action a_t corresponds to a position with Z_{i,j} = 0, exiting; if Z becomes all 0 after executing a_t, exiting; otherwise returning to the step of computing the probability distribution π_θ(a|s_{t+1}) for state s_{t+1};
the scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
and updating the parameters of the policy neural network according to the reinforcement-learning objective function and policy gradient
to obtain the bus shift scheduling model.
2. The design method of the bus scheduling model based on deep reinforcement learning as claimed in claim 1, wherein shift j may be executed by the same vehicle after shift i is executed when: the departure time of shift j falls within 10-40 min after the arrival time of shift i.
3. The method as claimed in claim 1, wherein the scoring function score(Y) is a weighted form whose exact formula appears as an equation image in the original filing, where α and β are hyper-parameters controlling the weighting ratio.
4. The design method of the bus shift scheduling model based on deep reinforcement learning as claimed in claim 1, wherein, written in the standard policy-gradient form,
the objective function is:
J(θ) = E_{τ∼π_θ}[G(τ)], where G(τ) = r_0 + r_1 + … + r_T;
the policy gradient is:
∇_θ J(θ) = E_{τ∼π_θ}[ Σ_{t=0}^{T} ∇_θ log π_θ(a_t|s_t) · G(τ) ];
and the parameters of the policy neural network are updated as:
θ ← θ + η · ∇_θ J(θ), where η is the learning rate.
5. The design method of the bus scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 4, further comprising step 3: scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t),
and the scheduling matrix Y finally obtained gives the scheduling result.
6. The design method of the bus scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 5, wherein the departure timetable is generated as follows:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger-flow characteristics, and calculating the departure interval Δt_i of each period, i ∈ {1, 2, …, h} (the interval formulas appear as equation images in the original filing);
and obtaining the departure timetable from the departure intervals.
7. A design device of a bus shift scheduling model based on deep reinforcement learning, characterized by comprising a design module and a training module;
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on deep reinforcement learning of any one of claims 1 to 6;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on deep reinforcement learning in any one of claims 1 to 6.
8. The device for designing the bus scheduling model based on the deep reinforcement learning as claimed in claim 7, further comprising a scheduling module, wherein the scheduling module is used for executing the step 3 of the method for designing the bus scheduling model based on the deep reinforcement learning as claimed in any one of claims 5 to 6.
9. A design system of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising a design device of the bus shift scheduling model based on deep reinforcement learning as claimed in any one of claims 7-8.
CN201911253753.2A 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning Active CN113033928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911253753.2A CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911253753.2A CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113033928A (en) 2021-06-25
CN113033928B CN113033928B (en) 2023-10-31

Family

ID=76451359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911253753.2A Active CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113033928B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117883A (en) * 2021-09-15 2022-03-01 兰州理工大学 Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning
CN114781267A (en) * 2022-04-28 2022-07-22 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based dynamic bus management method and system for stop and transfer
CN116704778A (en) * 2023-08-04 2023-09-05 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881992A (en) * 2015-06-12 2015-09-02 天津大学 Urban public transport policy analysis platform based on multi-agent simulation
CN106228314A * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
CN109166303A * 2018-08-30 2019-01-08 北京航天控制仪器研究所 Method and system for bus shift scheduling and dispatching
CN110084505A * 2019-04-22 2019-08-02 南京行者易智能交通科技有限公司 Smart shift scheduling method and device based on passenger flow, mobile terminal device, and server
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881992A (en) * 2015-06-12 2015-09-02 天津大学 Urban public transport policy analysis platform based on multi-agent simulation
CN106228314A * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method
CN109166303A * 2018-08-30 2019-01-08 北京航天控制仪器研究所 Method and system for bus shift scheduling and dispatching
CN110084505A * 2019-04-22 2019-08-02 南京行者易智能交通科技有限公司 Smart shift scheduling method and device based on passenger flow, mobile terminal device, and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王庆荣; 朱昌盛; 梁剑波; 冯文熠: "Research on the application of an intelligent bus scheduling *** based on genetic algorithm" (基于遗传算法的公交智能排班***应用研究), 计算机仿真 (Computer Simulation), no. 03

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117883A (en) * 2021-09-15 2022-03-01 兰州理工大学 Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning
CN114781267A (en) * 2022-04-28 2022-07-22 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based dynamic bus management method and system for stop and transfer
CN114781267B (en) * 2022-04-28 2023-08-29 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based job-living connection dynamic bus management method and system
CN116704778A (en) * 2023-08-04 2023-09-05 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium
CN116704778B (en) * 2023-08-04 2023-10-24 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113033928B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN104504229B (en) A kind of intelligent public transportation dispatching method based on hybrid metaheuristics
CN113033928A (en) Design method, device and system of bus shift scheduling model based on deep reinforcement learning
El-Tantawy et al. Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto
Yang et al. A bi-objective timetable optimization model incorporating energy allocation and passenger assignment in an energy-regenerative metro system
CN102044149B (en) City bus operation coordinating method and device based on time variant passenger flows
Zhong et al. A differential evolution algorithm with dual populations for solving periodic railway timetable scheduling problem
Zhao et al. An integrated approach of train scheduling and rolling stock circulation with skip-stopping pattern for urban rail transit lines
Qin et al. Reinforcement learning for ridesharing: An extended survey
CN111105141B (en) Demand response type bus scheduling method
CN110114806A (en) Signalized control method, relevant device and system
Garrison et al. Travel, work, and telecommunications: a view of the electronics revolution and its potential impacts
CN107919014B (en) Taxi running route optimization method for multiple passenger mileage
Li et al. Deep learning based parking prediction on cloud platform
Chen et al. Real-time bus holding control on a transit corridor based on multi-agent reinforcement learning
CN107392389A (en) Taxi dispatching processing method based on ARIMA models
CN110211379A (en) A kind of public transport method for optimizing scheduling based on machine learning
CN112417753A (en) Urban public transport resource joint scheduling method
CN107464059A (en) A kind of public transport company based on historical information automates control method of arranging an order according to class and grade
CN115222251A (en) Network taxi appointment scheduling method based on hybrid layered reinforcement learning
CN117371611A (en) Subway train operation plan programming method, medium and system
CN112766605A (en) Multi-source passenger flow prediction system and method based on container cloud platform
CN115352502A (en) Train operation scheme adjusting method and device, electronic equipment and storage medium
CN115510664A (en) Instant delivery real-time cooperation scheduling system based on layered reinforcement learning
Hao et al. Timetabling for a congested urban rail transit network based on mixed logic dynamic model
CN114117883A (en) Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant