CN113033928A - Design method, device and system of bus shift scheduling model based on deep reinforcement learning - Google Patents

Design method, device and system of bus shift scheduling model based on deep reinforcement learning

Info

Publication number
CN113033928A
Authority
CN
China
Prior art keywords
scheduling
shift
matrix
reinforcement learning
bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911253753.2A
Other languages
Chinese (zh)
Other versions
CN113033928B (en)
Inventor
王乾宇
周金明
赵丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co ltd filed Critical Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority to CN201911253753.2A
Publication of CN113033928A
Application granted
Publication of CN113033928B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a design method of a bus scheduling model based on deep reinforcement learning, comprising: step 1, converting the scheduling process into a Markov decision process; step 2, solving the Markov decision process; and step 3, scheduling according to the solution. The departure timetable is scheduled by a deep reinforcement learning method: a mathematical scheduling model is established and the relevant information is parameterized, so that scheduling for different cities requires only adjusting parameters. Bus operation efficiency is improved and bus operation cost is reduced.

Description

Design method, device and system of bus shift scheduling model based on deep reinforcement learning
Technical Field
The invention relates to the fields of intelligent transportation and deep learning research, in particular to a method, device and system for designing a bus scheduling model based on deep reinforcement learning, and belongs to the field of intelligent bus shift scheduling and dispatching.
Background
With the continuous improvement of China's motorization level, urban infrastructure construction is developing rapidly, urban areas keep expanding, and urban public transportation networks are becoming more and more comprehensive. However, as the scale of public transportation grows, bus scheduling becomes increasingly difficult, and an intelligent scheduling method plays a crucial role in allocating public transportation resources efficiently and reasonably, helping to use those resources more effectively and to provide higher-quality service. In the process of implementing the invention, the inventors found at least the following problems in the prior art: traditional bus scheduling in China relies mainly on manual scheduling and the experience of dispatchers, so efficiency is low and the rationality of the schedule cannot be guaranteed; existing scheduling methods are also inefficient, and the next schedule is often prepared only after the previous one has run, so they cannot flexibly cope with constantly changing passenger flow.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the present disclosure provide a design method, device and system for a bus scheduling model based on deep reinforcement learning, which greatly improve scheduling efficiency. The technical scheme is as follows:
in a first aspect, a design method of a bus shift scheduling model based on deep reinforcement learning is provided, and the method includes:
step 1, generating three matrices from the departure timetable: a rule matrix X, a scheduling matrix Y and a selectable position matrix Z; and establishing a Markov decision process;
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise.
The rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise.
All elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step.
The selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise.
The selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the executed policy;
the Markov decision process is established as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A the action space, and π_θ the policy, with θ the parameters of the policy; π_θ(a|s) denotes the probability distribution over actions a in state s under policy π_θ; R denotes the reward function, and G the reward accumulated over time;
the Markov decision process is defined according to the scheduling task:
policy π_θ: specifically, the policy neural network;
state s: (X, Y, Z) ∈ S;
action a: (i, j) ∈ A, executed as follows: fill 1 into Y_{i,j} and set the i-th row and j-th column of Z to 0;
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result;
step 2, training the shift scheduling policy neural network:
obtaining the initial state s_0, which consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
computing the probability distribution π_θ(a|s_t) over actions in state s_t:
the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor formed by the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t and obtaining state s_{t+1};
computing the reward r_t = R(s_t, a_t);
after executing a_t and obtaining s_{t+1}: if action a_t corresponds to a position with Z_{i,j} = 0, exiting; if Z becomes all 0 after executing a_t, exiting; otherwise returning to the step of computing the probability distribution π_θ(a|s_{t+1}) for state s_{t+1};
the scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
the parameters of the policy neural network are then updated according to the reinforcement-learning objective function and policy gradient, yielding the bus scheduling model.
Preferably, shift j may be executed by the same vehicle after shift i is executed when: the departure time of shift j falls within 10-40 min after the arrival time of shift i.
Preferably, the scoring function score(Y) is a weighted form whose exact formula appears as an equation image in the original filing, where α and β are hyper-parameters controlling the weighting ratio.
Preferably, written in the standard policy-gradient form, the objective function is:
J(θ) = E_{τ∼π_θ}[G(τ)], where G(τ) = r_0 + r_1 + … + r_T;
the policy gradient is:
∇_θ J(θ) = E_{τ∼π_θ}[ Σ_{t=0}^{T} ∇_θ log π_θ(a_t|s_t) · G(τ) ];
and the parameters of the policy neural network are updated as:
θ ← θ + η · ∇_θ J(θ), where η is the learning rate.
Further, the method also comprises step 3: scheduling with the model trained in step 2, wherein the action selected at each step is a_t = argmax_a π_θ(a|s_t); the scheduling matrix Y finally obtained gives the scheduling result.
Preferably, the method for generating the departure timetable comprises:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger-flow characteristics, and calculating the departure interval Δt_i of each period, i ∈ {1, 2, …, h} (the interval formulas appear as equation images in the original filing);
and obtaining the departure timetable from the departure intervals.
In a second aspect, a design device of a bus shift scheduling model based on deep reinforcement learning is provided, specifically comprising a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
Preferably, the device further comprises a scheduling module, wherein the scheduling module is used for executing the step 3 of the design method of the bus scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of a bus shift scheduling model based on deep reinforcement learning.
Compared with the prior art, the technical scheme has the following beneficial effects:
the departure timetable is scheduled by a deep reinforcement learning method, a mathematical scheduling model is established, and the relevant information is parameterized, so that scheduling for different cities requires only adjusting parameters; this improves bus operation efficiency, reduces bus operation cost, and allows the bus timetable to be adjusted continuously according to the passenger flow of the preceding period.
drawings
Fig. 1 is a structural diagram of the policy neural network provided in an embodiment of the present disclosure.
Fig. 2 is a scheduling result of a bus scheduling model based on deep reinforcement learning according to an embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
In this embodiment, the departure timetable includes, but is not limited to, the bus departure timetable of a bus company; it also includes enterprise shuttle-bus departure timetables, subway departure timetables, and the like, which adopt an operation mode similar to that of buses.
In a first aspect, the embodiments of the present disclosure provide a design method of a bus shift scheduling model based on deep reinforcement learning. Fig. 1 is a structural diagram of the policy neural network provided by the embodiments of the present disclosure; with reference to the figure, the method mainly comprises the following steps:
step 1, generating three matrixes X, Y and Z according to a departure schedule; establishing a Markov decision process;
the scheduling problem is converted into operations on the three matrices X, Y and Z, where the row and column headers of the matrices correspond to the departure times of the departure timetable arranged in chronological order; the three matrices are defined as follows:
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise.
The rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
Preferably, shift j may be executed by the same vehicle after shift i is executed when the departure time of shift j falls within a certain range (for example, 10-40 min) after the arrival time of shift i. For example, if a vehicle departs on the 1st shift at 07:00 and arrives at the destination at 08:00, and the 3 shifts departing at 08:08, 08:16 and 08:24 (the 6th, 7th and 8th shifts, respectively) have departure times between 08:00 and 08:30, then X_{1,6}, X_{1,7} and X_{1,8} are all 1; that is, after a vehicle has executed the i-th shift, there are several possible choices for the next shift it executes (the j-th shift).
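As an illustration of how the rule matrix can be generated from the timetable, the following Python sketch builds X from per-shift departure and arrival times; the function name, the minutes-since-midnight representation and the default 10-40 min window are illustrative assumptions consistent with the example above, not details confirmed by the filing.

    import numpy as np

    def build_rule_matrix(departures, arrivals, min_gap=10, max_gap=40):
        # X[i, j] = 1 if shift j can be executed by the same vehicle after
        # shift i, i.e. shift j departs within [min_gap, max_gap] minutes
        # of shift i's arrival; times are minutes since midnight.
        n = len(departures)
        X = np.zeros((n, n), dtype=np.int8)
        for i in range(n):
            for j in range(n):
                gap = departures[j] - arrivals[i]
                if min_gap <= gap <= max_gap:
                    X[i, j] = 1
        return X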
The scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise.
All elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step (Y_{i,j} = 1 records that having the same vehicle execute shift j after shift i is the selection actually made).
The selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise.
The selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the policy at each step;
the scheduling problem is thus converted as follows: under the constraint of the rule matrix X, and with the selectable position matrix Z as the constraint at each step, a position is generated at each step to modify the scheduling matrix Y, and the shift assignment is finally generated from Y; by the definitions of the three matrices, when Z is all 0, Y constitutes one complete shift assignment. A sketch of this matrix environment is given below;
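A minimal sketch of this matrix environment, assuming only the mechanics stated above: an action (i, j) writes 1 into Y_{i,j}, zeroes row i and column j of Z, and the episode ends on an unselectable position or when Z is exhausted. Class and method names are illustrative, and the per-step reward is omitted because the filing gives the reward only as an equation image.

    import numpy as np

    class SchedulingEnv:
        # State is the N x N x 3 stack of (X, Y, Z), matching the network input.
        def __init__(self, X):
            self.X = X.astype(np.int8)
            self.reset()

        def reset(self):
            self.Y = np.zeros_like(self.X)
            self.Z = self.X.copy()                 # Z is initialized to X
            return self._state()

        def _state(self):
            return np.stack([self.X, self.Y, self.Z], axis=-1)

        def step(self, i, j):
            if self.Z[i, j] == 0:                  # unselectable position: exit
                return self._state(), True
            self.Y[i, j] = 1                       # shift j follows shift i
            self.Z[i, :] = 0                       # i takes no other successor
            self.Z[:, j] = 0                       # j takes no other predecessor
            done = not self.Z.any()                # Z all 0: schedule complete
            return self._state(), done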
Preferably, the method for generating the departure timetable comprises:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday. For example: aggregate each day's passenger flow into half-hour intervals (6:00-6:30, 6:30-7:00, and so on), then average the aggregated flows over 8 consecutive Mondays to obtain the average passenger flow for each half hour, giving m average flows for one day. Regarding the choice of the interval Q: if the interval is too small, the randomness of the flow increases and prediction accuracy decreases; if the interval is too large, for example predicting every 2 h so that there are only 12 predicted values per day, it is difficult and unreasonable to apply those 12 values across the time periods described below.
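The aggregation just described can be sketched as follows (Python with pandas); the column names 'timestamp' and 'boardings' are illustrative assumptions. It averages each Q-minute time-of-day slot over the days of the same type supplied in the input.

    import pandas as pd

    def average_flows(records, q_minutes=30):
        # records: one row per observation with a datetime 'timestamp' column
        # and a 'boardings' count, covering n days of the same date type.
        df = records.copy()
        df["date"] = df["timestamp"].dt.date
        df["slot"] = df["timestamp"].dt.floor(f"{q_minutes}min").dt.time
        per_day = df.groupby(["date", "slot"])["boardings"].sum()
        return per_day.groupby("slot").mean()   # the m average flows of one day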
The m average passenger flows are divided into h time periods according to passenger-flow characteristics, and the departure interval Δt_i of each period is calculated, i ∈ {1, 2, …, h}; for example, the rated capacity of a single bus is 60 passengers and the expected load rate is 0.6 (the interval formulas appear as equation images in the original filing);
the departure timetable is obtained from the departure intervals.
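Since the interval formulas themselves appear only as equation images, the following sketch shows one plausible reading consistent with the worked example (rated capacity 60, expected load rate 0.6): the interval for a period spreads buses so that each carries roughly capacity × load rate passengers out of the period's average flow per Q minutes. This is an assumption for illustration, not the patent's exact formula.

    def departure_intervals(period_flows, capacity=60, load_rate=0.6, q_minutes=30):
        # period_flows: average passengers per Q-minute slot, one per period;
        # returns a departure interval (minutes) for each of the h periods.
        intervals = []
        for flow in period_flows:
            buses_per_slot = flow / (capacity * load_rate)  # buses per Q min
            intervals.append(q_minutes / max(buses_per_slot, 1e-9))
        return intervals

    def make_timetable(period_bounds, intervals):
        # period_bounds: [(start_min, end_min), ...]; emits departure times.
        times = []
        for (start, end), step in zip(period_bounds, intervals):
            t = start
            while t < end:
                times.append(t)
                t += step
        return times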
The Markov decision establishment process comprises the following steps: the Markov decision process consists of (S, A, R, π, G), where S represents the state space, A represents the action space, πθRepresenting the strategy, and theta is a parameter of the strategy; by piθ(a | s) denotes in strategy πθAnd the probability distribution of the action a under the state s, wherein R represents a return reward function, and G represents the return reward accumulated along with time;
defining a Markov decision process according to the task of the shift:
strategy piθThe method specifically comprises the following steps: the strategic neural network, the structure of which is shown in figure 2,
and a state s: (X, Y, Z) is E.S
Action a: (i, j) ∈ A, and the execution process of the action a is as follows: at Yi,jFill in 1 and set the ith row and jth column of Z to 0
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result.
Preferably, the scoring function score (Y) is
Figure BDA0002309738970000074
Wherein alpha and beta are hyper-parameters for controlling the ratio;
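Because the concrete form of score(Y) appears only as an equation image, the following stand-in is purely illustrative: it uses the stated α and β hyper-parameters to trade off how many shifts are chained against how many vehicles the schedule consumes. Any resemblance to the actual formula is an assumption.

    import numpy as np

    def score(Y, alpha=1.0, beta=0.5):
        # Hypothetical scoring: reward chained shift pairs (entries of Y),
        # penalize the number of vehicle chains (all-0 columns = first shifts).
        chained = Y.sum()
        first_shifts = int((Y.sum(axis=0) == 0).sum())
        return alpha * chained - beta * first_shifts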
Step 2, training the shift scheduling policy neural network:
1. Obtain the initial state s_0, which consists of the initial values of the three matrices: the rule matrix, the scheduling matrix and the selectable position matrix;
2. Compute the probability distribution π_θ(a|s_t) over actions in state s_t: the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor of the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
3. Randomly select action a_t according to the probability distribution;
4. Execute action a_t and obtain state s_{t+1};
5. Compute the reward r_t = R(s_t, a_t);
6. If action a_t corresponds to a position with Z_{i,j} = 0, exit; if Z becomes all 0 after executing a_t, exit; otherwise go to 2;
The scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
7. Update the parameters of the policy neural network according to the reinforcement-learning objective function and policy gradient to obtain the bus scheduling model. A sketch of a network with this input/output interface follows.
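The exact architecture is given only in Fig. 1; the sketch below (PyTorch) is a stand-in with the interface stated above: input the N×N×3 tensor of (X, Y, Z), output a distribution over the N² positions. The small convolutional trunk is an assumption.

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=1),   # one logit per (i, j) cell
            )

        def forward(self, state):                  # state: (batch, N, N, 3)
            x = state.permute(0, 3, 1, 2)          # to (batch, 3, N, N)
            logits = self.trunk(x).flatten(1)      # (batch, N*N) position scores
            return torch.distributions.Categorical(logits=logits)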
Preferably, the objective function is:
Figure BDA0002309738970000081
the strategy gradient is
Figure BDA0002309738970000082
The parameter updating mode is
Figure BDA0002309738970000083
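A minimal sketch of one update consistent with the policy-gradient form reconstructed above (plain REINFORCE over a single trajectory); the tensor shapes and the absence of a baseline are illustrative choices, not details confirmed by the filing.

    import torch

    def reinforce_update(policy, optimizer, states, actions, rewards):
        # states: (T, N, N, 3) float tensor; actions: (T,) flat indices i*N + j;
        # rewards: per-step rewards r_t of one trajectory tau.
        G = sum(rewards)                              # total return G(tau)
        log_probs = policy(states).log_prob(actions)  # log pi_theta(a_t | s_t)
        loss = -(log_probs.sum() * G)                 # gradient ascent on J(theta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()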
Preferably, the method further comprises scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t).
The scheduling matrix Y finally obtained gives the scheduling result, shown in Fig. 2; consistent with the definition of Y, a column of all 0s indicates that the shift corresponding to that column is the first shift executed by its vehicle, and a row of all 0s indicates that the shift corresponding to that row is the last shift executed by that vehicle.
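Inference and the row/column reading of Y can be sketched as follows, reusing the illustrative SchedulingEnv and PolicyNet above: actions are taken greedily as a_t = argmax_a π_θ(a|s_t), and vehicle chains are decoded by starting from each all-0 column (a first shift) and following successor links until an all-0 row (a last shift). Names are illustrative.

    import numpy as np
    import torch

    def greedy_schedule(policy, env):
        state, done = env.reset(), False
        while not done:
            with torch.no_grad():
                dist = policy(torch.as_tensor(state, dtype=torch.float32)[None])
            a = int(dist.probs.argmax())          # a_t = argmax pi_theta(a | s_t)
            n = env.X.shape[0]
            state, done = env.step(a // n, a % n)
        return env.Y

    def decode_chains(Y):
        # Each all-0 column starts a vehicle's chain; follow Y to an all-0 row.
        chains = []
        for j in range(Y.shape[0]):
            if Y[:, j].sum() == 0:                # no predecessor: first shift
                chain, cur = [j], j
                while Y[cur].sum() > 0:           # has a successor link
                    cur = int(np.argmax(Y[cur]))
                    chain.append(cur)
                chains.append(chain)
        return chains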
In a second aspect, a design device of a bus scheduling model based on deep reinforcement learning is provided. Based on the same technical concept, the device can execute the flow of the design method of the bus scheduling model based on deep reinforcement learning; the device specifically comprises a design module and a training module.
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on the deep reinforcement learning in any one of all possible implementation methods.
It should be noted that, when the design apparatus for a bus shift scheduling model based on deep reinforcement learning provided in the foregoing embodiment executes a design method for a bus shift scheduling model based on deep reinforcement learning, the division of the functional modules is merely illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the embodiment of the design device of the bus scheduling model based on the deep reinforcement learning and the embodiment of the design method of the bus scheduling model based on the deep reinforcement learning belong to the same concept, and the specific implementation process is detailed in the method embodiment and is not described herein again.
In a third aspect, the embodiment of the present disclosure provides a design system of a bus shift scheduling model based on deep reinforcement learning, and the system includes any one of the above design devices of the bus shift scheduling model based on deep reinforcement learning.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above: implementations that make insubstantial modifications in accordance with the principles and technical solutions of the invention, or that directly apply the conception and technical solutions of the invention to other occasions without improvement, all fall within the protection scope of the invention.

Claims (9)

1. A design method of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising the following steps:
step 1, generating three matrices from the departure timetable: a rule matrix X, a scheduling matrix Y and a selectable position matrix Z; and establishing a Markov decision process;
the rule matrix X ∈ {0,1}^(N×N), where element X_{i,j} of the rule matrix has the following meaning:
X_{i,j} = 1 if shift j can be executed by the same vehicle after shift i is executed, and X_{i,j} = 0 otherwise;
the rule matrix can be generated from the timetable; shifts i and j denote the shifts numbered i and j; the departure timetable contains N shifts in total, and each shift in the timetable is numbered in chronological order: 1, 2, …, N;
the scheduling matrix Y ∈ {0,1}^(N×N), where element Y_{i,j} has the following meaning:
Y_{i,j} = 1 if shift j is selected to be executed by the same vehicle immediately after shift i, and Y_{i,j} = 0 otherwise;
all elements of the scheduling matrix are initialized to 0, and element values are changed according to the policy at each step;
the selectable position matrix Z ∈ {0,1}^(N×N), where element Z_{i,j} has the following meaning:
Z_{i,j} = 1 if position (i, j) can still be selected, and Z_{i,j} = 0 otherwise;
the selectable position matrix is initialized as Z = X, and its values are changed subsequently according to the executed policy;
the Markov decision process is as follows: the Markov decision process consists of (S, A, R, π, G), where S denotes the state space, A the action space, and π_θ the policy, with θ the parameters of the policy; π_θ(a|s) denotes the probability distribution over actions a in state s under policy π_θ; R denotes the reward function, and G the reward accumulated over time;
the Markov decision process is defined according to the scheduling task:
policy π_θ: specifically, the policy neural network;
state s: (X, Y, Z) ∈ S;
action a: (i, j) ∈ A, executed as follows: fill 1 into Y_{i,j} and set the i-th row and j-th column of Z to 0;
reward R(s, a): defined piecewise in terms of score(Y) (the exact piecewise form appears as an equation image in the original filing), where score(Y) is a scoring function, score(Y) ∈ ℝ, with ℝ denoting the field of real numbers; the scoring function is used to evaluate the scheduling result;
step 2, training the shift scheduling policy neural network:
obtaining the initial state s_0, which consists of the initial values of the rule matrix, the scheduling matrix and the selectable position matrix;
computing the probability distribution π_θ(a|s_t) over actions in state s_t:
the input to the policy neural network is the state s_t, i.e. the N×N×3 tensor of the three matrices; the output of the network is an N²-dimensional vector representing the selected position in the scheduling matrix, where t denotes the t-th operation performed;
randomly selecting action a_t according to the probability distribution;
executing action a_t and obtaining state s_{t+1};
computing the reward r_t = R(s_t, a_t);
after executing a_t and obtaining s_{t+1}: if action a_t corresponds to a position with Z_{i,j} = 0, exiting; if Z becomes all 0 after executing a_t, exiting; otherwise returning to the step of computing the probability distribution π_θ(a|s_{t+1}) for state s_{t+1};
the scheduling trajectory τ is thus obtained:
τ = s_0, a_0, r_0, s_1, a_1, r_1, …, s_T, a_T, r_T;
and updating the parameters of the policy neural network according to the reinforcement-learning objective function and policy gradient
to obtain the bus shift scheduling model.
2. The design method of the bus scheduling model based on deep reinforcement learning as claimed in claim 1, wherein shift j may be executed by the same vehicle after shift i is executed when: the departure time of shift j falls within 10-40 min after the arrival time of shift i.
3. The method as claimed in claim 1, wherein the scoring function score(Y) is a weighted form whose exact formula appears as an equation image in the original filing, where α and β are hyper-parameters controlling the weighting ratio.
4. The design method of the bus shift scheduling model based on deep reinforcement learning as claimed in claim 1, wherein, written in the standard policy-gradient form,
the objective function is:
J(θ) = E_{τ∼π_θ}[G(τ)], where G(τ) = r_0 + r_1 + … + r_T;
the policy gradient is:
∇_θ J(θ) = E_{τ∼π_θ}[ Σ_{t=0}^{T} ∇_θ log π_θ(a_t|s_t) · G(τ) ];
and the parameters of the policy neural network are updated as:
θ ← θ + η · ∇_θ J(θ), where η is the learning rate.
5. The design method of the bus scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 4, further comprising step 3: scheduling with the model trained in step 2, wherein the action selected at each step is
a_t = argmax_a π_θ(a|s_t),
and the scheduling matrix Y finally obtained gives the scheduling result.
6. The design method of the bus scheduling model based on deep reinforcement learning as claimed in any one of claims 1 to 5, wherein the departure timetable is generated as follows:
acquiring historical bus passenger flow data, including the number of boarding passengers and their boarding times and the number of alighting passengers and their alighting times at each bus stop;
acquiring n days of historical passenger flow data from previous consecutive dates of the same type, and aggregating each day's data into Q-minute intervals to obtain the average passenger flow for each Q-minute interval, where dates of the same type means the same weekday or the same holiday;
dividing the m average passenger flows into h time periods according to passenger-flow characteristics, and calculating the departure interval Δt_i of each period, i ∈ {1, 2, …, h} (the interval formulas appear as equation images in the original filing);
and obtaining the departure timetable from the departure intervals.
7. A design device of a bus shift scheduling model based on deep reinforcement learning, characterized by comprising a design module and a training module;
The design module is used for executing the step 1 of the design method of the bus shift scheduling model based on deep reinforcement learning of any one of claims 1 to 6;
the training module is used for executing the step 2 of the design method of the bus shift scheduling model based on deep reinforcement learning in any one of claims 1 to 6.
8. The device for designing the bus scheduling model based on the deep reinforcement learning as claimed in claim 7, further comprising a scheduling module, wherein the scheduling module is used for executing the step 3 of the method for designing the bus scheduling model based on the deep reinforcement learning as claimed in any one of claims 5 to 6.
9. A design system of a bus shift scheduling model based on deep reinforcement learning is characterized by comprising a design device of the bus shift scheduling model based on deep reinforcement learning as claimed in any one of claims 7-8.
CN201911253753.2A 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning Active CN113033928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911253753.2A CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911253753.2A CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113033928A (en) 2021-06-25
CN113033928B CN113033928B (en) 2023-10-31

Family

ID=76451359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911253753.2A Active CN113033928B (en) 2019-12-09 2019-12-09 Method, device and system for designing bus shift model based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113033928B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117883A (en) * 2021-09-15 2022-03-01 兰州理工大学 Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning
CN114781267A (en) * 2022-04-28 2022-07-22 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based dynamic bus management method and system for stop and transfer
CN116704778A (en) * 2023-08-04 2023-09-05 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881992A (en) * 2015-06-12 2015-09-02 天津大学 Urban public transport policy analysis platform based on multi-agent simulation
CN106228314A * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
CN109166303A * 2018-08-30 2019-01-08 北京航天控制仪器研究所 Method and system for bus shift scheduling and dispatching
CN110084505A * 2019-04-22 2019-08-02 南京行者易智能交通科技有限公司 Smart shift scheduling method and device based on passenger flow, mobile terminal device, and server
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881992A (en) * 2015-06-12 2015-09-02 天津大学 Urban public transport policy analysis platform based on multi-agent simulation
CN106228314A * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method
CN109166303A * 2018-08-30 2019-01-08 北京航天控制仪器研究所 Method and system for bus shift scheduling and dispatching
CN110084505A * 2019-04-22 2019-08-02 南京行者易智能交通科技有限公司 Smart shift scheduling method and device based on passenger flow, mobile terminal device, and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王庆荣; 朱昌盛; 梁剑波; 冯文熠: "Research on the application of an intelligent bus scheduling *** based on genetic algorithm" (基于遗传算法的公交智能排班***应用研究), 计算机仿真 (Computer Simulation), no. 03

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117883A (en) * 2021-09-15 2022-03-01 兰州理工大学 Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning
CN114781267A (en) * 2022-04-28 2022-07-22 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based dynamic bus management method and system for stop and transfer
CN114781267B (en) * 2022-04-28 2023-08-29 ***通信集团浙江有限公司杭州分公司 Multi-source big data-based job-living connection dynamic bus management method and system
CN116704778A (en) * 2023-08-04 2023-09-05 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium
CN116704778B (en) * 2023-08-04 2023-10-24 创意(成都)数字科技有限公司 Intelligent traffic data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113033928B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN104504229B (en) A kind of intelligent public transportation dispatching method based on hybrid metaheuristics
CN113033928A (en) Design method, device and system of bus shift scheduling model based on deep reinforcement learning
El-Tantawy et al. Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto
Yang et al. A bi-objective timetable optimization model incorporating energy allocation and passenger assignment in an energy-regenerative metro system
CN102044149B (en) City bus operation coordinating method and device based on time variant passenger flows
Zhong et al. A differential evolution algorithm with dual populations for solving periodic railway timetable scheduling problem
Zhao et al. An integrated approach of train scheduling and rolling stock circulation with skip-stopping pattern for urban rail transit lines
Qin et al. Reinforcement learning for ridesharing: An extended survey
CN111105141B (en) Demand response type bus scheduling method
CN110114806A (en) Signalized control method, relevant device and system
Garrison et al. Travel, work, and telecommunications: a view of the electronics revolution and its potential impacts
CN107919014B (en) Taxi running route optimization method for multiple passenger mileage
Li et al. Deep learning based parking prediction on cloud platform
Chen et al. Real-time bus holding control on a transit corridor based on multi-agent reinforcement learning
CN107392389A (en) Taxi dispatching processing method based on ARIMA models
CN110211379A (en) A kind of public transport method for optimizing scheduling based on machine learning
CN112417753A (en) Urban public transport resource joint scheduling method
CN107464059A (en) A kind of public transport company based on historical information automates control method of arranging an order according to class and grade
CN115222251A (en) Network taxi appointment scheduling method based on hybrid layered reinforcement learning
CN117371611A (en) Subway train operation plan programming method, medium and system
CN112766605A (en) Multi-source passenger flow prediction system and method based on container cloud platform
CN115352502A (en) Train operation scheme adjusting method and device, electronic equipment and storage medium
CN115510664A (en) Instant delivery real-time cooperation scheduling system based on layered reinforcement learning
Hao et al. Timetabling for a congested urban rail transit network based on mixed logic dynamic model
CN114117883A (en) Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant