CN110348969A

CN110348969A - Taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis

Info

Publication number: CN110348969A
Application number: CN201910641328.4A
Authority: CN
Inventors: 王桐; 孙博; 张乐君; 李升波
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2019-07-16
Filing date: 2019-07-16
Publication date: 2019-10-18
Anticipated expiration: 2039-07-16
Also published as: CN110348969B

Abstract

The invention discloses a taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis. The method comprises the following steps. Step 1: clean the taxi historical trajectory data; Step 2: extract taxi pickup points; Step 3: extract taxi hotspots; Step 4: predict the passenger volume at each hotspot; Step 5: build the taxi recommendation model. The invention addresses the difficulty taxis have in finding passengers, which arises from the mismatch between taxi and passenger information in cities. It uses deep learning on traffic big data to predict future passenger numbers, and uses a time-varying Markov decision process, solved by policy iteration, to give taxi drivers a passenger-seeking strategy. This alleviates the matching difficulty of existing matching mechanisms, raises the working efficiency of taxis, puts taxi income on a more scientific basis, and makes it easier for passengers to hail a taxi.

Description

Taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis

Technical field

The invention belongs to the field of taxi passenger seeking and dispatch, and proposes a taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis.

Background art

As the national economy continues to develop, motor vehicles account for a growing share of urban traffic, while per-capita road area remains persistently low, which puts heavy pressure on urban transport. In addition, existing urban road networks in China generally have low density, excessive spacing between arterial roads, short branch roads, and muddled functions. They form a low-speed traffic system that is poorly suited to modern motor traffic, and the facilities for traffic control and traffic safety management do not meet actual demand. Large and medium-sized cities across the country commonly suffer from crowded roads, vehicle congestion, and disorderly traffic. Rapid urban population growth has required the taxi industry to develop quickly, and this has caused a series of problems, among them the difficulty of matching taxis with passengers: many taxi drivers spend a great deal of time looking for passengers, while passengers in some places wait a long time. Many ride-hailing apps have appeared in response to this mismatch, and they partly alleviate the problem. However, after a passenger gets out, the taxi driver has no definite direction in which to drive. Experienced drivers know the popular places for different periods of the day; inexperienced drivers can only drive blindly and find passengers by chance. The non-obvious passenger-seeking patterns known to experienced drivers are all hidden in the collected taxi historical data. The invention mines and analyzes this historical data to uncover these non-obvious passenger-seeking trends, so as to provide taxi drivers with more reasonable pickup routes.

Summary of the invention

The purpose of the invention is to propose a taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis. It approaches the problem from the angle of recommending the best passenger-finding strategy to empty taxis, combines real-time taxi information with historical trajectory data to propose a recommendation strategy over pickup hotspot areas, and gives taxi drivers the best passenger-seeking plan, so as to solve the difficult matching and low efficiency of existing matching mechanisms.

The invention is realized by the following technical solution: a taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis, characterized in that the recommendation method comprises the following steps:

Step 1: clean the taxi historical trajectory data;

Step 2: extract taxi pickup points from the cleaned taxi historical trajectory data;

Step 3: extract taxi hotspots from the extracted taxi pickup points;

Step 4: predict the passenger volume at the taxi hotspots;

Step 5: build the taxi recommendation model.

Further, in step 1, cleaning the historical trajectory data specifically includes: removing invalid data, removing unused fields, removing data from outside the city, and removing duplicate data.

Further, step 2 includes:

Step 2.1: upload the taxi historical trajectory data to the Hadoop cluster;

Step 2.2: load the cleaned historical trajectory data and use Spark's map function to map the data into an RDD object whose key is the taxi number and whose value is the remaining fields;

Step 2.3: use the groupByKey method of the Spark platform to gather the elements with the same key into one collection;

Step 2.4: use the sort method to order the values of each element in the RDD by the time field;

Step 2.5: select the records in which the taxi's passenger-carrying status changes from 0 to 10000000.
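As an illustration of Steps 2.2 to 2.5, the following is a minimal single-machine sketch in plain Python rather than Spark, so that it runs without a cluster. The record layout `(taxi_id, timestamp, lon, lat, status)` is a hypothetical schema, and the status codes are passed in as the parameters `vacant` and `occupied`; the defaults 0 and 1 are assumptions for the example and should be replaced by the codes actually used in the data.

```python
from collections import defaultdict

def extract_pickups(records, vacant=0, occupied=1):
    """Return (taxi_id, timestamp, lon, lat) for every vacant -> occupied transition.

    records: iterable of (taxi_id, timestamp, lon, lat, status) tuples (assumed layout).
    """
    # Steps 2.2-2.3: key by taxi number and gather each taxi's records together
    # (the role played by map + groupByKey on an RDD).
    by_taxi = defaultdict(list)
    for taxi_id, ts, lon, lat, status in records:
        by_taxi[taxi_id].append((ts, lon, lat, status))

    pickups = []
    for taxi_id, rows in by_taxi.items():
        # Step 2.4: order each taxi's records by time.
        rows.sort(key=lambda r: r[0])
        # Step 2.5: keep the first occupied record that follows a vacant one.
        for prev, cur in zip(rows, rows[1:]):
            if prev[3] == vacant and cur[3] == occupied:
                pickups.append((taxi_id, cur[0], cur[1], cur[2]))
    return pickups

records = [
    ("A", 3, 116.40, 39.90, 1),
    ("A", 1, 116.38, 39.91, 0),
    ("A", 2, 116.39, 39.91, 0),
    ("B", 1, 116.30, 39.95, 1),
    ("B", 2, 116.31, 39.95, 0),
]
pickups = extract_pickups(records)
```

In this toy input only taxi A goes from vacant to occupied, so a single pickup point is returned. Taxi B goes from occupied to vacant, which is a drop-off and is ignored.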

Further, in step 3, hotspots are found with the density-based clustering algorithm DBSCAN. DBSCAN has two important parameters, the minimum number of points minPts and the scan radius eps, and each is set separately.

Further, in step 4, specifically, a recurrent neural network (RNN) is obtained by adding the hidden-layer neurons' own previous output to their input. The outputs of the RNN hidden layer s and output layer o are:

s_t = f(U x_t + W s_{t-1}), (1)

o_t = g(V s_t), (2)

where t is the time step, x is the input layer, s is the hidden layer, and o is the output layer; the matrix W is the weight applied to the hidden layer's previous value when it is fed back as input, and U and V are weight matrices.

The RNN is trained with the error backpropagation algorithm. The error at layer i and time t propagates in two directions:

One direction is toward the previous layer of the network; this part depends only on the weight matrix U;

The other direction is backward along the time axis to the initial time step; this part depends on the weight matrix W.

Further, a Dropout layer is added on top of the 3-layer RNN.

Further, in step 5, specifically, the recommendation model is built with an improved Markov decision process (IMDP).

The IMDP is expressed as a five-tuple (S, A, P_sa, γ, R_sa), where S is the state set; A = {wait, move to the next hotspot} is the set of actions a driver can take at a pickup point; P_sa is the transition probability matrix, obtained by data mining, which gives the probability of moving to the next state when action a ∈ A is taken in state s_i ∈ S; γ ∈ (0,1) is the discount factor; and R_sa is the return obtained after taking action a ∈ A in state s_i ∈ S. The cost after making a choice at the current hotspot is C_ij = E_ij, where E_ij is the vehicle's idle time from state s_i to s_j. The corresponding return is defined in terms of X_j, the predicted number of boardings for state s_j.

The taxi starts from a state s₀ in the state set S and selects an action a₀ from the action set A to reach the next state s₁; in that state it likewise selects an action a₁ to reach state s₂. This process is described as

Given the above process, the return function is defined as

which is abbreviated as R(s₀)+γR(s₁)+γ²R(s₂)+…

The goal is to choose a policy that maximizes the expected value of the total return, E[R(s₀)+γR(s₁)+γ²R(s₂)+…]. Here a policy π is a function mapping states to actions, i.e. π: S → A, so in state s the selected action is a = π(s).

Under policy π, the expected cumulative return starting from state s is defined as the state-value function:

V^π(s) = E[R(s₀)+γR(s₁)+γ²R(s₂)+… | s₀=s, π] (3)

This can be rewritten as the Bellman equation:

In formula (4), R(s) is the immediate return, i.e. the return obtained at once by selecting action a in state s; the second term on the right-hand side is a summation that represents the future return under this choice.

The optimal state-value function is defined as:

Its Bellman-equation form is:

Likewise, the optimal policy function is defined as:

From formulas (4) to (7):
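The formula images for the state sequence, the return function, and formulas (4) onward are not present in this text. As a hedged reconstruction, the block below gives the standard textbook forms that match the surrounding descriptions (an immediate return R(s) plus a discounted sum over next states). The numbering (4) to (7) is inferred from the order in which the text introduces each formula, and the patent's own notation may differ. The return formula involving X_j is not reconstructed because the text does not describe its form.

```latex
% State sequence and discounted return
s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} s_2 \xrightarrow{a_2} \cdots
R(s_0, a_0) + \gamma R(s_1, a_1) + \gamma^2 R(s_2, a_2) + \cdots

% (4) Bellman equation for a fixed policy \pi
V^{\pi}(s) = R(s) + \gamma \sum_{s' \in S} P_{s\pi(s)}(s')\, V^{\pi}(s')

% (5) Optimal state-value function
V^{*}(s) = \max_{\pi} V^{\pi}(s)

% (6) Bellman form of the optimal state-value function
V^{*}(s) = R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s')\, V^{*}(s')

% (7) Optimal policy
\pi^{*}(s) = \arg\max_{a \in A} \sum_{s' \in S} P_{sa}(s')\, V^{*}(s')

% Consequence of (4)-(7): the optimal policy attains the optimal value in every state
V^{*}(s) = V^{\pi^{*}}(s) \ge V^{\pi}(s)
```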

Next, the IMDP is solved by policy iteration. The specific steps are:

Step 1: initialize V(s) and π(s) for all states, with π initialized to a random policy;

Step 2: evaluate the current policy using the current V(s): compute V(s) for every state until V(s) converges, which yields the state-value function V(s) for this policy;

Step 3: improve the policy using the value function V(s) obtained in the previous step: in each state s, compute for every possible action a the expected value of the next state reached after taking that action, and update π(s) to the action that maximizes this expected value. Then repeat Step 2 and Step 3 until both V(s) and π(s) converge, which yields the taxi recommendation model.

The beneficial effects of the invention are as follows. The invention addresses the difficulty taxis have in finding passengers, which arises from the mismatch between taxi and passenger information in cities. It uses deep learning on traffic big data to predict future passenger numbers, and uses a time-varying Markov decision process, solved by policy iteration, to give taxi drivers a passenger-seeking strategy. This alleviates the matching difficulty of existing matching mechanisms, raises the working efficiency of taxis, puts taxi income on a more scientific basis, and makes it easier for passengers to hail a taxi.

Brief description of the drawings

Fig. 1 is a flow chart of the taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to the invention;

Fig. 2 is a heat map of pickup hotspots visualized on Baidu Maps;

Fig. 3 is a schematic diagram of the recurrent neural network (RNN).

Specific embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Referring to Fig. 1, the invention is realized by the following technical solution: a taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis, the recommendation method comprising the following steps:

Step 1: clean the taxi historical trajectory data;

Step 2: extract taxi pickup points from the cleaned taxi historical trajectory data;

Step 3: extract taxi hotspots from the extracted taxi pickup points;

Step 4: predict the passenger volume at the taxi hotspots;

Step 5: build the taxi recommendation model.

Specifically, the invention uses the big data processing platforms Hadoop and Spark to store and mine the taxi historical trajectories, trains on the training set with the RNN from the Python deep learning library Keras on the PyCharm platform, and verifies the accuracy of the algorithm on the test set. During prediction, the root-mean-square error (RMSE) and mean relative error (MRE) are used to evaluate the prediction results and to compare them with the results of traditional algorithms such as SVM and BPNN.
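For reference, the two evaluation metrics can be sketched as follows. The text does not give the formulas, so the definitions here are the commonly used ones and are an assumption: RMSE as the square root of the mean squared error, and MRE as the mean of |actual - predicted| / |actual|, which requires non-zero actual values.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mre(y_true, y_pred):
    """Mean relative error: average of |t - p| / |t| (assumes no true value is 0)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up boarding counts for three time slots.
actual = [10, 20, 40]
predicted = [12, 18, 40]
error_rmse = rmse(actual, predicted)
error_mre = mre(actual, predicted)
```

RMSE penalizes large absolute misses, while MRE weighs each miss against the size of the true count, so the two can rank models differently for sparse night-time slots.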

In the recommendation stage, the NumPy and Pandas libraries of Python are first used to further mine and organize the Beijing taxi data, which yields the state transition probability matrix and the return matrix for each action. These are substituted into the IMDP model and simulated to obtain the recommendation strategy, and the passenger-seeking time obtained under the recommended strategy is compared with the passenger-seeking time in the original data.
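The text does not say how the transition probabilities are mined. One simple possibility, shown here purely as an assumption, is to estimate P_sa by relative frequency from observed (state, action, next state) triples. The sketch uses plain Python counters instead of NumPy or Pandas, and the state and action names are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(observations):
    """Estimate P_sa(s') from observed (state, action, next_state) triples
    by relative frequency."""
    counts = defaultdict(Counter)
    for s, a, s_next in observations:
        counts[(s, a)][s_next] += 1
    return {
        sa: {s_next: n / sum(nexts.values()) for s_next, n in nexts.items()}
        for sa, nexts in counts.items()
    }

# Hypothetical observations: hotspots h1 and h2, actions "wait" and "move".
observed = [
    ("h1", "wait", "h1"),
    ("h1", "wait", "h1"),
    ("h1", "wait", "h2"),
    ("h1", "move", "h2"),
]
P = estimate_transitions(observed)
```

Each row of the resulting table sums to 1, which is the property the policy-iteration step relies on.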

In this preferred embodiment, in step 1, cleaning the historical trajectory data specifically includes: removing invalid data, removing unused fields, removing data from outside the city, and removing duplicate data.

Partial results of the data cleaning are shown in Table 1:

Table 1

Specifically, data cleaning is the starting point for big data mining, for final model training, and for any other further application. Data before cleaning is called raw data, and it generally contains much duplicated, unimportant, or even wrong information, so the raw data must be further screened and processed. Data cleaning does not only screen out information that is unneeded and plays no part in the analysis; it also covers correcting errors in the data and restoring missing values. The data in a data warehouse is a collection of data on a particular subject, extracted from multiple business systems and covering a range of historical information, so some of it will be incorrect and some records will conflict with one another. Such incorrect or conflicting data, called "dirty data", is seriously detrimental to subsequent research. Data cleaning washes out the dirty data according to defined rules. Its main task is to filter out non-conforming data, which generally falls into three classes: incomplete data, erroneous data, and duplicate data.

The data cleaning of the invention consists mainly of the following four parts:

(1) Removing invalid data: the taxi historical trajectories contain some invalid data, which has no value for the invention's analysis;

(2) Removing unused fields: the taxi positioning device records a large amount of information, such as the taxi mode bit and message ID fields, that the invention does not use, so these fields are removed;

(3) Removing out-of-city data: a taxi may travel outside the city while in service, and the GPS positioning device still records this data. To simplify the subsequent analysis, this data is removed;

(4) Removing duplicate data.
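The four cleaning rules above can be sketched as a single pass over the records. This is an illustrative sketch only: the field names, the definition of "invalid" as a missing required field, and the rectangular city bounding box are all assumptions, since the text does not specify them.

```python
def clean_records(records, bbox,
                  keep_fields=("taxi_id", "timestamp", "lon", "lat", "status")):
    """Apply the four cleaning rules to a list of dict records."""
    min_lon, max_lon, min_lat, max_lat = bbox
    seen = set()
    cleaned = []
    for rec in records:
        # (1) Invalid data: here, any record missing a required field.
        if any(rec.get(f) is None for f in keep_fields):
            continue
        # (2) Unused fields: keep only the fields later steps need.
        row = tuple(rec[f] for f in keep_fields)
        # (3) Out-of-city data: drop points outside the bounding box.
        lon, lat = rec["lon"], rec["lat"]
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # (4) Duplicates: skip rows that were already kept.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(dict(zip(keep_fields, row)))
    return cleaned

# Rough, illustrative bounding box; not a surveyed city boundary.
CITY_BBOX = (115.4, 117.5, 39.4, 41.1)

raw = [
    {"taxi_id": "A", "timestamp": 1, "lon": 116.4, "lat": 39.9, "status": 0, "msg_id": 7},
    {"taxi_id": "A", "timestamp": 1, "lon": 116.4, "lat": 39.9, "status": 0, "msg_id": 8},  # duplicate
    {"taxi_id": "B", "timestamp": 2, "lon": 121.5, "lat": 31.2, "status": 1, "msg_id": 9},  # other city
    {"taxi_id": "C", "timestamp": 3, "lon": None, "lat": 39.9, "status": 0, "msg_id": 10},  # invalid
]
cleaned = clean_records(raw, CITY_BBOX)
```

Duplicates are detected after the unused fields are dropped, so two records that differ only in a discarded field such as `msg_id` count as the same record.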

In this preferred embodiment, step 2 includes:

Step 2.1: upload the taxi historical trajectory data to the Hadoop cluster;

Specifically, the raw taxi historical data contains only the taxi's positioning points and the various states at each point. To further analyze passengers' travel patterns, the invention first needs to extract the taxi pickup points. Because the volume of taxi historical data is huge, and extracting pickup points requires repeated sorting and filtering of that data, the invention uses the storage capacity of Hadoop and the distributed computing of Spark to analyze the taxi historical data. Part of the extracted pickup-point data is shown in Table 2:

Table 2

Referring to Fig. 1, in this preferred embodiment, in step 3, hotspots are found with the density-based clustering algorithm DBSCAN. DBSCAN has two important parameters, the minimum number of points minPts and the scan radius eps, and each is set separately.

Specifically, the invention identifies the city's pickup hotspot areas from the taxi pickup-point data. The goal is to find areas of high pickup density, and distance-based clustering such as K-means is not suited to that goal, so the invention finds hotspots with the density-based clustering algorithm DBSCAN. DBSCAN has two important parameters: the minimum number of points (minPts) and the scan radius (eps). The invention sets minPts to 3000 according to the pickup frequency. For the scan radius, the invention starts from an average taxi passenger-seeking time of 8 minutes and an assumed average driving speed of 20 km/h, which gives an average passenger-seeking distance of about 3 km. So that a driver at any position can, as far as possible, find a hotspot within 3 km, the distance between two nearest hotspots should be kept at about 6 km, and the scan radius eps is therefore set to 6 km. The pickup hotspot heat map visualized on Baidu Maps is shown in Fig. 2.
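The role of the two parameters can be seen in a compact DBSCAN sketch. This is a plain-Python illustration with a quadratic neighbour search, not the implementation used in the patent, which is not given. The haversine distance and the toy parameter values (eps of 1 km, minPts of 3) are assumptions chosen so the example runs on five points. For the distance figure in the text, 20 km/h for 8 minutes is 20 × 8/60 ≈ 2.67 km, which the text rounds to about 3 km.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dbscan(points, eps, min_pts, dist=haversine_km):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:          # not a core point
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:        # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:      # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Four pickups close together and one isolated pickup (made-up coordinates).
points = [(116.400, 39.900), (116.401, 39.900), (116.400, 39.901),
          (116.401, 39.901), (116.600, 40.100)]
labels = dbscan(points, eps=1.0, min_pts=3)
```

The four nearby points form one cluster and the isolated point is labelled noise. With real data, larger minPts values suppress small groups of pickups, and eps controls how far a cluster can reach from each core point.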

Referring to Fig. 3, in this preferred embodiment, in step 4, specifically: passenger demand is uncertain, and in the 00:00 to 06:00 period passengers are sparse and arrivals are largely random, so common machine learning algorithms (such as KNN and SVM) often cannot predict the future number of boardings at a hotspot accurately. Moreover, the hidden-layer values of a traditional fully connected neural network (BPNN) depend only on the current input, whereas time-series prediction generally requires knowledge of the previous state or of several previous states. To address this, the invention adds the hidden-layer neurons' own previous output to their input, which yields a recurrent neural network (RNN). This kind of network is particularly well suited to time series; its structure is shown in Fig. 3.

Referring to Fig. 3, in this preferred embodiment, the recurrent neural network RNN is trained with the error backpropagation algorithm.

Specifically, unlike a fully connected network, the outputs of the RNN hidden layer s and output layer o are:

s_t = f(U x_t + W s_{t-1}), (1)

o_t = g(V s_t), (2)

where t is the time step, x is the input layer, s is the hidden layer, and o is the output layer; the matrix W is the weight applied to the hidden layer's previous value when it is fed back as input, and U and V are weight matrices.
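Formulas (1) and (2) can be traced with a small forward pass written in plain Python. The text does not name the activation functions f and g, so `tanh` for f and the identity for g are assumptions, as are the layer sizes and the weight values, which are made-up numbers.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V, f=math.tanh, g=lambda z: z):
    """Run formulas (1) and (2) over a sequence of input vectors xs.

    Returns the list of outputs o_t and the final hidden state.
    """
    s = [0.0] * len(W)                      # hidden state before any input
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [f(p) for p in pre]             # (1) s_t = f(U x_t + W s_{t-1})
        outputs.append([g(z) for z in matvec(V, s)])  # (2) o_t = g(V s_t)
    return outputs, s

# One input feature, two hidden units, one output.
U = [[0.5], [-0.3]]
W = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, 1.0]]
outputs, last_state = rnn_forward([[1.0], [2.0], [3.0]], U, W, V)
```

Because `s` is carried from one loop iteration to the next, each output depends on every earlier input, which is the property that distinguishes the RNN from a fully connected network.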

As formula (1) shows, because of the recurrence an RNN can in theory look back over any number of past inputs, which is why RNNs suit time-series prediction. The RNN is likewise trained with the error backpropagation algorithm, in which the error at layer i and time t propagates in two directions: one is toward the previous layer of the network, which depends only on the weight matrix U; the other is backward along the time axis to the initial time step, which depends on the weight matrix W.

In this preferred embodiment, a Dropout layer is added on top of the 3-layer RNN.

Specifically, to prevent the network from overfitting, a Dropout layer is added on top of the 3-layer RNN. During training, it randomly disconnects a fixed fraction (rate = 0.2) of the input neuron connections at each parameter update.
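The effect of rate = 0.2 can be illustrated without a deep learning library. The sketch below implements inverted dropout, the convention also used by the Keras `Dropout` layer, in which surviving inputs are scaled by 1 / (1 - rate) during training and inputs pass through unchanged at inference. The function name and the use of Python's `random` module are choices made for this example only.

```python
import random

def dropout(values, rate=0.2, training=True, rng=random):
    """Inverted dropout: zero each input with probability `rate` during training
    and scale the survivors by 1 / (1 - rate); pass inputs through at inference."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)                      # seeded for repeatability
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
```

About one input in five is zeroed on each call, and the scaling keeps the expected sum of the inputs unchanged, so no rescaling is needed when dropout is switched off for prediction.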

Referring to Fig. 3, in this preferred embodiment, in step 5, specifically: for a taxi driver, the place where a passenger boards the taxi is called a pickup point. When the taxi is empty, the driver needs to find the next passenger as soon as possible, and should therefore choose a place as a potential pickup point. When there are many candidate pickup points, how should the driver choose so as to minimize the empty-running ratio and increase taxi income? The invention studies this problem with an improved Markov decision process (IMDP) in order to make better recommendations.

A traditional Markov decision process (MDP) can be expressed as a five-tuple (S, A, P_sa, γ, R_sa), where S is the state set; A is the action set (wait at the current hotspot or go to the next hotspot); P_sa is the transition probability matrix, which gives the probability of moving to the next state when action a ∈ A is taken in state s_i ∈ S; γ ∈ (0,1) is the discount factor; and R_sa is the return obtained after taking action a ∈ A in state s_i ∈ S. The core of an MDP is to find the optimal policy. A policy is a mapping from states to actions, denoted π(x). A policy is optimal if it maximizes the total expected return.

For the taxi recommendation problem specifically, the number of passengers at each hotspot changes over time, so the pickup probability at each hotspot and the return a taxi obtains from seeking passengers are also time-varying. A traditional MDP therefore cannot provide strategy recommendations for taxis. The IMDP used by the invention expands each pickup hotspot into 96 states (one state per 15 minutes), each corresponding to the passenger count in that time slot. In this way the time-invariant MDP becomes a time-varying MDP that is better suited to taxi recommendation.

In the IMDP, S is the state set, with N = 100*96 states; the set of actions a driver can take at a pickup point is A = {wait, move to the next hotspot}, where the choice of next hotspot depends on the current hotspot; P_sa is obtained by data mining; γ = 0.9; the cost after making a choice at the current hotspot is C_ij = E_ij, where E_ij is the vehicle's idle time from state s_i to s_j; and the corresponding return is defined in terms of X_j, the predicted number of boardings for state s_j.
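The state expansion can be made concrete with a small indexing sketch. Reading N = 100*96 as 100 hotspots times 96 quarter-hour slots follows from the text, but the way a (hotspot, slot) pair is flattened into a single state number is an assumption made here for illustration, as are the function names.

```python
N_HOTSPOTS = 100        # hotspots, from N = 100*96 in the text
SLOTS_PER_DAY = 96      # 24 h * 60 min / 15 min

def slot_of(hour, minute):
    """15-minute slot index (0..95) for a time of day."""
    return (hour * 60 + minute) // 15

def state_index(hotspot, slot):
    """Flatten (hotspot, slot) into a single state id in 0..N-1."""
    return hotspot * SLOTS_PER_DAY + slot

def state_of(index):
    """Inverse of state_index: recover (hotspot, slot)."""
    return divmod(index, SLOTS_PER_DAY)

n_states = N_HOTSPOTS * SLOTS_PER_DAY
morning_state = state_index(7, slot_of(8, 30))   # hotspot 7 at 08:30
```

With this layout the same physical hotspot appears as 96 distinct states, so the transition probabilities and returns can differ between, say, 08:30 and 23:00.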

Given the above process, the return function is defined as

which is abbreviated as R(s₀)+γR(s₁)+γ²R(s₂)+…

The goal is to choose a policy that maximizes the expected value of the total return, E[R(s₀)+γR(s₁)+γ²R(s₂)+…]. Here a policy π is a function mapping states to actions, i.e. π: S → A, so in state s the selected action is a = π(s).

Under policy π, the expected cumulative return starting from state s is defined as the state-value function:

V^π(s) = E[R(s₀)+γR(s₁)+γ²R(s₂)+… | s₀=s, π] (3)

This can be rewritten as the Bellman equation

In formula (4), R(s) is the immediate return, i.e. the return obtained at once by selecting action a in state s; the second term on the right-hand side is a summation that represents the future return under this choice.

The optimal state-value function is defined as

Its Bellman-equation form is

Likewise, the optimal policy function is defined as

From formulas (4) to (7):

Next, the IMDP is solved by policy iteration. Policy iteration finds the optimal policy of an MDP by starting from an initial policy, evaluating it, improving it, evaluating the improved policy, improving it again, and so on, iterating until the policy converges. The specific steps are:

Step 1: initialize V(s) and π(s) for all states (π is initialized to a random policy);

Step 2: evaluate the current policy using the current V(s): compute V(s) for every state until V(s) converges, which yields the state-value function V(s) for this policy.

Step 3: improve the policy using the value function V(s) obtained in the previous step: in each state s, compute for every possible action a the expected value of the next state reached after taking that action, and update π(s) to the action that maximizes this expected value. Then repeat Step 2 and Step 3 until both V(s) and π(s) converge.
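The three steps can be sketched on a toy problem as follows. This is a generic policy-iteration sketch, not the patent's implementation: it uses a state-only return R(s) as in formula (3), starts from a fixed policy instead of a random one so that the run is repeatable, and the two-hotspot example with its returns and transitions is entirely made up.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, tol=1e-9):
    """Solve an MDP by policy iteration.

    P[s][a] is a dict {next_state: probability}; R[s] is the immediate return.
    Returns (V, policy).
    """
    # Step 1: initialise V(s) and an arbitrary starting policy.
    V = {s: 0.0 for s in states}
    policy = {s: actions[0] for s in states}

    def q(s, a):
        # Expected discounted value of the next state after taking a in s.
        return gamma * sum(p * V[s2] for s2, p in P[s][a].items())

    while True:
        # Step 2: policy evaluation, sweeping until V(s) stops changing.
        while True:
            delta = 0.0
            for s in states:
                v_new = R[s] + q(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Step 3: policy improvement, greedy with respect to the current V(s).
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy

# Toy example: a quiet hotspot and a busy one; "move" switches hotspot.
states = ["quiet", "busy"]
actions = ["wait", "move"]
R = {"quiet": 1.0, "busy": 5.0}
P = {
    "quiet": {"wait": {"quiet": 1.0}, "move": {"busy": 1.0}},
    "busy": {"wait": {"busy": 1.0}, "move": {"quiet": 1.0}},
}
V, policy = policy_iteration(states, actions, P, R, gamma=0.9)
```

The resulting policy moves from the quiet hotspot to the busy one and then waits there. In the IMDP the same loop would run over the 100*96 time-expanded states, with P and R taken from the mined data and the predicted boarding counts.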

The invention uses the big data processing platforms Hadoop and Spark to store and mine the Beijing taxi historical trajectories, trains on the training set with the RNN from the Python deep learning library Keras on the PyCharm platform, and verifies the accuracy of the algorithm on the test set. During prediction, the root-mean-square error (RMSE) and mean relative error (MRE) are used to evaluate the prediction results and to compare them with the results of traditional algorithms such as SVM and BPNN.

Claims

1. A taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis, characterized in that the recommendation method comprises the following steps:

Step 1: clean the taxi historical trajectory data;

Step 2: extract taxi pickup points from the cleaned taxi historical trajectory data;

Step 3: extract taxi hotspots from the extracted taxi pickup points;

Step 4: predict the passenger volume at the taxi hotspots;

Step 5: build the taxi recommendation model.

2. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 1, characterized in that, in step 1, cleaning the historical trajectory data specifically includes: removing invalid data, removing unused fields, removing data from outside the city, and removing duplicate data.

3. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 1, characterized in that step 2 includes:

Step 2.1: upload the taxi historical trajectory data to the Hadoop cluster;

Step 2.2: load the cleaned historical trajectory data and use Spark's map function to map the data into an RDD object whose key is the taxi number and whose value is the remaining fields;

Step 2.3: use the groupByKey method of the Spark platform to gather the elements with the same key into one collection;

4. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 1, characterized in that, in step 3, hotspots are found with the density-based clustering algorithm DBSCAN; DBSCAN has two important parameters, the minimum number of points minPts and the scan radius eps, and each is set separately.

5. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 1, characterized in that, in step 4, a recurrent neural network (RNN) is used to predict the passenger volume at the taxi hotspots; specifically, the RNN is obtained by adding the hidden-layer neurons' own previous output to their input, and the outputs of the RNN hidden layer s and output layer o are:

s_t = f(U x_t + W s_{t-1}), (1)

o_t = g(V s_t), (2)

where t is the time step, x is the input layer, s is the hidden layer, and o is the output layer; the matrix W is the weight applied to the hidden layer's previous value when it is fed back as input, and U and V are weight matrices.

6. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 5, characterized in that a Dropout layer is added on top of the 3-layer RNN.

7. The taxi passenger-seeking strategy recommendation method based on deep learning and big data analysis according to claim 1, characterized in that, in step 5, the taxi recommendation model is built with a Markov decision process IMDP; specifically, the IMDP is expressed as a five-tuple (S, A, P_sa, γ, R_sa), where S is the state set; A = {wait, move to the next hotspot} is the set of actions a driver can take at a pickup point; P_sa is the transition probability matrix, obtained by data mining, which gives the probability of moving to the next state when action a ∈ A is taken in state s_i ∈ S; γ ∈ (0,1) is the discount factor; R_sa is the return obtained after taking action a ∈ A in state s_i ∈ S; the cost after making a choice at the current hotspot is C_ij = E_ij, where E_ij is the vehicle's idle time from state s_i to s_j; and the corresponding return is defined in terms of X_j, the predicted number of boardings for state s_j,

The taxi starts from a state s₀ in the state set S and selects an action a₀ from the action set A to reach the next state s₁; in that state it likewise selects an action a₁ to reach state s₂; this process is described as:

Given the above process, the return function is defined as:

which is abbreviated as:

R(s₀)+γR(s₁)+γ²R(s₂)+…

A policy π assigns an action to every state, that is, π is a function mapping states to actions, π: S → A, and in state s the selected action is a = π(s),

V^π(s) = E[R(s₀)+γR(s₁)+γ²R(s₂)+… | s₀=s, π] (3)

This can be rewritten as the Bellman equation:

In formula (4), R(s) is the immediate return, i.e. the return obtained at once by selecting action a in state s; the second term on the right-hand side is a summation that represents the future return under this choice,

The optimal state-value function is defined as:

Its Bellman-equation form is:

Likewise, the optimal policy function is defined as:

From formulas (4) to (7):

Step 2: evaluate the current policy using the current V(s): compute V(s) for every state until V(s) converges, which yields the state-value function V(s) for this policy;

Step 3: improve the policy using the value function V(s) obtained in the previous step: in each state s, compute for every possible action a the expected value of the next state reached after taking that action, and update π(s) to the action that maximizes this expected value; then repeat Step 2 and Step 3 until both V(s) and π(s) converge, which yields the taxi recommendation model.