CN104464344B

CN104464344B - Vehicle running path prediction method and system

Info

Publication number: CN104464344B
Application number: CN201410628190.1A
Authority: CN
Inventors: 马传香; 王时绘; 余啸; 曾诚; 陈昊; 张*; 张; 吕顺营; 宋建华; 吴思尧
Original assignee: Hubei University
Current assignee: Hubei University
Priority date: 2014-11-07
Filing date: 2014-11-07
Publication date: 2016-09-14
Anticipated expiration: 2034-11-07
Also published as: CN104464344A

Abstract

A vehicle running path prediction method and system. On a Hadoop platform, the minimum node memory and the storage size of the longest path sequence are determined, and the original path sequence database is evenly divided into n disjoint sub-path sequence databases. The original path sequence database and the n sub-path sequence databases are uploaded separately to HDFS. The master node dispatches the n sub-path sequence databases to different Map nodes. Each Map node runs an improved GSP algorithm that, according to a preset minimum support ξ, scans the sub-path sequence database held in its memory and computes local path sequence patterns, and a Reduce node merges these into global candidate sequence patterns. The original path sequence database is then scanned again to obtain the global path sequence patterns. Finally, path association rules are generated from the global path sequence patterns and their confidences are computed, giving the vehicle running path prediction result.

Description

Vehicle running path prediction method and system

Technical field

The invention belongs to the technical field of intelligent transportation systems, and in particular relates to a vehicle running path prediction method and system.

Background art

(1) Intelligent transportation systems

With the development and maturing of geographic positioning technology and the rise of mobile computing, applications based on paths and geographic locations have become a common focus of academia, industry and even government. Path information and geographic location are important attributes of moving objects and can provide essential support for improving many services and application systems. Using the paths and locations of moving objects as system input has given rise to numerous emerging applications, of which intelligent transportation systems are the best known. The predecessor of the intelligent transportation system is the intelligent vehicle-highway system. An intelligent transportation system effectively integrates advanced information technology, data communication and transmission technology, electronic sensing technology, electronic control technology and computer processing technology into the entire traffic management system, establishing a real-time, accurate and efficient transportation and management system that operates comprehensively over a large area. It is a complex integrated system; in terms of its composition, it can be divided into the following subsystems:

1) Advanced traveler information systems (ATIS)

ATIS is built on a well-developed information network. Sensors and transmission equipment installed on roads, in vehicles, at transfer stations, in parking lots and at weather centers provide real-time traffic information from all locations to a traffic information center. After processing this information, ATIS provides travelers in real time with road traffic information, public transport information, transfer information, traffic weather information, parking information and other travel-related information, and travelers use it to choose their travel mode and route. Furthermore, when a vehicle is equipped with an automatic positioning and navigation system, the system can help the driver select a route automatically.

2) Advanced traffic management systems (ATMS)

ATMS shares part of its information collection, processing and transmission systems with ATIS, but it is used mainly by traffic managers to monitor, control and manage road traffic and to provide communication among roads, vehicles and drivers. It monitors traffic conditions, traffic accidents, weather conditions and the traffic environment in the road network in real time, obtains traffic information using advanced vehicle detection and computer information processing technologies, and controls traffic based on the collected information, for example through signal lights, guidance information, road control, accident handling and rescue.

3) Advanced public transportation systems (APTS)

The main purpose of APTS is to use intelligent technologies to advance the public transportation industry, so that public transit becomes safe, convenient, economical and high-capacity. For example, personal computers and closed-circuit television can advise the public on travel modes, events, routes and service choices, and displays at bus stops can give waiting passengers real-time vehicle information. At a transit vehicle management center, departures and returns to the depot can be scheduled sensibly according to the real-time status of vehicles, improving operational efficiency and service quality.

4) Advanced vehicle control systems (AVCS)

The purpose of AVCS is to develop technologies that help drivers control their vehicles so that driving is safe and efficient. AVCS includes warnings and assistance to the driver, as well as automatic driving technologies such as obstacle avoidance.

5) Freight transportation management systems

These are intelligent logistics management systems that are based on the expressway network and information management systems and are managed according to logistics theory. They combine satellite positioning, geographic information systems, logistics information and network technology to organize freight transport effectively and improve transport efficiency.

6) Electronic toll collection (ETC)

ETC is currently the most advanced method of collecting road and bridge tolls. Dedicated short-range microwave communication between an on-board unit mounted on the vehicle windshield and a microwave antenna in the ETC lane of the toll station, combined with computer networking for back-end settlement with banks, allows vehicles to pay road and bridge tolls without stopping at the toll station; after back-end processing, the collected fees are distributed to the relevant beneficiaries. Installing an ETC system on an existing lane can increase the lane's capacity by a factor of 3 to 5.

7) Emergency rescue systems (EMS)

EMS is a special system built on ATIS, ATMS and the relevant rescue facilities and equipment. Through ATIS and ATMS, it links the traffic monitoring center and professional rescue organizations into an organic whole, providing road users with services such as on-site emergency handling of vehicle breakdowns, towing, on-site rescue and removal of accident vehicles.

(2) Path prediction techniques

Path prediction methods fall mainly into the following two categories:

1) Path prediction methods based on Markov models. Document [1] (Simmons R, Browning B, Zhang Y, et al. Learning to predict driver route and destination intent[C]. Proceedings of the Intelligent Transportation Systems Conference, 2006: 127-132) observes that, even when a better path exists, people habitually choose familiar routes they have driven before. On this premise, it builds a Markov probability model from observations of drivers' historical trajectories and generates a Markov probability tree, from which a vehicle's path choice at the next moment can be predicted from its current state. Document [2] (Research on ETC toll data mining based on a mixed Markov model[J]. Journal of Transportation Systems Engineering and Information Technology, 2012, 12(4)) builds a path sequence transaction database from historical ETC data and proposes a method for predicting expressway vehicle paths based on a mixed Markov path prediction model, with which the future passing states of expressway ETC vehicles are predicted. However, the prediction horizon of this method is short: it can only predict the road section a vehicle will reach at the next moment.

2) Path prediction methods based on sequential pattern mining. Document [3] (Yang J, Hu M. TrajPattern: mining sequential patterns from imprecise trajectories of mobile objects[C]. Proceedings of the International Conference on Extending Database Technology, 2006: 664-681) addresses location prediction for moving targets in mobile computing environments and proposes a method for mining movement regularities from historical trajectory data: the movement area is first divided into grid cells of equal area, each target trajectory is converted into the ordered sequence of cells it passes through, and the standard GSP algorithm is then used to mine the frequent sequential patterns and generate inference rules. Document [4] (Giannotti F, Nanni M, Pedreschi D. Trajectory pattern mining[C]. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007: 330-339) proposes a frequent sequential pattern mining algorithm for data provided by GPS devices, which extends the algorithm of Document [3] with the dwell time in each grid cell as an additional parameter. However, the computing capacity of these methods on massive data falls far short of practical requirements. It is therefore necessary to take full advantage of the latest developments in computer hardware and software to improve computational efficiency.

At present, intelligent transportation systems use large numbers of advanced sensing devices, network technologies, camera systems and high-performance computer systems, and can monitor and collect large amounts of traffic data in real time. Suppose the intersections equipped with electronic eyes (traffic cameras) are taken as nodes that form a transportation network; a vehicle running path sequence (hereinafter, path sequence) can then be represented as an ordered arrangement of nodes. Let I = {i_k, k = 1, 2, ..., n} be an item set, where item i_k denotes a road node, i.e. an intersection equipped with an electronic eye, and n is the number of intersections. A path sequence is an ordered arrangement of distinct items and can be written as S = <s_1, s_2, ..., s_j, ..., s_l>, where each s_j is an item of I and l is the length of S. A sequence formed by several consecutive items of a path sequence is called a sub-path sequence of that path sequence. If path sequence α is a sub-path sequence of path sequence β, β is said to contain α. The support count of path sequence S in a path sequence database is the number of path sequences in the database that contain S. The support of S in the database, written Support(S), is the percentage of path sequences in the database that contain S. Given a minimum support ξ, if the support of S in the path sequence database is not less than ξ, S is called a path sequence pattern. Path sequences have the following property (hereinafter Property 1): every two adjacent items in a path sequence are two adjacent nodes of the road network.
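The definitions above can be made concrete with a short Python sketch. It is illustrative only: it assumes a path sequence is a tuple of intersection labels, a path sequence database is a list of such tuples, and the road network is a dict from each node to the set of its neighbours; all function names are hypothetical. Containment is tested on consecutive items, as the definition of a sub-path sequence requires, which is stricter than the general subsequence relation used in standard GSP.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def contains(beta: Path, alpha: Path) -> bool:
    """True if alpha is a sub-path sequence of beta (a run of consecutive items)."""
    k = len(alpha)
    return any(beta[i:i + k] == alpha for i in range(len(beta) - k + 1))


def support_count(db: List[Path], s: Path) -> int:
    """Number of path sequences in db that contain s."""
    return sum(1 for seq in db if contains(seq, s))


def support(db: List[Path], s: Path) -> float:
    """Support(S): fraction of path sequences in db that contain s."""
    return support_count(db, s) / len(db)


def is_pattern(db: List[Path], s: Path, xi: float) -> bool:
    """s is a path sequence pattern if its support is not less than xi."""
    return support(db, s) >= xi


def has_property_1(seq: Path, adjacency: Dict[str, Set[str]]) -> bool:
    """Property 1: every two adjacent items are adjacent road-network nodes."""
    return all(b in adjacency[a] for a, b in zip(seq, seq[1:]))


# Illustrative database: the first three sequences of the embodiment.
db = [("A", "B", "D", "F", "H", "K"),
      ("A", "C", "E", "F", "G", "I", "L"),
      ("A", "B", "D", "F", "H", "K", "M", "N")]
print(support_count(db, ("B", "D", "F")))  # 2
print(contains(db[0], ("A", "B", "F")))    # False: A, B, F are not consecutive
```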

(3) Map-Reduce programming framework

Map-Reduce is a programming framework, built on the concepts of "Map" and "Reduce", for parallel computation over large-scale datasets (more than 1 TB). It was proposed in Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified data processing on large clusters[J], Communications of the ACM, 2008, 51(1): 107-113. The user only needs to write two functions, called Map and Reduce; the system manages the execution of the parallel Map and Reduce tasks and the coordination between them, handles the failure of any of these tasks, and at the same time guarantees tolerance of hardware faults.

The Map-Reduce computation proceeds as follows:

1) The Map-Reduce library in the user program first divides the input files into M data splits, each typically 16 to 64 MB in size (the user can control the split size through an optional parameter), and then starts many copies of the program on a cluster of machines.

2) One of these copies is special: the master program. The other copies are worker programs, which are assigned tasks by the master. There are M Map tasks and R Reduce tasks to assign; the master assigns each Map task or Reduce task to an idle worker.

3) A worker assigned a Map task reads the corresponding input data split, parses key-value (key, value) pairs from it and passes each pair to the user-defined Map function. The intermediate key-value pairs produced by the Map function are buffered in memory.

4) The buffered key-value pairs are divided into R regions by the partitioning function and periodically written to local disk. The locations of these pairs on local disk are passed back to the master, which forwards them to the workers assigned Reduce tasks.

5) When a worker assigned a Reduce task receives the data locations from the master, it uses remote procedure calls to read the buffered data from the local disks of the machines running the Map workers. After a Reduce worker has read all of its intermediate data, it sorts the data by key so that all values with the same key are grouped together. Sorting is necessary because many different keys map to the same Reduce task. If the intermediate data are too large to sort in memory, an external sort is used.

6) The Reduce worker iterates over the sorted intermediate data and, for each unique intermediate key, passes the key and the corresponding set of intermediate values to the user-defined Reduce function. The output of the Reduce function is appended to the final output file for this partition.

7) When all Map and Reduce tasks have completed, the master wakes up the user program; at this point the Map-Reduce call in the user program returns.
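The seven steps above can be imitated in a single process to show the data flow. This is a minimal sketch, not Hadoop code: the function names are hypothetical, each input split is a list of text records, and `zlib.crc32` modulo R is an assumed stand-in for the framework's partitioning function, chosen only because it is deterministic across runs (Python's built-in `hash` of strings is randomized per process).

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner (stand-in for the partitioning function)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(splits: List[List[str]],
                  map_fn: Callable[[str], Iterable[Tuple[str, int]]],
                  reduce_fn: Callable[[str, List[int]], int],
                  num_reducers: int = 2) -> Dict[str, int]:
    # Steps 3-4: each Map task processes one split; its intermediate pairs
    # are divided into R regions by the partitioning function.
    regions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                regions[partition(key, num_reducers)][key].append(value)
    # Steps 5-6: each Reduce task sorts its keys and calls reduce_fn once per
    # unique key with all values grouped under that key.
    output: Dict[str, int] = {}
    for region in regions:
        for key in sorted(region):
            output[key] = reduce_fn(key, region[key])
    return output


# Example: count how often each intersection appears across two splits.
splits = [["A B D", "A C E"], ["A B D F"]]
counts = run_mapreduce(
    splits,
    map_fn=lambda line: [(item, 1) for item in line.split()],
    reduce_fn=lambda key, values: sum(values),
)
print(counts == {"A": 3, "B": 2, "C": 1, "D": 2, "E": 1, "F": 1})  # True
```

Grouping by key inside each region plays the role of the sort-and-group phase in step 5; a real framework spills these regions to disk and fetches them remotely.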

(4) Hadoop cloud computing platform

Hadoop is an open-source software project for reliable, scalable, distributed computing developed by the Apache Software Foundation. Users can develop distributed programs without knowing the low-level details of distribution, making full use of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and suits applications with very large datasets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as streams.

The core of the Hadoop framework consists of HDFS and Map-Reduce: HDFS provides storage for massive data, and Map-Reduce provides computation over it.

However, for a specific technical problem, one still has to work out how to design a technical solution that can be implemented in parallel with Map-Reduce. No technical solution with satisfactory results yet exists in this field.

Summary of the invention

Existing Markov-model-based path prediction methods have a short prediction horizon and can only predict the road section a vehicle will reach at the next moment, while existing sequential-pattern-mining-based path prediction methods compute inefficiently on massive and high-dimensional data. To address these problems, and exploiting Property 1 of vehicle running path sequences, the present invention improves the candidate sequence pattern generation process of the original GSP algorithm to raise its computational performance, parallelizes the improved GSP algorithm with the Map-Reduce programming framework, and designs a sequence database decomposition strategy that meets the requirements of parallel computation and reduces I/O overhead. On this basis, the large-scale parallel computing capability of the Hadoop cloud computing platform is fully exploited to improve the efficiency of sequential pattern mining on massive data and to shorten processing time.

The technical solution provided by the present invention is a vehicle running path prediction method that performs the following steps on a Hadoop platform:

Step 1: according to the memory of each computer in the Hadoop platform, determine the minimum memory over all nodes and denote it Q, in GB;

Step 2: scan the original path sequence database, which stores the vehicle running path sequences; denote the number of path sequences in it by m (each path sequence includes more than one intersection) and the actual storage size of the longest path sequence in it by P, in bytes (B);

Step 3: evenly divide the original path sequence database horizontally into n disjoint sub-path sequence databases, where P × (m/n) ≤ Q × 10⁹;

Step 4: upload the original path sequence database to a designated folder in HDFS;

Step 5: upload the n sub-path sequence databases to another designated folder in HDFS;

Step 6: the master node of the Hadoop platform dispatches the n sub-path sequence databases uploaded in step 5 to different Map nodes. Each Map node executes the improved GSP algorithm: according to the preset minimum support ξ, it scans the sub-path sequence database held in its memory, computes the local path sequence patterns, and passes them to a Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

Each Map node executes the improved GSP algorithm as follows:

Operation a: for the sub-path sequence database assigned to this Map node, scan it to obtain the 1-path sequence patterns L₁, and set k = 1;

Operation b: generate the candidate (k+1)-path sequences C_{k+1} from the k-path sequence patterns L_k, scan the sub-path sequence database again to compute the support of each candidate path sequence, and produce the (k+1)-path sequence patterns L_{k+1}. Candidate generation for C_{k+1} distinguishes the following two cases:

(1) If candidate 2-path sequences are generated from the 1-path sequence patterns: scan the adjacency list that stores the road network information, check the adjacent nodes of each path sequence pattern s₁ in L₁, and append each item adjacent to s₁ to s₁ to form a candidate;

(2) If candidate (k+1)-path sequences are generated from the k-path sequence patterns, with k > 1:

First, for any two path sequence patterns s₁ and s₂ among the k-path sequence patterns, if the path sequence obtained by removing the first item of s₁ is identical to the path sequence obtained by removing the last item of s₂, join s₁ and s₂ (append the last item of s₂ to s₁). Then prune: if some sub-path sequence of a candidate path sequence is not a path sequence pattern, delete that candidate;

Operation c: set k = k + 1 and repeat operation b until no new candidate path sequences are generated;

Step 7: the Reduce node merges the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

Step 8: scan again the original path sequence database stored in HDFS in step 4, count the global candidate sequence patterns, and keep the sequence patterns whose support is not less than the preset minimum support ξ, obtaining the global path sequence patterns;

Step 9: generate path association rules from the global path sequence patterns obtained in step 8 and compute the confidence of each rule, obtaining the vehicle running path prediction result.
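Operations a to c of step 6 can be sketched in Python as follows. This is a minimal single-machine sketch under stated assumptions, not the patented implementation: the adjacency list is the simulated road network of the embodiment (Fig. 3), support counting uses consecutive-item containment, the minimum support ξ = 0.5 in the example is a hypothetical value, and all function names are illustrative. As in case (1), candidate 2-path sequences extend each 1-path pattern only by its adjacent nodes, which exploits Property 1 instead of pairing every two frequent items as standard GSP does.

```python
from typing import Dict, Iterable, List, Set, Tuple

Path = Tuple[str, ...]

# Adjacency list of the simulated road network of the embodiment (Fig. 3).
ADJ: Dict[str, Set[str]] = {node: set(nbrs) for node, nbrs in {
    "A": "BC", "B": "AD", "C": "AE", "D": "BGF", "E": "CFH",
    "F": "DGJHE", "G": "DIF", "H": "FKE", "I": "GL", "J": "FN",
    "K": "HM", "L": "IN", "M": "KN", "N": "JLM"}.items()}


def count(db: List[Path], s: Path) -> int:
    """Support count: sequences in db containing s as consecutive items."""
    k = len(s)
    return sum(any(seq[i:i + k] == s for i in range(len(seq) - k + 1))
               for seq in db)


def frequent(db: List[Path], cands: Iterable[Path], min_count: float) -> Dict[Path, int]:
    """Keep the candidates whose support count reaches min_count."""
    result = {}
    for c in cands:
        n = count(db, c)
        if n >= min_count:
            result[c] = n
    return result


def candidates_2(L1: Iterable[Path], adjacency: Dict[str, Set[str]]) -> Set[Path]:
    """Case (1): append to each 1-path pattern <x> every node adjacent to x."""
    return {(s[0], nb) for s in L1 for nb in adjacency[s[0]]}


def candidates_next(Lk: Dict[Path, int]) -> Set[Path]:
    """Case (2), k > 1: join s1 and s2 when s1 without its first item equals
    s2 without its last item, then prune candidates with a non-pattern k-sub-path."""
    k = len(next(iter(Lk)))
    joined = {s1 + s2[-1:] for s1 in Lk for s2 in Lk if s1[1:] == s2[:-1]}
    return {c for c in joined
            if all(c[i:i + k] in Lk for i in range(len(c) - k + 1))}


def mine_local(db: List[Path], adjacency: Dict[str, Set[str]], xi: float) -> Dict[Path, int]:
    """Operations a-c on one sub-path sequence database; returns pattern -> count."""
    min_count = xi * len(db)
    items = {x for seq in db for x in seq}
    Lk = frequent(db, [(x,) for x in items], min_count)  # operation a
    patterns = dict(Lk)
    k = 1
    while Lk:  # operation c: stop once a level yields no patterns
        cands = candidates_2(Lk, adjacency) if k == 1 else candidates_next(Lk)
        Lk = frequent(db, cands, min_count)  # operation b
        patterns.update(Lk)
        k += 1
    return patterns


# Sub-path sequence database 1 of the embodiment; xi = 0.5 is hypothetical.
SUB_DB_1 = [tuple("ABDFHK"), tuple("ACEFGIL"), tuple("ABDFHKMN"), tuple("CEFGILN")]
local = mine_local(SUB_DB_1, ADJ, xi=0.5)
print(len(local), local[tuple("ABDFHK")])  # 42 2
```

With contiguous sub-paths, the two length-k sub-paths of a candidate joined from s₁ and s₂ are exactly s₁ and s₂, so the prune check in `candidates_next` never removes anything in this sketch; it is kept only to mirror the description of case (2).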

Correspondingly, the present invention provides a vehicle running path prediction system comprising the following modules on a Hadoop platform:

A memory determination module, configured to determine, according to the memory of each computer in the Hadoop platform, the memory of the node with the least memory, denoted Q, in GB;

A longest path sequence determination module, configured to scan the original path sequence database storing the vehicle running path sequences, denote the number of path sequences in it by m (each path sequence includes more than one intersection), and denote the actual storage size of the longest path sequence in it by P, in B;

A sub-path sequence database division module, configured to evenly divide the original path sequence database horizontally into n disjoint sub-path sequence databases, where P × (m/n) ≤ Q × 10⁹;

An original database upload module, configured to upload the original path sequence database to a designated folder in HDFS;

A sub-database upload module, configured to upload the n sub-path sequence databases to another designated folder in HDFS;

A local path sequence pattern module, configured to have the master node of the Hadoop platform dispatch the n sub-path sequence databases uploaded by the sub-database upload module to different Map nodes, where each Map node executes the improved GSP algorithm: according to the preset minimum support ξ, it scans the sub-path sequence database held in its memory, computes the local path sequence patterns, and passes them to a Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is its support count;

The improved GSP algorithm executed by each Map node is the same as operations a to c of the method described above;

A global candidate sequence pattern module, configured to have the Reduce node merge the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

A global path sequence pattern module, configured to scan again the original path sequence database stored in HDFS by the original database upload module, count the global candidate sequence patterns, and keep the sequence patterns whose support is not less than the preset minimum support ξ, obtaining the global path sequence patterns;

A prediction result module, configured to generate path association rules from the global path sequence patterns produced by the global path sequence pattern module and compute the confidence of each rule, obtaining the vehicle running path prediction result.
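Steps 7 and 8 (the global candidate and global path sequence pattern modules) can be sketched as follows. The sketch assumes each Map node's output is a list of (pattern, local support count) pairs; the function names, the example data and ξ = 0.6 are hypothetical. Rescanning the original database is needed because a Map node emits a pattern only when it is frequent in its own sub-database, so the summed local counts can underestimate the true count. The merged keys are nevertheless a complete candidate set: if a pattern's support were below ξ in every sub-database, its total count would be below ξ times the size of the whole database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]


def count(db: List[Path], s: Path) -> int:
    """Support count: sequences in db containing s as consecutive items."""
    k = len(s)
    return sum(any(seq[i:i + k] == s for i in range(len(seq) - k + 1))
               for seq in db)


def reduce_merge(map_outputs: Iterable[Iterable[Tuple[Path, int]]]) -> Dict[Path, int]:
    """Step 7: merge the <key, value> pairs of all Map nodes.

    The keys form the global candidate set; the summed values are only partial
    counts, because a Map node emits a pattern only if it is locally frequent.
    """
    merged: Dict[Path, int] = defaultdict(int)
    for pairs in map_outputs:
        for key, local_count in pairs:
            merged[key] += local_count
    return dict(merged)


def global_patterns(original_db: List[Path], candidates: Iterable[Path],
                    xi: float) -> Dict[Path, int]:
    """Step 8: rescan the original database; keep candidates with support >= xi."""
    result = {}
    for c in candidates:
        n = count(original_db, c)
        if n / len(original_db) >= xi:
            result[c] = n
    return result


# Hypothetical example: 4 sequences split into 2 sub-databases, xi = 0.6.
original_db = [("A", "B", "D"), ("A", "B", "D"), ("A", "C", "E"), ("A", "B", "D")]
map_outputs = [
    # sub-database 1 = original_db[0:2]: every sub-path of <A B D> has support 2/2
    [(("A",), 2), (("B",), 2), (("D",), 2),
     (("A", "B"), 2), (("B", "D"), 2), (("A", "B", "D"), 2)],
    # sub-database 2 = original_db[2:4]: only <A> reaches 0.6 (2/2); the rest are 1/2
    [(("A",), 2)],
]
merged = reduce_merge(map_outputs)
glob = global_patterns(original_db, merged, xi=0.6)
print(merged[("A", "B")], glob[("A", "B")])  # 2 3
```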

Compared with existing vehicle running path prediction methods in China and abroad, the present invention redesigns, according to the basic requirements of the Map-Reduce programming framework, the process of mining sequential patterns from vehicle running path sequences and generating path association rules. Exploiting Property 1 of vehicle running path sequences, it also improves the candidate sequence pattern generation of the original GSP algorithm. It further designs a sensible sequence database decomposition strategy, which parallelizes the improved GSP algorithm, reduces I/O overhead, makes full use of the processing power of a shared-storage computer cluster, and improves efficiency. The technical solution is simple and fast, and it can effectively improve the efficiency of mining sequential patterns from vehicle running path sequences and generating path association rules.

Description of the drawings

Fig. 1 is a flow chart of the embodiment of the present invention;

Fig. 2 is a schematic diagram of the simulated road network of the embodiment;

Fig. 3 is the adjacency list storing the simulated road network of the embodiment;

Fig. 4 is a schematic diagram of the division of the original path sequence database in the embodiment;

Fig. 5 is a schematic diagram of the Map task performed on sub-path sequence database 1 in the embodiment;

Fig. 6 is a schematic diagram of the Map task performed on sub-path sequence database 2 in the embodiment;

Fig. 7 is a schematic diagram of the Map task performed on sub-path sequence database 3 in the embodiment.

Detailed description of the embodiments

The technical solution of the present invention is described in detail below with reference to the drawings and an embodiment.

Embodiment: take the simulated road network shown in Fig. 2 as an example, in which electronic eyes collect data at all 14 intersections A to N. Because the present invention uses road network information, the road network is stored as an adjacency list, shown in Fig. 3: intersection A is adjacent to B and C; B to A and D; C to A and E; D to B, G and F; E to C, F and H; F to D, G, J, H and E; G to D, I and F; H to F, K and E; I to G and L; J to F and N; K to H and M; L to I and N; M to K and N; and N to J, L and M. The path sequences corresponding to the vehicle travel records collected by the electronic eyes are stored in the vehicle running path sequence database; each path sequence includes more than one intersection, as shown in the following table.

No.  Path sequence
1    <A B D F H K>
2    <A C E F G I L>
3    <A B D F H K M N>
4    <C E F G I L N>
5    <A B D F H K>
6    <C E F G I L N>
7    <A B D G I L N>
8    <A B D F H K M>
9    <A B D F H K>
10   <E F G I L N>
11   <A B D G I L N>
12   <A B D F H K M N>
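The embodiment's road network and path sequence database can be encoded and checked against Property 1 with a few lines of Python. This sketch is only a consistency check of the data given above; the variable names are illustrative. Every adjacent pair in every sequence should be a pair of adjacent intersections, and the adjacency list should be symmetric, since each road appears in the lists of both of its end intersections.

```python
from typing import Dict, List, Set, Tuple

# Adjacency list of the simulated road network (Fig. 3).
ADJ: Dict[str, Set[str]] = {node: set(nbrs) for node, nbrs in {
    "A": "BC", "B": "AD", "C": "AE", "D": "BGF", "E": "CFH",
    "F": "DGJHE", "G": "DIF", "H": "FKE", "I": "GL", "J": "FN",
    "K": "HM", "L": "IN", "M": "KN", "N": "JLM"}.items()}

# The 12 path sequences of the table, one per line as in a text file.
LINES = """<A B D F H K>
<A C E F G I L>
<A B D F H K M N>
<C E F G I L N>
<A B D F H K>
<C E F G I L N>
<A B D G I L N>
<A B D F H K M>
<A B D F H K>
<E F G I L N>
<A B D G I L N>
<A B D F H K M N>""".splitlines()

DB: List[Tuple[str, ...]] = [tuple(line.strip("<>").split()) for line in LINES]

symmetric = all(a in ADJ[b] for a in ADJ for b in ADJ[a])
num_edges = sum(len(nbrs) for nbrs in ADJ.values()) // 2  # each road counted twice
all_satisfy_property_1 = all(b in ADJ[a] for seq in DB for a, b in zip(seq, seq[1:]))
print(len(DB), symmetric, num_edges, all_satisfy_property_1)  # 12 True 18 True
```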

Path sequence patterns reflect regularities in vehicles' route choices. Directed path association rules are generated from path sequence patterns: the antecedent of a rule is the path sequence the vehicle has already traveled, and the consequent is the path sequence it will travel next. For example, the confidence conf(<A B D> → <F H K>) of the path association rule <A B D> → <F H K> is defined as the ratio of the number of path sequences in the path sequence database that contain <A B D F H K> to the number that contain <A B D>. That is, a vehicle that has passed through the three nodes A, B and D will next pass through nodes F, H and K with probability conf(<A B D> → <F H K>).
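The confidence definition can be checked on the table of the embodiment. In this sketch (function names hypothetical), a sequence supports the rule only if the consequent directly follows the antecedent, i.e. it contains the concatenated path <A B D F H K>. Six of the twelve sequences contain <A B D F H K> and eight contain <A B D>, so the confidence is 6/8 = 0.75.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def contains(beta: Path, alpha: Path) -> bool:
    """True if alpha occurs in beta as consecutive items."""
    k = len(alpha)
    return any(beta[i:i + k] == alpha for i in range(len(beta) - k + 1))


def confidence(db: List[Path], antecedent: Path, consequent: Path) -> float:
    """conf(X -> Y) = #sequences containing X followed directly by Y / #containing X.

    Rules are only generated from patterns, so the denominator is non-zero there.
    """
    with_x = sum(contains(seq, antecedent) for seq in db)
    with_xy = sum(contains(seq, antecedent + consequent) for seq in db)
    return with_xy / with_x


DB = [tuple(s.split()) for s in [
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
    "A B D F H K", "C E F G I L N", "A B D G I L N", "A B D F H K M",
    "A B D F H K", "E F G I L N", "A B D G I L N", "A B D F H K M N"]]

print(confidence(DB, ("A", "B", "D"), ("F", "H", "K")))  # 0.75 (6 of 8)
```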

Based on the original path sequence database generated above, the flow of the Map-Reduce-based vehicle running path prediction method designed by the present invention is shown in Fig. 1. All steps can be run automatically by those skilled in the art using computer software technology. The embodiment is implemented as follows:

Step 1: according to the memory of each computer in the Hadoop platform, determine the memory of the node with the least memory and denote it Q (unit: GB). In the embodiment, Q = 2 GB.

Because step 3 evenly divides the original path sequence database into n disjoint sub-path sequence databases, each of which is loaded into a node's memory, it is recommended in practice that every computer in the Hadoop platform have the same memory and computing performance, so that no computer with less memory becomes a computational bottleneck.

Step 2: scan the original path sequence database once (it can be stored as a text document, which makes it easy to upload to HDFS), and denote the number of path sequences in the database by m and the actual storage size of the longest path sequence by P (unit: B). In the embodiment, the database contains 12 path sequences, and since each character occupies 1 B, the longest sequence occupies 17 B (including spaces and angle brackets); hence m = 12 and P = 17 B.
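Step 2 amounts to one pass over the text file. This sketch assumes one path sequence per line, written as in the table with angle brackets and single spaces, and measures P in bytes of the UTF-8 encoded line without its line terminator; `io.StringIO` stands in for the actual file, and `scan_database` is a hypothetical name.

```python
import io
from typing import TextIO, Tuple


def scan_database(f: TextIO) -> Tuple[int, int]:
    """Return (m, P): number of path sequences and byte size of the longest one."""
    m, p = 0, 0
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        m += 1
        p = max(p, len(line.encode("utf-8")))
    return m, p


# In-memory stand-in for the text file holding the embodiment's 12 sequences.
text = "\n".join([
    "<A B D F H K>", "<A C E F G I L>", "<A B D F H K M N>", "<C E F G I L N>",
    "<A B D F H K>", "<C E F G I L N>", "<A B D G I L N>", "<A B D F H K M>",
    "<A B D F H K>", "<E F G I L N>", "<A B D G I L N>", "<A B D F H K M N>",
]) + "\n"
m, P = scan_database(io.StringIO(text))
print(m, P)  # 12 17
```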

Step 3: evenly divide the original path sequence database horizontally into n disjoint sub-path sequence databases (which can also be stored as text documents). Generally m is divisible by n, so that each sub-path sequence database contains m/n path sequences: the first sub-path sequence database contains path sequences 1 to m/n of the original database, the k-th (1 < k < n) contains path sequences (k-1) × (m/n) + 1 to k × (m/n), and the n-th contains path sequences (n-1) × (m/n) + 1 to m. So that counting the candidate path sequence patterns does not require scanning the original path sequence database in external storage, which reduces I/O overhead, each sub-path sequence database should fit in memory; that is, P × (m/n) ≤ Q × 10⁹ must hold. When P and Q are expressed in other units, the corresponding condition must hold and likewise falls within the protection scope of the present invention.

As shown in Fig. 4, the embodiment divides the original path sequence database into n = 3 sub-path sequence databases. Since 17 × (12/3) = 68 < 2 × 10⁹, the requirement that each sub-path sequence database fit in memory is met.

Dividing the original path sequence database yields sub-path sequence databases 1, 2 and 3 as follows:

Path sequence table of sub-path sequence database 1

No.  Path sequence
1    <A B D F H K>
2    <A C E F G I L>
3    <A B D F H K M N>
4    <C E F G I L N>

Path sequence table of sub-path sequence database 2

No.  Path sequence
5    <A B D F H K>
6    <C E F G I L N>
7    <A B D G I L N>
8    <A B D F H K M>

Path sequence table of sub-path sequence database 3

No.  Path sequence
9    <A B D F H K>
10   <E F G I L N>
11   <A B D G I L N>
12   <A B D F H K M N>

Each path sequence consists of items from the item set {A, B, C, D, E, F, G, H, I, J, K, L, M, N}. Sub-path sequence database 1 contains path sequences 1 to 4 of the original path sequence database, sub-path sequence database 2 contains path sequences 5 to 8, and sub-path sequence database 3 contains path sequences 9 to 12.
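The horizontal division of step 3 and its memory condition can be sketched as follows. The sketch assumes, as the embodiment does, that n divides m; the function names are hypothetical. The memory check multiplies both sides of P × (m/n) ≤ Q × 10⁹ by n to avoid rounding.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_horizontally(db: Sequence[T], n: int) -> List[List[T]]:
    """Split db into n disjoint, equal, contiguous sub-databases (assumes n divides m)."""
    m = len(db)
    if m % n != 0:
        raise ValueError("this sketch assumes m is divisible by n")
    size = m // n
    # Sub-database k (1-based) holds sequences (k-1)*(m/n)+1 .. k*(m/n).
    return [list(db[k * size:(k + 1) * size]) for k in range(n)]


def fits_in_memory(p_bytes: int, m: int, n: int, q_gb: float) -> bool:
    """Check P * (m/n) <= Q * 10^9, with both sides multiplied by n."""
    return p_bytes * m <= q_gb * 10**9 * n


DB: List[Tuple[str, ...]] = [tuple(s.split()) for s in [
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
    "A B D F H K", "C E F G I L N", "A B D G I L N", "A B D F H K M",
    "A B D F H K", "E F G I L N", "A B D G I L N", "A B D F H K M N"]]

parts = split_horizontally(DB, 3)
print([len(p) for p in parts], fits_in_memory(17, 12, 3, 2))  # [4, 4, 4] True
```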

If the Hadoop platform has q Map nodes, the number of sub-path sequence databases should equal the number of Map nodes, i.e. n = q. If n < q, then, assuming no task fails, q − n Map nodes remain unused while the method runs, so node utilization is low. If n > q, then, assuming no task fails, the remaining n − q sub-path sequence databases can be processed only after the q Map nodes have finished the first q sub-path sequence databases, so processing efficiency is low. Choosing n = q therefore achieves both full node utilization and good processing efficiency.
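The trade-off behind n = q can be made concrete with a simplified scheduling model. The sketch below assumes that each Map node processes one sub-path sequence database at a time, that all tasks take equal time and that no task fails; `map_waves` is a hypothetical helper, not part of Hadoop.

```python
import math
from typing import Tuple


def map_waves(n: int, q: int) -> Tuple[int, int]:
    """Return (waves, idle node slots in the last wave) for n sub-databases on q Map nodes.

    Simplified model (assumption): one sub-database per Map node at a time,
    equal task durations, no task failures.
    """
    if n <= 0 or q <= 0:
        raise ValueError("n and q must be positive")
    waves = math.ceil(n / q)   # rounds of Map tasks needed
    idle = waves * q - n       # node slots left unused in the last round
    return waves, idle


for n in (2, 3, 5):
    print(n, map_waves(n, 3))  # q = 3 Map nodes
```

With q = 3, n = 2 leaves one node idle, n = 5 needs a second round, and only n = 3 finishes in one round with no idle node.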

Step 4: upload the original path sequence database to a specified folder in HDFS; step 8 scans the path sequence database stored in this folder.

Step 5: upload the n sub-path sequence databases to another specified folder in HDFS; the n sub-path sequence databases in this folder are the input files processed in step 6.

Step 6: the master node (the computer node running the master control program) dispatches the n sub-path sequence databases uploaded in step 5 to different Map nodes (computer nodes executing Map tasks). Each Map node executes the improved GSP algorithm: using the preset minimum support ξ, it scans the sub-path sequence database held in its memory, computes the local path sequence patterns, and passes them to the Reduce node (the computer node executing the Reduce task) as <key, value> pairs, where key is a local path sequence pattern and value is the support count of that pattern.

Each Map node executes the improved GSP algorithm as follows:

Operation a: for the sub-path sequence database assigned to this Map node, first scan it to obtain the 1-path sequence patterns L₁, i.e. the set of path sequences of length 1 whose support in the sub-path sequence database is not less than ξ. In general, the set of path sequences of length k whose support in the sub-path sequence database is not less than ξ is called the k-path sequence patterns L_k. Set k = 1.

Operation b: generate the candidate (k+1)-path sequences C_{k+1} from the k-path sequence patterns L_k, scan the sequence database (here, the sub-path sequence database in memory) again, compute the support of each candidate path sequence, and obtain the (k+1)-path sequence patterns L_{k+1}.

Operation c: set k = k + 1 and repeat operation b until no new candidate path sequence is generated. The resulting 1-path sequence patterns L₁, 2-path sequence patterns L₂, ... are all local path sequence patterns. The number of database scans equals the maximum length of the path sequence patterns generated.

Candidate path sequence patterns are generated in one of two ways:

(1) When candidate 2-path sequence patterns are generated from 1-path sequence patterns, scan the adjacency list and check the adjacent nodes of each path sequence pattern s₁ in the 1-path sequence patterns L₁. If an adjacent node of s₁ is also in L₁, join s₁ with that adjacent node, i.e. append the adjacent node's item to s₁.

(2) When candidate (k+1)-path sequence patterns (k > 1) are generated from k-path sequence patterns, candidate generation consists of two main steps:

First, join: for any two path sequence patterns s₁ and s₂ among the k-path sequence patterns, if the path sequence obtained by removing the first item of s₁ is identical to the path sequence obtained by removing the last item of s₂, then s₁ and s₂ can be joined by appending the last item of s₂ to s₁. Then, prune: if some sub-path sequence of a candidate path sequence pattern is not a path sequence pattern, the candidate cannot be a path sequence pattern and is deleted from the candidate path sequence patterns.
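A minimal Python sketch of the improved GSP algorithm run by one Map node is given below. It rests on assumptions the text does not fix. A path sequence is taken to support a pattern when the pattern occurs in it as a contiguous run of crossings, which reproduces the counts in the embodiment's tables. Because the road-network adjacency list is not given, the usage builds a hypothetical adjacency from crossings that appear consecutively in the example data. The names `contains`, `frequent` and `local_gsp` are illustrative. Each entry of the returned dictionary corresponds to one <key, value> pair that a Map node would emit.

```python
from typing import Dict, Iterable, List, Set, Tuple

Path = Tuple[str, ...]  # a path sequence: crossings in driving order


def contains(seq: Path, pat: Path) -> bool:
    """Assumed support test: pat occurs in seq as a contiguous run of crossings."""
    k = len(pat)
    return any(seq[i:i + k] == pat for i in range(len(seq) - k + 1))


def frequent(db: List[Path], cands: Iterable[Path], min_count: float) -> Dict[Path, int]:
    """One database scan: keep candidates contained in at least min_count sequences."""
    counts = {c: sum(contains(s, c) for s in db) for c in cands}
    return {c: cnt for c, cnt in counts.items() if cnt >= min_count}


def local_gsp(db: List[Path], adjacency: Dict[str, Set[str]], xi: float) -> Dict[Path, int]:
    """Improved-GSP sketch for one Map node; returns {local pattern: support count}."""
    min_count = xi * len(db)
    # Operation a: the first scan yields the 1-path sequence patterns L1.
    lk = frequent(db, {(x,) for s in db for x in s}, min_count)
    patterns = dict(lk)
    k = 1
    while lk:
        if k == 1:
            # Case (1): extend s1 only by adjacent crossings that are also in L1.
            items = {p[0] for p in lk}
            cands = {(x, y) for x in items for y in adjacency.get(x, set()) if y in items}
        else:
            # Case (2): join s1, s2 if s1 minus its first item equals s2 minus its last item.
            cands = {s1 + s2[-1:] for s1 in lk for s2 in lk if s1[1:] == s2[:-1]}
            # Prune: every contiguous k-subpath must be in Lk (implied by the join here).
            cands = {c for c in cands
                     if all(c[i:i + k] in lk for i in range(len(c) - k + 1))}
        # Operation b: rescan to obtain L(k+1); operation c: repeat while Lk is non-empty.
        lk = frequent(db, cands, min_count)
        patterns.update(lk)
        k += 1
    return patterns


# Usage on sub-path sequence database 1 of the embodiment (xi = 50%).
db1 = [tuple(s.split()) for s in
       ["A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N"]]
# Hypothetical adjacency list: crossings that appear consecutively in db1, both directions.
adjacency: Dict[str, Set[str]] = {}
for s in db1:
    for a, b in zip(s, s[1:]):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
local = local_gsp(db1, adjacency, 0.5)
print(local[("A", "B", "D", "F", "H", "K")])  # 2
```

On sub-path sequence database 1 this sketch yields 42 local patterns (12 of length 1 down to 2 of length 6), matching the L₁ to L₆ tables for that database.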

In the embodiment the minimum support is set to 50%, so with 4 path sequences per sub-path sequence database a pattern needs a support count of at least 2. The concrete steps of the improved GSP algorithm are shown in Figs. 5, 6 and 7. The Map node assigned sub-path sequence database 1 scans it to obtain the 1-path sequence patterns L₁, generates the candidate 2-path sequence patterns C₂ from L₁, scans the sequence database again to compute the support of each candidate, and obtains the 2-path sequence patterns L₂. It repeats these operations until no new candidate path sequence pattern is generated. Sub-path sequence databases 2 and 3 are processed in the same way by the Map nodes they are assigned to.

See Fig. 5. The results obtained while processing sub-path sequence database 1 are shown in the following tables:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	2
<C>	2
<D>	2
<E>	2
<F>	4
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<A C>
<B A>
<B D>
<C A>
<C E>
<D B>
<D G>
<D F>
<E F>
<E H>
<E C>
<F D>
<F G>
<F H>
<F E>
<G D>
<G I>
<G F>
<H E>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	2
<B D>	2
<C E>	2
<D F>	2
<E F>	2
<F G>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<C E F>
<D F G>
<D F H>
<E F G>
<E F H>
<F G I>
<F H K>
<G I L>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	2
<B D F>	2
<C E F>	2
<D F H>	2
<E F G>	2
<F G I>	2
<F H K>	2
<G I L>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<C E F G>
<D F H K>
<E F G I>
<F G I L>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<C E F G>	2
<D F H K>	2
<E F G I>	2
<F G I L>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>
<C E F G I>
<E F G I L>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2
<C E F G I>	2
<E F G I L>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>
<C E F G I L>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2
<C E F G I L>	2

See Fig. 6. The results obtained while processing sub-path sequence database 2 are shown in the following tables:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<B A>
<B D>
<D B>
<D G>
<D F>
<F D>
<F G>
<F H>
<G D>
<G I>
<G F>
<H E>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<D F H>
<F H K>
<G I L>
<I L N>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<D F H K>
<G I L N>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2

See Fig. 7. The results obtained while processing sub-path sequence database 3 are shown in the following tables:

L₁ (1-path sequence patterns)

Path sequence	Support count
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	3

C₂ (candidate 2-path sequence patterns)

Path sequence
<A B>
<B A>
<B D>
<D B>
<D G>
<D F>
<F D>
<F G>
<F H>
<G D>
<G I>
<G F>
<H E>
<H F>
<H K>
<I G>
<I L>
<K H>
<L I>
<L N>
<N L>

L₂ (2-path sequence patterns)

Path sequence	Support count
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2

C₃ (candidate 3-path sequence patterns)

Path sequence
<A B D>
<B D F>
<D F H>
<F H K>
<G I L>
<I L N>

L₃ (3-path sequence patterns)

Path sequence	Support count
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2

C₄ (candidate 4-path sequence patterns)

Path sequence
<A B D F>
<B D F H>
<D F H K>
<G I L N>

L₄ (4-path sequence patterns)

Path sequence	Support count
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2

C₅ (candidate 5-path sequence patterns)

Path sequence
<A B D F H>
<B D F H K>

L₅ (5-path sequence patterns)

Path sequence	Support count
<A B D F H>	2
<B D F H K>	2

C₆ (candidate 6-path sequence patterns)

Path sequence
<A B D F H K>

L₆ (6-path sequence patterns)

Path sequence	Support count
<A B D F H K>	2

The <key, value> pairs passed from the Map worker nodes to the Reduce worker node are listed in the following table:

key	value
<A>	3
<B>	2
<C>	2
<D>	2
<E>	2
<F>	4
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2
<A B>	2
<B D>	2
<C E>	2
<D F>	2
<E F>	2
<F G>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<A B D>	2
<B D F>	2
<C E F>	2
<D F H>	2
<E F G>	2
<F G I>	2
<F H K>	2
<G I L>	2
<A B D F>	2
<B D F H>	2
<C E F G>	2
<D F H K>	2
<E F G I>	2
<F G I L>	2
<A B D F H>	2
<B D F H K>	2
<C E F G I>	2
<E F G I L>	2
<A B D F H K>	2
<C E F G I L>	2
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	2
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2
<A B D F H>	2
<B D F H K>	2
<A B D F H K>	2
<A>	3
<B>	3
<D>	3
<F>	3
<G>	2
<H>	2
<I>	2
<K>	2
<L>	2
<N>	3
<A B>	3
<B D>	3
<D F>	2
<F H>	2
<G I>	2
<H K>	2
<I L>	2
<L N>	2
<A B D>	3
<B D F>	2
<D F H>	2
<F H K>	2
<G I L>	2
<I L N>	2
<A B D F>	2
<B D F H>	2
<D F H K>	2
<G I L N>	2
<A B D F H>	2
<B D F H K>	2
<A B D F H K>	2

The Hadoop Master node can automatically dispatch the n sub-path sequence databases to different Map worker nodes, manage the execution of the parallel Map tasks and the coordination between tasks, and handle the task failures mentioned above. This approach is relatively simple and fast to implement.

Step 7: the Reduce node merges the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns. That is, pairs with the same key are combined, converting <key, value> pairs into <key, set of values for this key>. The global candidate sequence patterns produced in the embodiment are shown in the following table.

key	Value set
<A>	{3,3,3}
<B>	{2,3,3}
<C>	{2}
<D>	{2,3,3}
<E>	{2}
<F>	{4,3,3}
<G>	{2,2,2}
<H>	{2,2,2}
<I>	{2,2,2}
<K>	{2,2,2}
<L>	{2,2,2}
<N>	{2,2,3}
<A B>	{2,3,3}
<B D>	{2,3,3}
<C E>	{2}
<D F>	{2,2,2}
<E F>	{2}
<F G>	{2}
<F H>	{2,2,2}
<G I>	{2,2,2}
<H K>	{2,2,2}
<I L>	{2,2,2}
<A B D>	{2,3,3}
<B D F>	{2,2,2}
<C E F>	{2}
<D F H>	{2,2,2}
<E F G>	{2}
<F G I>	{2}
<F H K>	{2,2,2}
<G I L>	{2,2,2}
<A B D F>	{2,2,2}
<B D F H>	{2,2,2}
<C E F G>	{2}
<D F H K>	{2,2,2}
<E F G I>	{2}
<F G I L>	{2}
<A B D F H>	{2,2,2}
<B D F H K>	{2,2,2}
<C E F G I>	{2}
<E F G I L>	{2}
<A B D F H K>	{2,2,2}
<C E F G I L>	{2}
<L N>	{2,2}
<I L N>	{2,2}
<G I L N>	{2,2}

The merging is performed automatically by Hadoop, so that identical local sequence patterns are not counted repeatedly.
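The grouping performed in step 7 amounts to collecting all values that share a key. The sketch below shows it in plain Python with a hypothetical function `merge_local_patterns`; on a real cluster this grouping is done by Hadoop's shuffle before the Reduce function is called.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]


def merge_local_patterns(pairs: Iterable[Tuple[Path, int]]) -> Dict[Path, List[int]]:
    """Step 7 sketch: turn <key, value> pairs into <key, list of values for that key>.

    The keys of the result are the global candidate sequence patterns.
    """
    grouped: Dict[Path, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# A few of the embodiment's Map outputs, in the order Map nodes 1, 2, 3 emitted them.
pairs = [(("A",), 3), (("C",), 2),        # from sub-path sequence database 1
         (("A",), 3), (("L", "N"), 2),    # from sub-path sequence database 2
         (("A",), 3), (("L", "N"), 2)]    # from sub-path sequence database 3
print(merge_local_patterns(pairs))
# {('A',): [3, 3, 3], ('C',): [2], ('L', 'N'): [2, 2]}
```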

Step 8: scan again the original path sequence database stored in HDFS in step 4, count the global candidate sequence patterns, and find the sequence patterns whose support is not less than the preset minimum support ξ. The Map tasks produce only local sequence patterns, which do not necessarily satisfy the global minimum support, so the original sequence database must be scanned again to obtain the global path sequence patterns. Counting each key of the global candidate path sequence patterns from step 7 over the original sequence database yields the global path sequence patterns, i.e. the keys in the following table. With ξ = 50% and m = 12, a pattern must be contained in at least 6 path sequences. The <key, value> pairs output in the embodiment are as follows:

key	value (support count)
<A>	9
<B>	8
<D>	8
<F>	10
<G>	6
<H>	6
<I>	6
<K>	6
<L>	6
<N>	7
<A B>	8
<B D>	8
<D F>	6
<F H>	6
<G I>	6
<H K>	6
<I L>	6
<A B D>	8
<B D F>	6
<D F H>	6
<F H K>	6
<G I L>	6
<A B D F>	6
<B D F H>	6
<D F H K>	6
<A B D F H>	6
<B D F H K>	6
<A B D F H K>	6
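Step 8 can be sketched as one more pass over the full database. The sketch below assumes the same contiguous-occurrence support definition as the local mining step; `global_patterns` is a hypothetical name. With ξ = 50% and m = 12, only candidates contained in at least 6 path sequences survive, which is why <C> (support 3) and <L N> (support 5) are dropped.

```python
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]


def contains(seq: Path, pat: Path) -> bool:
    """Assumed support test: pat occurs in seq as a contiguous run of crossings."""
    k = len(pat)
    return any(seq[i:i + k] == pat for i in range(len(seq) - k + 1))


def global_patterns(db: List[Path], candidates: Iterable[Path], xi: float) -> Dict[Path, int]:
    """Step 8 sketch: count global candidates over the whole database, keep support >= xi * m."""
    min_count = xi * len(db)
    result: Dict[Path, int] = {}
    for cand in candidates:
        support = sum(contains(s, cand) for s in db)
        if support >= min_count:
            result[cand] = support
    return result


# The 12 path sequences of the embodiment's original path sequence database.
db = [tuple(s.split()) for s in [
    "A B D F H K", "A C E F G I L", "A B D F H K M N", "C E F G I L N",
    "A B D F H K", "C E F G I L N", "A B D G I L N", "A B D F H K M",
    "A B D F H K", "E F G I L N", "A B D G I L N", "A B D F H K M N"]]
cands = [("A",), ("C",), ("F",), ("L", "N"), ("A", "B", "D", "F", "H", "K")]
print(global_patterns(db, cands, 0.5))
# {('A',): 9, ('F',): 10, ('A', 'B', 'D', 'F', 'H', 'K'): 6}
```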

Step 9: generate path association rules from the global path sequence patterns produced in step 8 and compute their confidence, obtaining the vehicle running path prediction result. Path association rules are generated from the global path sequence patterns as follows: for an L-path sequence pattern (L > 1), the first i items (1 ≤ i < L) form the rule antecedent and the remaining L − i items form the rule consequent. The confidence of a rule is the ratio of the support of the whole path sequence pattern to the support of the rule antecedent; for example, <A>→<B> has confidence 8/9 ≈ 88.89%. The path association rules produced and their confidence are listed in the following table:

Path association rule	Confidence
<A>→<B>	88.89%
<B>→<D>	100%
<D>→<F>	75%
<F>→<H>	60%
<G>→<I>	100%
<H>→<K>	100%
<I>→<L>	100%
<A>→<B D>	88.89%
<A B>→<D>	100%
<B>→<D F>	75%
<B D>→<F>	75%
<D>→<F H>	75%
<D F>→<H>	100%
<F>→<H K>	60%
<F H>→<K>	100%
<G>→<I L>	100%
<G I>→<L>	100%
<A>→<B D F>	66.67%
<A B>→<D F>	75%
<A B D>→<F>	75%
<B>→<D F H>	75%
<B D>→<F H>	75%
<B D F>→<H>	100%
<D>→<F H K>	75%
<D F>→<H K>	100%
<D F H>→<K>	100%
<A>→<B D F H>	66.67%
<A B>→<D F H>	75%
<A B D>→<F H>	75%
<A B D F>→<H>	100%
<B>→<D F H K>	75%
<B D>→<F H K>	75%
<B D F>→<H K>	100%
<B D F H>→<K>	100%
<A>→<B D F H K>	66.67%
<A B>→<D F H K>	75%
<A B D>→<F H K>	75%
<A B D F>→<H K>	100%
<A B D F H>→<K>	100%
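The rule generation of step 9 is sketched below in Python with a hypothetical function `path_rules`, assuming the global supports from step 8 are available as a dictionary. For instance, with support(<D>) = 8 and support(<D F H K>) = 6, the rule <D>→<F H K> has confidence 6/8 = 75%.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def path_rules(support: Dict[Path, int]) -> List[Tuple[Path, Path, float]]:
    """Step 9 sketch: derive path association rules from global path sequence patterns.

    For each L-path pattern (L > 1) and each split point i (1 <= i < L), the first i
    crossings form the antecedent and the remaining L - i the consequent;
    confidence = support(pattern) / support(antecedent). A contiguous prefix of a
    frequent path is itself frequent, so the antecedent is always in `support`.
    """
    rules = []
    for pattern, sup in support.items():
        for i in range(1, len(pattern)):
            antecedent, consequent = pattern[:i], pattern[i:]
            rules.append((antecedent, consequent, sup / support[antecedent]))
    return rules


# Global supports from step 8 for the patterns starting at crossing D.
support = {("D",): 8, ("D", "F"): 6, ("D", "F", "H"): 6, ("D", "F", "H", "K"): 6}
for ante, cons, conf in path_rules(support):
    print("<" + " ".join(ante) + ">", "->", "<" + " ".join(cons) + ">", f"{conf:.2%}")
```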

In a concrete implementation, steps 1 to 5 can be executed by the master node of the Hadoop platform, step 6 is dispatched by the master node to the Map nodes for execution, and steps 7, 8 and 9 are executed by the Reduce node of the Hadoop platform.

Correspondingly, the present invention provides a vehicle running path prediction system in which the following modules are set up on the Hadoop platform:

Memory determination module: determines, from the memory of each computer in the Hadoop platform, the memory of the machine with the least memory among all nodes, denoted Q;

Longest path sequence determination module: scans the original path sequence database storing the vehicle running path sequences and obtains the number of path sequences in it, denoted m; each path sequence includes more than one crossing, and the actual storage size of the longest path sequence in the original path sequence database is denoted P;

Sub-path sequence database division module: divides the original path sequence database horizontally into n disjoint sub-path sequence databases;

Each Map node executes the improved GSP algorithm as follows:

(1) When candidate 2-path sequence patterns are generated from 1-path sequence patterns, scan the adjacency list storing the road network information and check the adjacent nodes of each path sequence pattern s₁ in the 1-path sequence patterns L₁; if an adjacent node of s₁ is also in L₁, append the adjacent node's item to s₁;

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A vehicle running path prediction method, characterized in that the following steps are carried out on a Hadoop platform:

Step 1: from the memory of each computer in the Hadoop platform, determine the minimum memory among all nodes, denoted Q, in GB;

Step 2: scan the original path sequence database storing the vehicle running path sequences and obtain the number of path sequences in it, denoted m; each path sequence includes more than one crossing, and the actual storage size of the longest path sequence in the original path sequence database is denoted P, in bytes (B);

Step 3: divide the original path sequence database horizontally into n disjoint sub-path sequence databases, where P × (m/n) ≤ Q × 10⁹;

Step 6: the master node of the Hadoop platform dispatches the n sub-path sequence databases uploaded in step 5 to different Map nodes; each Map node executes the improved GSP algorithm, scanning the sub-path sequence database held in its memory with the preset minimum support ξ and computing the local path sequence patterns, which it passes to the Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is the support count of that pattern;

Each Map node executes the improved GSP algorithm as follows:

Operation a: for the sub-path sequence database assigned to this Map node, scan it to obtain the 1-path sequence patterns L₁, and set k = 1;

Operation b: generate the candidate (k+1)-path sequences C_{k+1} from the k-path sequence patterns L_k, scan the original path sequence database again, compute the support of each candidate path sequence, and obtain the (k+1)-path sequence patterns L_{k+1}; the candidate (k+1)-path sequences C_{k+1} are generated in one of the following two cases:

(1) when candidate 2-path sequence patterns are generated from 1-path sequence patterns, scan the adjacency list storing the road network information and check the adjacent nodes of each path sequence pattern s₁ in the 1-path sequence patterns L₁; if an adjacent node of s₁ is also in L₁, append the adjacent node's item to s₁;

(2) when candidate (k+1)-path sequence patterns are generated from k-path sequence patterns (k > 1): first, for any two path sequence patterns s₁ and s₂ among the k-path sequence patterns, if the path sequence obtained by removing the first item of s₁ is identical to the path sequence obtained by removing the last item of s₂, join s₁ with s₂; then prune: if some sub-path sequence of a candidate path sequence pattern is not a path sequence pattern, delete that candidate from the candidate path sequence patterns;

Step 7: the Reduce node merges the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

Step 8: scan again the original path sequence database stored in HDFS in step 4, count the global candidate sequence patterns, and find the sequence patterns whose support is not less than the preset minimum support ξ, obtaining the global path sequence patterns;

Step 9: generate path association rules from the global path sequence patterns produced in step 8 and compute the confidence of the path association rules, obtaining the vehicle running path prediction result.

2. A vehicle running path prediction system, characterized in that the following modules are set up on a Hadoop platform:

Memory determination module: determines, from the memory of each computer in the Hadoop platform, the memory of the machine with the least memory among all nodes, denoted Q, in GB;

Longest path sequence determination module: scans the original path sequence database storing the vehicle running path sequences and obtains the number of path sequences in it, denoted m; each path sequence includes more than one crossing, and the actual storage size of the longest path sequence in the original path sequence database is denoted P, in bytes (B);

Sub-path sequence database division module: divides the original path sequence database horizontally into n disjoint sub-path sequence databases, where P × (m/n) ≤ Q × 10⁹;

Original database upload module: uploads the original path sequence database to a specified folder in HDFS;

Sub-database upload module: uploads the n sub-path sequence databases to another specified folder in HDFS;

Local path sequence pattern module: makes the master node of the Hadoop platform dispatch the n sub-path sequence databases uploaded by the sub-database upload module to different Map nodes; each Map node executes the improved GSP algorithm, scanning the sub-path sequence database held in its memory with the preset minimum support ξ and computing the local path sequence patterns, which it passes to the Reduce node as <key, value> pairs, where key is a local path sequence pattern and value is the support count of that pattern;

Each Map node executes the improved GSP algorithm as follows:

Global candidate sequence pattern module: makes the Reduce node merge the <key, value> pairs passed from the Map nodes to obtain the global candidate sequence patterns;

Global path sequence pattern module: scans again the original path sequence database stored in HDFS by the original database upload module, counts the global candidate sequence patterns, and finds the sequence patterns whose support is not less than the preset minimum support ξ, obtaining the global path sequence patterns;

Prediction result module: generates path association rules from the global path sequence patterns produced by the global path sequence pattern module and computes the confidence of the path association rules, obtaining the vehicle running path prediction result.